Multiplication

pjw · Post by **pjw** » Sun Apr 02, 2023 4:27 pm

NormanDunbar wrote: ↑Sun Apr 02, 2023 3:00 pm Hi Per,

when I said "Spooky!" I was referring to the algorithm used. I need to have a read through it and try to understand it better.

EDIT: I found this: https://en.wikipedia.org/wiki/Multiplic ... iplication on Wikipedia.

Cheers,
Norm.

Nice one! Thanks!

pjw · Post by **pjw** » Sun Apr 02, 2023 4:30 pm

I dont want to hog this thread so Im creating a new one. Dont know if its going to go anywhere.. Doesnt seem to be a lot of interest in this sort of thing here.

I did a comparison with your own routine (ezine vol 11). On QPC2 I couldnt detect any measurable difference in performance. On SMSQmulator (different engine!) yours was slightly faster (like 1-2%) for large numbers. Predictably, the performance of my version depended on the inputs, while yours was consistent whatever the input.

Ive attached the two versions with harnesses below, so interested parties can see for themselves. Obviously, I stripped yours down to the bare essentials for comparison's sake, and I supplied my 64 bit version.

The harnesses just attach the routines to S*BASIC functions for ease of access. They include a quick and dirty routine to convert the outputs to a 16 byte hex string so I could check that the results were correct.

Simply LRESPR the two toolkits and run the commands in a timed loop. PRINT SMULT3264$(x, y) tests your version, PRINT MULU3264$(x, y) for mine. About 10 million iterations are required, on my systems at least, to get a meaningful measurement. This should be reduced by some orders of magnitude on QL-ish hardware.

Note: When I use "yours" and "mine" Im not trying to invoke a pissing contest

Im simply interested in comparing two very different algorithms! Other people enjoy bungee jumping, or collecting old computers!

Post by **NormanDunbar** » Sun Apr 02, 2023 4:39 pm

Thanks Per.

I knew you weren't up for a pissing contest. I like to see different ways of doing the same thing. I appreciate your input -- do you mind if I write it up for the next (exciting) issue of the irregular Assembly Language eMagazine. Full credit will be given of course.

I've studied your code, and come up with a signed version that does 32*32 bits into 64 bits. I have yet to test it though. Here's my first attempt -- I haven't tried assembling yet though! I've adjusted the inputs and outputs to match what I wrote up in the magazine, that way, I remain slightly less confused!

Thanks again for your efforts, if no-one else appreciates it, I do!

Cheers,
Norm. (Code below)

Code: Select all

;--------------------------------------------------------------
;
; Multiply two 32 bit signed integers using the "Peasant 
; Multiplication" method. 
;
; See Wikipedia: <https://en.wikipedia.org/wiki/
; Multiplication_algorithm#Russian_peasant_multiplication>
;
; and PJW's post on QL Forum at <https://qlforum.co.uk/
; viewtopic.php?p=50874#p50874> fo details.
;
; This method of multiplication is (or at least was) used by 
; sheep and camel herders in the desserts of Arabia. The only 
; skill you need to work this is to be able to count!
;
; You dig two parallel columns of small holes in the ground.
; The number of holes required will depend on the size of of
; your calculation. It is not fixed; you can always add more
;  if you need.
;
; In the topmost hole of the left column you put a number of
; beads -- or stones or sheep or camel droppings, depending on 
; what's available -- corresponding to the number of, say, 
; sheep you wish to sell; 
;
; In the topmost right hole the number of beads corresponding
; to the price per head.
;
; Then, operating on the second row of the right column, for
; every two beads in the hole above, you put one bead in the 
; hole, rounded down to the nearest whole number.
;
; Continue until there is only one bead left.
;
; In the left column you double the number of beads for each
; hole down the column. Each row on the left corresponds to 
; the row on the right, so stop when you get to the row where 
; the right column contains only one bead.
;
; Now, starting from the topmost row, take all the the beads
; from the left column, where the corresponding right column
; contains an ODD number of beads, and put them in the 
; accumulator.
;
; The contents of the accumulator now, magically, contains the
; number of beads that correspond to the total amount of 
; rials, shekels or whatever, for the sale; ie multiplication.
;
; The code below emulates that algorithm.
;
;
;       Entry                   Return
; D0.L = First number           D0.L smashed.
; D1.L = Second number          D1.L smashed.
; D2.L                          D2:D3 = result, high word in D3.
; D3.L
; D6.B = Sign indicator         D6.B = preserved.
;--------------------------------------------------------------


        ******                                     *****
        ****** T O   B E   T E S T E D   F U L L Y *****
        ******                                     *****


;--------------------------------------------------------------
; Code for signed multiply of two 32 bit numbers in D0 and D1.
; Does some jiggery pokery (!) before calling unsigned "mult32"
; and on return, adjusts things to undo the jiggery pokery!
;
; Negative * Negative = Positive. )
; Positive * Positive = Positive. ) Same sign = +ve result.
; Negative * Positive = Negative.
;
; D6.B is used to hold a flag to indicate whether the result 
; has to be sign adjusted. If D6.B is positive, then both input
; values had the same sign, so the result will be positive. If
; D6.B is negative, the signs were different and the result 
; will need to be negative.
;--------------------------------------------------------------
smult32
        moveq #0,d2             ; Clear accumulator low word.
        move.l d2,d3            ; Clear accumulator high word.

        move.l d6,-(a7)         ; Save sign flag.
        moveq #1,d6             ; Assume both have same sign

;--------------------------------------------------------------
; Test D0.L if zero, we are done here, the result is already 
; zero.
;--------------------------------------------------------------
testD0Zero
        tst.l d0                ; Anything to do?
        beq.s testSign          ; Nope, result is 0.

;--------------------------------------------------------------
; Test sign of D0. If negative set a flag in D6 and make D0 
; positive. Negate D6.B to indicate one negative value so far.
;--------------------------------------------------------------
testD0Sign
        bpl.s testD1Zero        ; D0 is positive.
        neg.b d6                ; D0 is negative, set flag.
        neg.l d0                ; And make D0 positive.

;--------------------------------------------------------------
; Test D1.L if zero, we are done here, the result is already 
; zero.
;--------------------------------------------------------------
testD1Zero
        tst.l d1                ; Check other number.
        beq.s testSign          ; Nothing to do, result is 0.

;--------------------------------------------------------------
; Test sign of D1. If negative set a flag in D6.B and make D0 
; positive. Negate D6. At  this point D6.B will be +ve if both
; values had the same sign, and -ve if both were different.
;--------------------------------------------------------------
testD1Sign
        bpl.s testSwap          ; D1 is positive.
        neg.b d6                ; D1 is negative, set flag.
        neg.l d1                ; MAke D1 positive.

;--------------------------------------------------------------
; According to Wikipedia:
; <https://en.wikipedia.org/wiki/Multiplication_algorithm#
; Russian_peasant_multiplication>
; not strictly necessary, it works either way around. But PJW
; wroite it this way.
;--------------------------------------------------------------
testSwap
        cmp.l d0,d1             ; Smallest is the multiplier.
        bcs.s doMult32          ; Already in order.
        exg d0,d1               ; Swap numbers over.
        bra.s doMult32          ; Ready to multiply.

;--------------------------------------------------------------
; This is where we do all of the halving and doubling.
;--------------------------------------------------------------
loop
        lsl.l #1,d1             ; Double multiplicand.
        bvs.s testSign          ; Oops. Overflow!

        lsr.l #1,d0             ; Halve multiplier.
        beq.s testSign          ; No more to do.

;--------------------------------------------------------------
; If the halved value in D0.L is odd, we need to add it to the
; result in D3:D2
;--------------------------------------------------------------
doMult32
        btst #0,d0              ; If even..
        beq.s loop              ; ..skip addition, else..
        add.l d1,d2             ; ..add into the low word
        addx.l #0,d3            ; .. and the higm with carry.     
        bvc.s loop              ; Loop unless D3 overflowed.

;--------------------------------------------------------------
; Adjustments to the sign of the result required? If D6.B is
; positive, then result is fine. If negative, we need to negate
; the result.
;--------------------------------------------------------------
testSign
        tst.b d6                ; +ve = same sign, 
                                ; -ve = different.
        bpl.s done              ; Same sign, all done.

;--------------------------------------------------------------
; To negate a 64 bit value, NEG the low word and NEGX the high
; word.
;--------------------------------------------------------------
negateResult
        neg.l d2                ; Negate result, low word.
        negx.l d3               ; And the high word

;--------------------------------------------------------------
; D3:D2 contains the 64 bit result. D3 holds the high 32 bits
; and D2 the low 32 bits.
;--------------------------------------------------------------
done
        move.l (a7)+,d6         ; Restore sign flag register.
        rts

dilwyn · Post by **dilwyn** » Sun Apr 02, 2023 5:44 pm

Thank you Per. Bit out of my league with this, so when I have a bit more spare time I may study it more closely to see if I can understand it.

pjw · Post by **pjw** » Sun Apr 02, 2023 7:02 pm

Norm, I was going to do a signed version next, but that would just have been for duty, not fun, so Im glad you did it

I dont know if this is the right place to discuss the nitty-gritty. We could take it offline if you (or public sentiment would) prefer..

Looks good, but didnt you miss a step?

Code: Select all

loop
        lsl.l #1,d1             ; Double multiplicand.
        bvs.s testSign          ; Oops. Overflow!

My version says:

Code: Select all

loop
        lsl.l #1,d0
        lsl.l #1,d1     double multiplicand
        bcc.s skip

        addq.l #1,d0    carry

skip

Multiplicand could run to a double long word, no?

I dont know if this would work, but I wanted to try this optimisation:

Code: Select all

loop
        lsl.l #1,d0
        addx.l d1,d1     double multiplicand
;        bvs.s done	oops! Overflow (unlikely. Impossible?)

I havent yet studied your algorithm properly (in my case that means heuristically

), but I was wondering how you dealt the sign in the event that the unsigned multiplication caused bit 63 of the accumulator to be set? Shouldnt that case cause an overflow error?

Just a note on

Code: Select all

testSwap
        cmp.l d0,d1             ; Smallest is the multiplier.
        bcs.s doMult32          ; Already in order.
        exg d0,d1               ; Swap numbers over.
        bra.s doMult32          ; Ready to multiply.

Consider the (extreme) case of 1 x 2^30. The code would have to do a whole bunch of unnecessary loops before it figured out there was no there there. Swap the two numbers around and it is solved in one go. I think (completely unscientifically) that that optimisation is probably worth it..

Yes, feel free to use it any way you like. Perhaps you might like to remove my description of the algorithm (it was terribly badly explained!) and use the Wikipedia description instead. Or the one I appended to the end of 64 bit version..

pjw · Post by **pjw** » Sun Apr 02, 2023 7:03 pm

dilwyn wrote: ↑Sun Apr 02, 2023 5:44 pm Thank you Per. Bit out of my league with this, so when I have a bit more spare time I may study it more closely to see if I can understand it.

No worries, Dilwyn. You do more than your fair share of other good stuff

tofro · Post by **tofro** » Sun Apr 02, 2023 7:47 pm

I'm using this for unsigned 32 x 32 -->64 multiplication:

Code: Select all

; multiplies d0:32 with d1:32 into d0:d1:64 (high-order bits in d0)

umult64:
                move.l d2,-(a7)         ; save d2 (scratch reg)
                move.w d0,d2            ; d0.low to d2
                mulu d1,d2              ; d2 = d0.low * d1.low
                move.l d2,-(a7)         ; result to stack
                move.l d1,d2            ; 
                swap d2                 ; d1.hi to d2
                move.w d2,-(a7)         ; save it
                mulu d0,d2              ; d2.l = d0 * d1.hi
                swap d0                 
                mulu d0,d1              
                mulu (a7)+,d0           
                add.l d2,d1             
                moveq #0,d2             
                addx.w d2,d2            
                swap d2                 
                swap d1                 
                move.w d1,d2            
                clr.w d1                
                add.l (a7)+,d1          
                addx.l d2,d0
                move.l (a7)+,d2
                rts

Boringly straightforward

This is written for the Q68 - that has blazingly fast multiplication, so I thought I could just as well use it. Long multiclipation is used in my programs that use 16:16 fixed-point math. I have a similar division routine somewhere.

pjw · Post by **pjw** » Mon Apr 03, 2023 2:57 pm

Hi Tobias,
Thanks for playing. I just think its more fun to discuss the shelves I
built my self rather than ones picked up from IKEA, although both have
their place.

There are differences. The one you present is definitely marginally faster
than the other two (on QPC2, but barely discernibly so on SMSQmulator). But
it doesnt test for zero inputs so N x 0 takes as long to calculate as N x
M. My version is the slowest of the three, except for when at least one
input is a small number (I only tested 1 x N) when it is as fast or faster.

This one by Jonathan Oakley is also interesting. He really does it the hard
way, bit by bit, with shifts and additions. I guess the microcode versions
must be doing something like this:

Code: Select all

*
* 32-bit integer arithmetic             v0.01  © Mar 1988  J.R.Oakley
QJUMP
*
* Compatible with QDOS, Minerva and SMSQ/E
*

        section code

        xdef usmul

*+++
* Multiply two 32-bit unsigned numbers, giving a 64-bit result
*
*       Registers:
*               Entry                           Exit
*       D0      multiplier                      result, LSLW
*       D1      multiplicand                    result, MSLW
*---
usmul
        move.l  d0,d3                   ; copy multiplier
        move.l  d1,d4                   ; and multiplicand
        moveq   #0,d5                   ; upper LW starts as zero
        moveq   #0,d0                   ; result is
        moveq   #0,d1                   ; also zero
mullp
        lsr.l   #1,d3                   ; get a bit
        bcc.s   noadd                   ; there wasn't one
        add.l   d4,d0                   ; add to LSLW
        addx.l  d5,d1                   ; and with carry to MSLW
noadd
        tst.l   d3                      ; end of multiplier?
        beq.s   mulexit                 ; yes
        add.l   d4,d4                   ; no, double...
        addx.l  d5,d5                   ; ...the multiplicand
        bra.s   mullp                   ; next bit
mulexit
        rts
*
        end

Unfortunately, although the most compact it is also the slowest of the
lot, except where the multiplier is very small. It too is dependents on its
inputs. You can find the full, signed version in dev8_util_cv_muldiv_asm in
the SMSQ/E sources.

I guess theres a little fat in all these routines that could be trimmed if
desired or special circumstances permitted. Different platforms also give
different results, and the same algorithms on different CPUs could also
make a huge difference.

Reading that Multiplication article on Wikipedia, I see further
efficiencies could be achieved, but the added complexity doesnt make much
sense for the relatively small numbers we're operating with here and may
not even kick in for numbers with less than a few hundred decimal digits.

Is any of this useful? Possibly, but thats not necessarily the point. One
thing I learnt from our FUZZBUZZ excercise way back when, was that when you
think youve reached the limit of compactness or efficiency, someone always
comes up with something to blow your socks off!

Post by **NormanDunbar** » Mon Apr 03, 2023 3:03 pm

Hi Per,

pjw wrote:Looks good, but didn't you miss a step?

Not as far as I'm aware. I'm multiplying D0 by D1, or vice versa, D0 gets halved and D1 gets doubled. The result is in D3:D2 with the high 32 bits in D3. Time, and testing, will tell!

pjw wrote:I was wondering how you dealt the sign in the event that the unsigned multiplication caused bit 63 of the accumulator to be set? Shouldn't that case cause an overflow error?

Code: Select all

doMult32
        btst #0,d0              ; If even..
        beq.s loop              ; ..skip addition, else..
        add.l d1,d2             ; ..add into the low word
        addx.l #0,d3            ; .. and the high with carry.     
        bvc.s loop              ; Loop unless D3 overflowed.

Basically, if the result overflows, I'm sitting here looking surprised! The biggest signed positive number is $7FFFFFFF and multiplying that by itself gives $3FFFFFFF:00000001 which has two zero bits at the high end, so in theory, it should never overflow. For unsigned it's $FFFFFFFF times itself, resulting in $FFFFFFFE:00000001 I think! There's no need to worry about overflow.

If one of the numbers is -1 or $FFFFFFFF then the (positive) result will be $7FFFFFFE:80000001 which is then sign adjusted to $80000001:7FFFFFFF so should still be ok.

Disclaimer, I'm no mathematician so the above might require a lot of correcting!

pjw wrote:Yes, feel free to use it any way you like. Perhaps you might like to remove my description of the algorithm (it was terribly badly explained!) and use the Wikipedia description instead. Or the one I appended to the end of 64 bit version.

Thank you very much, and yes, I'll rephrase the description.

Cheers,
Norm.

Pr0f · Post by **Pr0f** » Mon Apr 03, 2023 6:50 pm

was just wondering how these routines compare against the maths co processors (I know this is/was not offered directly on the old QL ) in terms of speed and code efficiency (space used)

The Sinclair QL Forum

Multiplication

Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication

Re: Multiplication