Saturday, March 16, 2019

Shifted Multiply

Adding numbers together in a loop works for small numbers but the execution time quickly increases as the multiplier increases. Another way of multiplying numbers is by shifting them. Each shift to the left is the equivalent of multiplying by 2. This means that if we shift to the lefft 3 times we have effectively multiplied by 8. The problem is that this only works with powers of two. When you think about it, though, 5x3 is the same as 5x2+5x1. Likewise, 7x6 is just 7x4+7x2.

This means that we cam drastically improve multiplication by simply looping through the bits shifting the multiplier to the right and seeing if it is set. If it is we add the current value of the shifted multiplicand to the result. We then shifting the multiplicand to the left to multiply it by the next power. This is repeated until the multiplier becomes 0. The code that does this is as follows:

; brute force 8-bit multiplication
JMP test
; create a placeholder for the multiplicand
multiplicandLo: 
.BYTE 0
multiplierLo: 
.BYTE 0
;
; Multiply method A = multiplicand, X = multiplier result returned in A
multiply8x8:
; init by setting zero page vars
STX multiplierLo
STA multiplicandLo
; start result at zero
LDA #0
multiply8x8_loop:
; see if still bits in multiplier 
LDX multiplierLo
BEQ multiply8x8_done
; shift multiplier right. C set indicates need to add
LSR multiplierLo
BCC multiply8x8_adjust
; add current power of 2 multiplicand
CLC
ADC multiplicandLo
multiply8x8_adjust:
; Set up next power of 2
ASL multiplicandLo
JMP multiply8x8_loop
multiply8x8_done:
RTS

test:
; fourty-two test
LDA #7
LDX #6
JSR multiply8x8
; 10*20 test
LDA #10
LDX #20
JSR multiply8x8
; LCG Mulitply step test
LDA #42
LDX #141
JSR multiply8x8
NOP
BRK

When ran, we get the following trace for the first test. The second and third tests are pretty similar.

(0) 0:JMP $22
(3) 22:LDA #$7 ; A:0 -> 7
(5) 24:LDX #$6 ; X:0 -> 6
(7) 26:JSR $5
(13) 5:STX $4 ; [4]:0 -> 6
(17) 8:STA $3 ; [3]:0 -> 7
(21) b:LDA #$0 ; A:7 -> 0
(23) d:LDX $4 ; X:6 -> 6
(27) 10:BEQ $21
(29) 12:LSR $4 ; [4]:6 -> 3
(35) 15:BCC $1B
(38) 1b:ASL $3 ; [3]:7 -> $E
(44) 1e:JMP $D
(47) d:LDX $4 ; X:6 -> 3
(51) 10:BEQ $21
(53) 12:LSR $4 ; [4]:3 -> 1, C=1
(59) 15:BCC $1B
(61) 17:CLC ; C=0
(63) 18:ADC $3 ; A:0 -> $E
(67) 1b:ASL $3 ; [3]:$E -> 1C
(73) 1e:JMP $D
(76) d:LDX $4 ; X:3 -> 1
(80) 10:BEQ $21
(82) 12:LSR $4 ; [4]:1->0, C=1
(88) 15:BCC $1B
(90) 17:CLC ; C=0
(92) 18:ADC $3 ; A:$E -> 2A
(96) 1b:ASL $3 ; [3]:$1C -> $38
(102) 1e:JMP $D
(105) d:LDX $4 ; X:1 -> 0, Z=1
(109) 10:BEQ $21
(112) 21:RTS 

The call starts at 3 and runs until 118 for a 115 cycle time. This is slightly worse than the brute force way, but if we continued with the other two tests we would find that the 10x20 test takes 163 cycles which is significantly faster than the 272 cycles for the brute force multiply. And the 42x141 test only took 245 cycles compared to 1845 cycles that the brute force method required.

We now have everything necessary to create a LCG random number generator, which we will be doing next fortnight.

No comments:

Post a Comment