Saturday, April 13, 2019

Multi-byte Multiplication 8x16

When you get right down to it, there is an interesting property about the number of bits involved in a multiplication. The maximum value for a multiplication is the number of bits in the multiplicand plus the number of bits in the multiplier. This means that our 8x8 multiplier should produce a 16 bit number. Likewise a 16x16 bit multiply would properly need 32 bits to store the results. For my needs we don't care about overflow situations as for the random number generator but if you wanted proper accuracy chaining more bytes to have larger results is certainly something that could easily be done. What we need to have a 16 bit seed is to multiply a 16 bit number by an 8 bit number.

This is actually not that much different than the 8x8 multiplier that we created earlier. There are two big differences. First, we are going to need to shift a 16 bit number. This can be done simply enough using the carry flag just as was done with the multi-byte additions. The shift left would then look like:

CLC
ROL LOW
ROL HIGH

This would work just fine, but we can save a byte and a few cycles by getting rid of the CLC instruction and instead using the ASL instruction in the place of the first ROL instruction. This always rotates in a 0 while still putting the shifted out bit into the carry flag.

The second issue is how to deal with a 16-bit result. There are two solutions to this problem. We could store the results in some reserved memory (ideally in page 0) which if we were going to output a 24 bit number would be a requirement. With a little bit of register shifting, we can actually keep the results of the multiplication in the registers winch gives us a bit of a speed boost as well as reducing the bytes needed for the program and memory storage. The new multiplication routine now looks like this:

JMP test
; create a placeholder for the multiplicand
multiplicandLo: 
.BYTE 0
multiplicandHi: 
.BYTE 0
multiplierLo: 
.BYTE 0
;
; Multiply method 
; A = multiplicand lo, Y = multiplicand Hi, X = multiplier
; result returned in A (lo) and Y (hi)
multiply16x8:
; init by setting zero page vars
STX multiplierLo
STA multiplicandLo
STY multiplicandHi
; start result at zero
LDA #0
TAY
multiply16x8_loop:
; see if still bits in multiplier 
LDX multiplierLo
BEQ multiply16x8_done
; shift multiplier right. C set indicates need to add
LSR multiplierLo
BCC multiply16x8_adjust
; add current power of 2 multiplicand
CLC
ADC multiplicandLo
TAX ; save low result in X register 
TYA ; transfer high result to Acc for second half of add
ADC multiplicandHi
TAY ; return high result to Y
TXA ; return low result to Acc
multiply16x8_adjust:
; Set up next power of 2
ASL multiplicandLo
ROL multiplicandHi
JMP multiply16x8_loop
multiply16x8_done:
RTS

test:

; 300 x 5 test
LDA #$2C
LDY #$1
LDX #$5
JSR multiply16x8

; fourty-two test
LDA #7
LDY #0
LDX #6
JSR multiply16x8

; LCG Mulitply step test
LDA #42
LDY #0
LDX #141
JSR multiply16x8

NOP
BRK

Here is a trace of the first test.

(0) 0:JMP $31
(3) 31:LDA #$2C ; A:$0 -> $2c
(5) 33:LDY #$1 ; Y:0 -> $1
(7) 35:LDX #$5 ; X:0 -> $5
(9) 37:JSR $6
(15) 6:STX $5 ; [5]:-> $5
(19) 9:STA $3 ; [3]:-> $2c
(23) c:STY $4 ; [4]:-> 1
(27) f:LDA #$0  ; A:$44 -> $0, Flags:32 -> $22
(29) 11:TAY ; Y:1 -> $0
(31) 12:LDX $5 ; X:5 -> 5
(35) 15:BEQ $30
(37) 17:LSR $5 ; [5]:$5 -> $2, C=1
(43) 1a:BCC $27
(45) 1c:CLC ; C=0
(47) 1d:ADC $3 ; A:$0 -> $2c
(51) 20:TAX ; X:5 -> $2c
(53) 21:TYA  ; A:$2C -> $0
(55) 22:ADC $4 ; A:$0 -> $1
(59) 25:TAY ; Y:0 -> $1
(61) 26:TXA  ; A:$1 -> $2c
(63) 27:ASL $3 ; [3]:$2C -> $58
(69) 2a:ROL $4 ; [4]:$1 -> $2
(75) 2d:JMP $12
(78) 12:LDX $5 ; X:$2c -> $2
(82) 15:BEQ $30
(84) 17:LSR $5 ; [4]:$2 -> $1
(90) 1a:BCC $27
(93) 27:ASL $3 ; [3]:$58 -> $B0
(99) 2a:ROL $4 ; [4]:$2 -> 4
(105) 2d:JMP $12
(108) 12:LDX $5, X:$2 -> $1
(112) 15:BEQ $30
(114) 17:LSR $5 ; [5]:1 -> 0, C=1
(120) 1a:BCC $27
(122) 1c:CLC ; C=0
(124) 1d:ADC $3 ; A:$2c -> $dc
(128) 20:TAX ; X:1 -> $dc
(130) 21:TYA  ; A:$dc -> $1
(132) 22:ADC $4 ; A:$1 -> $5
(136) 25:TAY ; Y:1 -> $5
(138) 26:TXA  ; A:$5 -> $dc
(140) 27:ASL $3
(146) 2a:ROL $4
(152) 2d:JMP $12
(155) 12:LDX $5, X:$dc -> $0, Z=1
(159) 15:BEQ $30
(162) 30:RTS 

As you can see that this works very similar to the original version with only a few changes. A better 8x8 multiply that returns a 16 bit result would be easy to do by simply using this version but always setting Y to 0 when calling. We now have a multiply routine that will allow us to implement both the 16-bit seed version of the LCG algorithm as well as the PCG algorithm. We will start implementing those next fortnight!

No comments:

Post a Comment