This is actually not that much different than the 8x8 multiplier that we created earlier. There are two big differences. First, we are going to need to shift a 16 bit number. This can be done simply enough using the carry flag just as was done with the multi-byte additions. The shift left would then look like:
CLC
ROL LOW
ROL HIGH
This would work just fine, but we can save a byte and a few cycles by getting rid of the CLC instruction and instead using the ASL instruction in the place of the first ROL instruction. This always rotates in a 0 while still putting the shifted out bit into the carry flag.
The second issue is how to deal with a 16-bit result. There are two solutions to this problem. We could store the results in some reserved memory (ideally in page 0) which if we were going to output a 24 bit number would be a requirement. With a little bit of register shifting, we can actually keep the results of the multiplication in the registers winch gives us a bit of a speed boost as well as reducing the bytes needed for the program and memory storage. The new multiplication routine now looks like this:
JMP test
; create a placeholder for the multiplicand
multiplicandLo:
.BYTE 0
multiplicandHi:
.BYTE 0
multiplierLo:
.BYTE 0
;
; Multiply method
; A = multiplicand lo, Y = multiplicand Hi, X = multiplier
; result returned in A (lo) and Y (hi)
multiply16x8:
; init by setting zero page vars
STX multiplierLo
STA multiplicandLo
STY multiplicandHi
; start result at zero
LDA #0
TAY
multiply16x8_loop:
; see if still bits in multiplier
LDX multiplierLo
BEQ multiply16x8_done
; shift multiplier right. C set indicates need to add
LSR multiplierLo
BCC multiply16x8_adjust
; add current power of 2 multiplicand
CLC
ADC multiplicandLo
TAX ; save low result in X register
TYA ; transfer high result to Acc for second half of add
ADC multiplicandHi
TAY ; return high result to Y
TXA ; return low result to Acc
multiply16x8_adjust:
; Set up next power of 2
ASL multiplicandLo
ROL multiplicandHi
JMP multiply16x8_loop
multiply16x8_done:
RTS
test:
; 300 x 5 test
LDA #$2C
LDY #$1
LDX #$5
JSR multiply16x8
; fourty-two test
LDA #7
LDY #0
LDX #6
JSR multiply16x8
; LCG Mulitply step test
LDA #42
LDY #0
LDX #141
JSR multiply16x8
NOP
BRK
Here is a trace of the first test.
(0) 0:JMP $31
(3) 31:LDA #$2C ; A:$0 -> $2c
(5) 33:LDY #$1 ; Y:0 -> $1
(7) 35:LDX #$5 ; X:0 -> $5
(9) 37:JSR $6
(15) 6:STX $5 ; [5]:-> $5
(19) 9:STA $3 ; [3]:-> $2c
(23) c:STY $4 ; [4]:-> 1
(27) f:LDA #$0 ; A:$44 -> $0, Flags:32 -> $22
(29) 11:TAY ; Y:1 -> $0
(31) 12:LDX $5 ; X:5 -> 5
(35) 15:BEQ $30
(37) 17:LSR $5 ; [5]:$5 -> $2, C=1
(43) 1a:BCC $27
(45) 1c:CLC ; C=0
(47) 1d:ADC $3 ; A:$0 -> $2c
(51) 20:TAX ; X:5 -> $2c
(53) 21:TYA ; A:$2C -> $0
(55) 22:ADC $4 ; A:$0 -> $1
(59) 25:TAY ; Y:0 -> $1
(61) 26:TXA ; A:$1 -> $2c
(63) 27:ASL $3 ; [3]:$2C -> $58
(69) 2a:ROL $4 ; [4]:$1 -> $2
(75) 2d:JMP $12
(78) 12:LDX $5 ; X:$2c -> $2
(82) 15:BEQ $30
(84) 17:LSR $5 ; [4]:$2 -> $1
(90) 1a:BCC $27
(93) 27:ASL $3 ; [3]:$58 -> $B0
(99) 2a:ROL $4 ; [4]:$2 -> 4
(105) 2d:JMP $12
(108) 12:LDX $5, X:$2 -> $1
(112) 15:BEQ $30
(114) 17:LSR $5 ; [5]:1 -> 0, C=1
(120) 1a:BCC $27
(122) 1c:CLC ; C=0
(124) 1d:ADC $3 ; A:$2c -> $dc
(128) 20:TAX ; X:1 -> $dc
(130) 21:TYA ; A:$dc -> $1
(132) 22:ADC $4 ; A:$1 -> $5
(136) 25:TAY ; Y:1 -> $5
(138) 26:TXA ; A:$5 -> $dc
(140) 27:ASL $3
(146) 2a:ROL $4
(152) 2d:JMP $12
(155) 12:LDX $5, X:$dc -> $0, Z=1
(159) 15:BEQ $30
(162) 30:RTS
As you can see that this works very similar to the original version with only a few changes. A better 8x8 multiply that returns a 16 bit result would be easy to do by simply using this version but always setting Y to 0 when calling. We now have a multiply routine that will allow us to implement both the 16-bit seed version of the LCG algorithm as well as the PCG algorithm. We will start implementing those next fortnight!
No comments:
Post a Comment