Saturday, December 22, 2018

This is the first of a 3 part review of the "PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation" paper by Melissa O'Neill. I go over the major parts of the paper and try and recreate the core results.
 
Random numbers are used throughout computer science but what is a random number? This is hard to define but we know about randomness in the real world so we desire a system that produces results that match our expectations of those systems. The problem is that computers are deterministic which means they are very poor at doing random things. There are two general ways of generating random numbers on computers: use external "true" random sources or use pseudo-random numbers.

An external source of entropy allows you to take readings from a random system that produces results in an unpredictable manner but does so in a way that when measured you are equally likely to get one of a range (two for binary) of results. There have been a number of systems used for such devices with a lava lamp being the most interesting. Random.org uses weather as their source of random numbers. The downside to this is that readings from natural sources of randomness can only be taken at a slow rate which can be problematic when you need a lot of random numbers and need them quickly as is the case with games. Another issue is that in order to repeat a sequence of random numbers, as you would want to do for testing purposes, you need to save all the bits that you generate.

A pseudo-random system is a deterministic system that produces results that resemble true randomness but can be recreated if the state of the system is determined. This also means that the same starting value will produce same sequence of numbers so testing can be done by using the same seed value. There are a number of different algorithms that can be used to generate random numbers with the methods available covering a variety of needs.  Not all generators are good! So how can we tell what generators are good?

Randograms are an informal way of looking at random number generators. They are generated by creating a 256x256 grid and then taking 32768 pairs of numbers,  plotting them on the grid. The more times a grid is hit with a number, the darker that representation becomes. Here is some Kotlin code that demonstrates how to generate a randogram.

fun generate() {
for (cntr in 0..32767) {
val x = rng.nextUByte()
val y = rng.nextUByte()
++grid[x][y]
}
}

fun drawToCanvas(canvas:Canvas, scale:Double) {
val gc = canvas.graphicsContext2D
for (cntrX in 0..255)
for (cntrY in 0..255) {
gc.fill = Color.gray(1.0 / (grid[cntrX][cntrY].toDouble() + 1.0))
gc.fillRect(cntrX.toDouble()*scale, cntrY.toDouble()*scale, scale, scale)
}
}

The image below are some images that I generated.



With simple random number generators, it is easy to see that they are not random. Better generators, such as Java's random number generator, are not a clear. Java's random number generator is actually not that great but simply looking at the randograms would not reaveal this so better ways of testing are required.

Saturday, December 8, 2018

No post this fortnight

As I expected, between exams, marking/invigilating, and writing research papers I simply don't have the time to create a new post, which probably have been an article on 8-bit multiplying. After I finish my detailed overview of the PCG paper I plan on going over 8 and 16 bit math on the 6502 before creating a LCG, XORShift, XORShift*, and a simple PCG random number generators which will be more useful for NES game development but will still work with the 2600.

Sorry for the lack of content and see you next fortnight.

Saturday, November 24, 2018

Sef-plagiarism?


One of the courses that we are required to take for the Masters of Science in Computer Science program that I am undertaking is called Seminar. It is a single credit course that needs to be taken three times. Each time the student, with the supervision of his instructor, picks a research paper which the student then researches, attempting to confirm or explore the results of the paper, and then presents the paper to the class. In addition to the presentation is a written report on the subject outlining what they have learned from the paper. As anybody reading this blog already knows, I am developing an Atari 2600 emulator. The sound emulation, to do properly, requires something called a Looped-Feedback Shift Register (LFSR) which is similar to a poly-counter which is one of the earliest ways of generating pseudo-random numbers. This had me looking into random number generation in general. One of the papers I was very impressed with is “PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation.” The link to this paper is http://www.pcg-random.org/paper.html and while it is a very lengthy paper, it is not too technical and goes over a lot of details about random number generation in general in addition to presenting a new family of random number generators.

I am planning on going over this paper in three separate articles. First, a look at what a random number is and what makes a good random number generator. Next, how the generators are tested as well as a less-robust testing method that would work for the type of things I need to do. Finally, we will look at the PCG algorithm.

I am going to start this series of articles in four weeks with next fortnight being a week “off” though I may post something if I have the time. The reason for the delay is something known as self-plagiarism. To me this is a really strange concept to me, but some Universities will actually consider using your own material in a paper to be a form of plagiarism. To get around this possibility I am going to wait until my report has been submitted and  use that report as the reference to the blog so it is important that the report be submitted before I start posting the blog. This is probably me being way over-paranoid but it is better to be safe than sorry.

In the meantime, I urge you to get a copy of the paper and read it. See you in a fortnight or two.

Saturday, November 10, 2018

State of the Emulator November 2018


Now that I have finished my 6502 emulation section, it is time to decide what to do next. As I have mentioned a few times, my blog is a bit behind where I am with my work on the emulator. What I tend to do is work on the emulator writing articles as I go and then only post an article every fortnight. The reason for doing this is that I knew that my University work (I am currently enrolled in a Master of Science in Computer Science program while working part time) would be taking a lot of my time so I wanted to make sure that I would have material to post. My plan was to also have the blog follow my development down all the ratholes that it leads down. I am rethinking that now as the sound emulation took me down a really windy rathole. To make my blog a bit more coherent, I will be editing my rathole and presenting the information in a bit more structured way.

For those of you who are wondering about the state of my emulator, the graphical portion of the TIA chip has been completed. I still need to implement the sound, which has led me down the random number generation path as the TIA sound generator is based on a LFSR (Looped Feedback Shift Register). After the sound is either implemented or muted, I will be synching the TIA and 6502 chips then implementing the PIA at which point the emulator will be finished.

Before I get to the TIA, I am going to take a bit of a dive into 6502 programming and random number generation by creating several different random number generators for the 6502 and some test games for the 2600. These test games will be used to test our emulator but will likely also result in changes to the assembler I wrote as I discover issues while creating these games. For the people who are interested in creating games for the 2600, this will be a good break. For those of you who are more interested in the emulator, remember that Test Driven Development dictates that you write your tests first so I am simply following TDD principles here.

My delve into random number generation will start first with a rather in-depth look at a particular paper on the subject. This is partially because the paper does a really good job of explaining all the key concepts that are necessary for the creation of a random number generator but mostly because I am presenting the paper as part of my one of my courses so can effectively kill two birds with one stone. This will give a basis for random number generation but then we look at what can actually be done with the rather limited capabilities of the 2600. We will then look at a few techniques that have been used in games while creating one or more games that use those techniques. This will give us a test library for the emulator while also demonstrating what is necessary to create a 2600 game.


Saturday, October 27, 2018

Debugging commands


This leaves us with only two commands remaining. The BRK is a software interrupt, while the NOP command does nothing. Doing nothing is a handy thing as we will discuss shortly.
As mentioned in the previous article, when an IRQ happens, the processor stops what it is doing and executes the interrupt handler. BRK issues an IRQ but unlike a normal IRQ this one is not maskable so will always occur. This results in the code continuing execution at the address that is stored in $FFFE-$FFFF. On early computer systems this was set up to be a special machine language debugging program called a monitor. This allowed the programmer to scatter BRK statements through their code which would trigger the monitor when they were reached. The monitor would display status information about the program and let the programmer issue debugging commands for doing things such as changing the values in memory, changing the execution address, or continuing the program right after the break point.

When a programmer was finished with a specific BRK statement, they would simply replace the value at that location with $EA which is the OPCode for NOP. In other words, in the early days break points were coded into a program!

As if removing break points from a program wasn’t a good enough reason to have the NOP command, it also takes two cycles to execute. This means that if you have timing critical parts of your program you can use NOP instructions as padding to get the right timing. With the 2600, timing is very important so this is not something to be taken for granted.

Testing was a bit tricky for implementing these instructions. Making sure that the NOP was timing correctly probably didn’t need testing as that is handled by the emulator but as the operation of the TIA chip requires precise timing making sure cycles are working important so for these tests run until a cycle is reached instead of until a break is reached. This is kind of required as testing the break does require that the break command actually get executed!


BRKBReaK
Address Mode
Decimal OPCode
Hexadecimal OpCode
Size
Cycles
Implied
0
$00
1
7
Flags affected: I
Usage: Triggers the Break interrupt causing the program to start executing code from the address stored at $FFFE-$FFFF. Used for debugging with the IRQ address set to a monitor program.
Test Code:
; BRK  -> requires IRQ set to 512, cycles = 71
            LDX #255 ; 2
            TXS     ; 4
loop: BRK        ; 11, 33, 55, END
            ; start here with 22, 44, 66
            INX ; 24, 46, 68
            JMP loop   ;27, 49, 71
.ORG 512
            ; reach here at 11, 33, 55
            TAY  ; 13, 35, 57
            TXA  ; 15, 37, 59
            RTI ; 22, 44, 66
            ;x=2, y = 0, A = 1

Implementation:
pushByteOnStack((state.ipNext / 256) and 255)
pushByteOnStack(state.ipNext and 255)
pushByteOnStack(state.flags or BREAK_FLAG or INTERRUPT_FLAG)
state.ipNext = findAbsoluteAddress(0xfffd)


NOPNo OPeration
Address Mode
Decimal OPCode
Hexadecimal OpCode
Size
Cycles
Implied
234
$EA
1
2
Flags affected: None
Usage: This command tends to be used for debugging to replace BRK instructions. It can be used in timing sensitive programs as a 2 cycle delay.
Test Code:
; NOP -> run for 32 cycles
            LDX #0            ;2
            loop: NOP  ; 4, 15, 26
            NOP                ; 6, 17, 28
            NOP                ; 8, 19, 30
            INX                  ; 10, 21, 32
            BNE loop   ;13, 24
            ; expect at 32 cycles for x=3
                       
Implementation:
_->

Saturday, October 13, 2018

Interrupts


Interrupts can be confusing though shouldn’t be. Part of the problem is that they happen outside of the program and are designed to be silent in their nature. The reason interrupts exist is to allow for outside devices to let the processor know that something has happened. When it happens, the 6502 stops executing the current program, runs an interrupt handler, then returns to the program that was running without the executing program being aware that it was interrupted.


The 6502 uses a vector table located at the end of address space that holds addresses for handlers for the 3 types of interrupts that it supports. The Atari 2600 only needs to worry about resets and IRQ caused by issuing a BRK statement but other systems (NES) do use NMIs.

Low byte address
High byte address
Interrupt
Explanation
$FFFA
$FFFB
NMI
Non-maskable interrupt (SEI won’t stop it)
$FFFC
$FFFD
Reset
The processor has been reset
$FFFE
$FFFF
IRQ
Maskable interrupt or BRK has occurred

When an interrupt request (IRQ) happens the processor does the following:
1.    Push the next IP address (high byte then low byte) onto the stack to save current executing position with the program.
2.    Pushes the processor status register onto the stack (essentially performs a PHP command)
3.    Sets the IP address to the address stored in the vector table as outlined in the table above.


This takes a total of seven cycles plus the time of the interrupt code plus the six cycles for the RTI instruction which means that an interrupt that happens during a timing loop will potentially throw off the program’s timing by quite a bit. It is the responsibility of the interrupt handler to make sure that any registers it uses are restored to the values that they originally contained before issuing the RTI.


The RTI instruction, in case it is not obvious already, returns execution back to the program by doing the following:
1.    Pulls the processor status from the stack (essentially a PLP instruction).
2.    Pulls the IP address of the next instruction that would have occurred before the interrupt (low byte then high byte) from the stack and sets the IP address to that value


Writing an interrupt handler is simply a matter of saving off any registers you use before changing the registers and restoring the registers before calling RTI. It is my understanding that the CLI command doesn’t need to be issued unless you want the interrupt handler to be interruptible, which may be the case if you are performing a complex task.

RTIReTurn from Interrupt
Address Mode
Decimal OPCode
Hexadecimal OpCode
Size
Cycles
Implied
77
$4D
1
6
Flags affected: CDINVZ (from stack)
Usage: Used by interrupt handlers to return execution back to the program once the interrupt has been handled.
Test Code:
LDX $FC ; for test, set up stack
TXS     ; the interrupt data alreay in there (see org)
CLC     ; the carry flag is set in the stack
RTI     ; return using stack data
BRK
.ORG $1FD
.BYTE  33 8 2
.ORG 520
BCC done   ; the carry flag is set in the stack!
LDX #42
done: BRK
; expect X=42
Implementation:
state.flags = pullByteFromStack()
val low = pullByteFromStack()
val high = pullByteFromStack()
state.ipNext = (high * 256 + low)

Saturday, September 29, 2018

Rotating and Shifting Bits right



Shifting and rotating right works same as rotating left except in opposite direction so the bits move from high to low. The right shift command is LSR while the rotation command is ROR.
In the previous article we packed a playing card into a byte. We had a bit to representing if the card was revealed, 4 bits to hold the face value, and 2 bits to hold the suit. This would give us a packed byte in the format 0RFFFFSS. To unpack the card byte we can do the following



LDA cardToUnpack
TAX
AND #3
STA cardSuit
TXA
LSR
LSR
TAX
AND #15
STA cardFace
TXA
LSR
LSR
LSR
LSR
STA cardShowing
As with multiplication, shifting right is the equivalent of dividing by powers of 2. Unfortunately non-powers of two are a bit more complicated to perform than non-power of two multiplications.



Working with multi-byte shifting is a bit different with multiple bytes as you start with the highest byte and work your way towards the lowest byte using LSR on the high byte (ROR if doing a multi-byte rotation) and using ROR for all the remaining bytes working towards the lowest byte. Here is a two-byte example:



LDA highByte
LSR
STA highByte
LDA lowByte
ROR
STA lowByte
RORROtate Right by one bit
Address Mode
Decimal OPCode
Hexadecimal OpCode
Size
Cycles
Accumulator
106
$6A
1
2
Zero Page
102
$66
2
5
Zero Page,X
118
$76
2
6
Absolute
110
$6E
3
6
Absolute,X
126
$7E
3
7
Flags affected: CNZ
Usage: Rotates bits right using the carry bit to fill in the missing bit and putting the low order bit into the carry flag. Used primarily for division by powers of 2 and reading serial data though can be used to reposition bits.
Test Code:
; ROR accumulator
CLC
LDA #1
ROR A
BRK
; expect A=0, Z=1, C=1, N=0
; ROR with memory
LDX #1
ROR 254
ROR 254,X
ROR 256
ROR 256,X
BRK
.ORG 254
.BYTE 1 254 $EF 254
;expect MFE=0 MFF=255 M100=$77 M101=$FF N=1, Z=0, C=0
Implementation:
// Accumulator
state.acc = performRightBitShift(state.acc, true)
// Zero Page
val address = mem.read(state.ip+1)
mem.write(address, performRightBitShift(mem.read(address), true))
// Zero Page, X
val address = mem.read(state.ip+1)+state.x
mem.write(address, performRightBitShift(mem.read(address), true))
// Absolute
val address = findAbsoluteAddress(state.ip)
mem.write(address, performRightBitShift(mem.read(address), true))
// Absolute, X
val address = findAbsoluteAddress(state.ip)+state.x
mem.write(address, performRightBitShift(mem.read(address), true))
LSRLogical Shift Right by one bit
Address Mode
Decimal OPCode
Hexadecimal OpCode
Size
Cycles
Accumulator
74
$4A
1
2
Zero Page
70
$46
2
5
Zero Page,X
86
$56
2
6
Absolute
78
$4E
3
6
Absolute,X
94
$5E
3
7
Flags affected: CNZ
Usage: Shifts bits right using  0 to fill in the missing bit and putting the low order bit into the carry flag. Used primarily for single byte division by powers of 2 and reading serial data though can be used to reposition bits.
Test Code:
; LSR accumulator
LDA #1
SEC
LSR A
BRK
; expect A=0, Z=1, C=1, N=0
; LSR with memory
LDX #1
LSR 254
LSR 254,X
LSR 256
LSR 256,X
BRK
.ORG 254
.BYTE 128 127 $EE $76
;expect MFE=64 MFF=63 M100=$77 M101=$3B N=1, Z=0, C=0
Implementation:
// Accumulator
state.acc = performRightBitShift(state.acc, false)
// Zero Page
val address = mem.read(state.ip+1)
mem.write(address, performRightBitShift(mem.read(address), false))
// Zero Page, X
val address = mem.read(state.ip+1)+state.x
mem.write(address, performRightBitShift(mem.read(address), false))
// Absolute
val address = findAbsoluteAddress(state.ip)
mem.write(address, performRightBitShift(mem.read(address), false))
// Absolute, X
val address = findAbsoluteAddress(state.ip)+state.x
mem.write(address, performRightBitShift(mem.read(address), false))