Saturday, February 16, 2019

Building a Simple trace utility

One of the early development tools for assembly language programmers was a trace tool. This was a simple tool that you ran and it would print out what the program was running at every step of the process. This type of tool could be considered the predecessor to the debugger. My eventual plans are to turn this tool into a debugger eventually. For developing the random number generator libraries I don't need that much of a utility. All that is really needed is the ability to load an assembly language file, assemble it, then run the file printing the command being executed and the state of the registers.
Trace utilities can be thought of as an automated single-step through the program that runs until a break statement is encountered. It is handy for seeing what is happening in non-linear programs such as the following demo:

    LDA #0
LDX #3
ADC #3
BNE loop

While the above example may be fairly simply to manually figure out, seeing what is happening line-by-line can still be useful for debugging as it is very easy to overlook something obvious, especially if it is a mistake that you have made yourself. It is also very handy for demonstrating what a piece of code is doing, which is also very important for explaining the mathematical functions that I will be writing in the remainder of this chapter.

People who have been following this series from the beginning will remember that I said that an emulator was essentially a disassembler that also had an interpreter attached to it. As we already have a dissasembler, and we have an emulator that can run just an individual instruction while storing the state of the registers in an accessible data class, the actual trace part of the trace utility is simply calling the dissasembler to get the assembly instructions, then running that step, then grabbing the register state and printing the results.

// run until break
m6502.state.ip = 0
var traceString = ""
var ipAddress = m6502.state.ip
while ( ipAddress ) != 0) {
traceString = "(" + m6502.state.tick +") " +
ipAddress.toString(16) + ":" +
ipAddress = m6502.state.ip
traceString = traceString + "# A:" + m6502.state.acc.toString(16) +
", X:" + m6502.state.x.toString(16) +
", Y:" + m6502.state.y.toString(16) +
", Flags:" + m6502.state.flags.toString(16)

When this is run on our sample program, we get the following results.

(0) 0:LDA #$0# A:0, X:0, Y:0, Flags:22
(2) 2:LDX #$3# A:0, X:3, Y:0, Flags:20
(4) 4:CLC # A:0, X:3, Y:0, Flags:20
(6) 5:ADC #$3# A:3, X:3, Y:0, Flags:20
(8) 7:DEX # A:3, X:2, Y:0, Flags:20
(10) 8:BNE $4# A:3, X:2, Y:0, Flags:20
(13) 4:CLC # A:3, X:2, Y:0, Flags:20
(15) 5:ADC #$3# A:6, X:2, Y:0, Flags:20
(17) 7:DEX # A:6, X:1, Y:0, Flags:20
(19) 8:BNE $4# A:6, X:1, Y:0, Flags:20
(22) 4:CLC # A:6, X:1, Y:0, Flags:20
(24) 5:ADC #$3# A:9, X:1, Y:0, Flags:20
(26) 7:DEX # A:9, X:0, Y:0, Flags:22
(28) 8:BNE $4# A:9, X:0, Y:0, Flags:22

Not the prettiest of output, but that can be easily tweaked. So with a very tiny amount of work, we now have a useful tool for writing and testing chunks of assembly language. We are now ready to start building the math libraries that we will be needing for writing our random number generators. We will start covering multiplication next fortnight.

Saturday, February 2, 2019

Preparing for Random Assembly

In the last post we covered the basics of a linear congruential generator and the simplest form of the permuted congruential generator. Now I am going write some generic 6502 implementation of these generators. This will require covering a few topics that not been covered yet as well such as writing a simple trace utility so the results can be seen. This fortnight I will break down the topics that will be covered over the next few months.

The first step is to create a simple trace utility that will let you run some assembly language code showing the effected registers and memory as the program runs. This is the equivalent of using a debugger to single-step through the code. One alternative to this would be to use an existing emulator that contains a debugger, but there are three reasons why I am not going this route. First, a trace utility would be fun to write. Second, I ultimately want my emulator to also be an IDE so development tools like the trace would be great to have. Finally, going with the existing emulator would require writing for a specific platform which at a minimum means a bunch of boilerplate code obfuscating the code. Writing a trace utility should not take too long, probability only one or two posts.

For the pure 8 bit version of LCG we need 8-bit multiply which is something  that the 6502 does not do.  Software multiply is possible so we will look at a couple of ways of doing this before implementing our 8-bit LCG.

Before getting to the 16-bit version of LCG we will have to take a look at how to perform multibyte math and then expand my multiply routine to 16-bits which is a bit of work.

Finally we will implement the 16 bit version of LCG and the most basic version of PCG.

Saturday, January 19, 2019

Building PCG

A LCG basically uses the following calculation: X[n] = (A * X[n-1] + C) % M. X[n] is the next number in the sequence to generate with A being a specially selected multiplier value, C is a value to add to the generator, and M is the modulus or range of the generator. We will use Rotenberg's values of 141 for A, 3 for C and just use a byte so 256 for the modulus. Lets generate a randogram and see what we get.

While this is not as good as we hoped, it is not as bad as it seems as the order the dots appear is random. We are using an 8 bit generator so what if we went with 16? Obviously the lower order byte would remain the same but what if we took higher bits for the output.

Interesting enough, each bit you shift becomes more random with different shifts having different coverage of the randogram. What if we added a permutation to select from some of the better shifts? If only we had a random way of selecting which shift to use. What about the top 2 bits (more for larger generators)? While they have the most randomness, we may gain overall.

Melissa added additional permutations for rotation, XOR, and XORShift. More permutations are possible if you follow the rules:

  • Bits can be thrown away but you can't make new ones (precious).
  • Target and control groups of bits must be separate.
  • Control bits must not be modified.
  • Can do many rounds and different rounds can have different control bits.

By combining different permutations, it was possible to complete BigCrush using only a 36 bit PCG (Permuted Congruential Generators). So, in conclusion PCG is a family of Pseudo-Random Number Generation algorithms. They are relatively simple to implement. They run Fast as the code is simple boolean operations, multiplications and additions. They are Space-Efficient requiring less than double the modulus size. And passing the TestU01 BigCrush tests shows they are Statistically Good.

Saturday, January 5, 2019

Testing Random Number Generators

There are a number of factors that people look for in a random number generator.

  • Period - How long a generator can run before it repeats the entire sequence.
  • Uniformity - All the numbers within the range will appear.
  • Predictability - how predictable is the next number in the sequence.
  • Repeatability - can the same number repeat itself?
  • Security - is the sequence invertable?
  • Seekability - can we quickly skip over numbers?
  • Streams - How usable is the generator when threading?
  • k-dimensional equadistribution - How well distributed numbers are.

This diverse range of usage inevitably meant that there were a lot of different tests. L'Ecuyer and Simard made a huge contribution to testing in 2007 with the release of TestU01. They took a look at the myriad of tests that were available and created a test suite that includes a large number of previously independently published tests. They combined a number of different tests into batteries of tests (SmallCrush, BigCrush). Lots of the random number generators that were in common use at the time failed the test suite. The PCG paper covers testing of a large number of random number generators for those of you who are curious.

TestU01 is a C library with tests and generators for the most common PRNGs. This means that you need to write a simple C program to run the tests that you desire. Here is the version of Java's random number generator that I wrote to test it. Note that the values used are based on the API documentation and are the minimal random number generator that is required.

// Adapted from TestU01 manual, Figure 2.2, Figure 2.4

#include "TestU01.h"

// Example PRNG: Xorshift 32

static unsigned long x = 2463534242U;
static unsigned long a = 0x5DEECE66DL;
static unsigned long c = 11U;

unsigned int jvmlcg (void)
x = x * a + c;
return (int)(x>>16);

int main()
// Create TestU01 PRNG object for our generator
unif01_Gen* gen = unif01_CreateExternGenBits("JVM LCG", jvmlcg);
// Run the tests.

// Clean up.

return 0;

The Java test did not do very well on the big crush test. I ran the tests in an emulator running Linux so that I didn't need to set up MinGW on my system as the libraries are for Unix. If you see a flaw in my program, feel free to email me. You can run the test for yourself but to save you hours of waiting, here are the final test results:

========= Summary results of BigCrush =========

Version:          TestU01 1.2.3
Generator:        JVM LCG
Number of statistics:  160
Total CPU time:   08:55:08.77
The following tests gave p-values outside [0.001, 0.9990]:
(eps  means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):

Test                          p-value
2  SerialOver, r = 22               eps  
4  CollisionOver, t = 2           1 - eps1
6  CollisionOver, t = 3           1 - eps1
8  CollisionOver, t = 7             eps  
10  CollisionOver, t = 14            eps  
12  CollisionOver, t = 21            eps  
13  BirthdaySpacings, t = 2          eps  
14  BirthdaySpacings, t = 3          eps  
15  BirthdaySpacings, t = 4          eps  
16  BirthdaySpacings, t = 7          eps  
17  BirthdaySpacings, t = 7          eps  
18  BirthdaySpacings, t = 8          eps  
19  BirthdaySpacings, t = 8          eps  
20  BirthdaySpacings, t = 16         eps  
21  BirthdaySpacings, t = 16         eps  
22  ClosePairs mNP, t = 3          1.6e-26
22  ClosePairs mNP1, t = 3         1.3e-48
22  ClosePairs mNP2S, t = 3          eps  
23  ClosePairs mNP, t = 5           3.4e-4
23  ClosePairs mNP1, t = 5          3.4e-4
23  ClosePairs mNP2S, t = 5          eps  
24  ClosePairs mNP2S, t = 9        5.7e-46
25  ClosePairs mNP, t = 16         8.0e-30
25  ClosePairs mNP1, t = 16        3.7e-29
25  ClosePairs mNP2S, t = 16      1.1e-283
27  SimpPoker, r = 27                eps  
29  SimpPoker, r = 25                eps  
32  CouponCollector, r = 20          eps  
33  CouponCollector, r = 27          eps  
35  Gap, r = 25                      eps  
37  Gap, r = 20                      eps  
43  Permutation, t = 10             0.9997 
58  AppearanceSpacings, r = 27       eps  
60  WeightDistrib, r = 20            eps  
61  WeightDistrib, r = 28            eps  
64  WeightDistrib, r = 26            eps  
67  MatrixRank, L=30, r=26         7.9e-10
75  RandomWalk1 H (L=50, r=25)       eps  
75  RandomWalk1 M (L=50, r=25)       eps  
75  RandomWalk1 J (L=50, r=25)       eps  
75  RandomWalk1 R (L=50, r=25)       eps  
75  RandomWalk1 C (L=50, r=25)       eps  
85  Fourier3, r = 27                 eps  
87  LongestHeadRun, r = 27           eps  
87  LongestHeadRun, r = 27         1 - eps1
89  PeriodsInStrings, r = 20       1 - eps1
91  HammingWeight2, r = 27         5.0e-13
96  HammingIndep, L=30, r=27         eps  
98  HammingIndep, L=300, r=26        eps  
100  HammingIndep, L=1200, r=25       eps  
102  Run of bits, r = 27              eps  
106  AutoCor, d=3, r=27              2.6e-7
All other tests were passed

One interesting observation that Melissa made in her study is that the XORShift did poor in the crush tests but it's companion XORShift* did exceptionally well. This is a surprise when you consider that XORShift and XORSHift* use the same shift-and-apply-xor operations. The only difference between XORShift and XORShift* is XORShift* adds a multiplication at the end. Anybody familiar with random number generators will know that Linear Congruential Generators (LCGs) work by multiplying then adding. This means that XORShift* gains it's powers by essentially running it's output through a LCG.

If an improvement step works for XORShift, would such an improvement step work for a LCG?

Saturday, December 22, 2018

This is the first of a 3 part review of the "PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation" paper by Melissa O'Neill. I go over the major parts of the paper and try and recreate the core results.
Random numbers are used throughout computer science but what is a random number? This is hard to define but we know about randomness in the real world so we desire a system that produces results that match our expectations of those systems. The problem is that computers are deterministic which means they are very poor at doing random things. There are two general ways of generating random numbers on computers: use external "true" random sources or use pseudo-random numbers.

An external source of entropy allows you to take readings from a random system that produces results in an unpredictable manner but does so in a way that when measured you are equally likely to get one of a range (two for binary) of results. There have been a number of systems used for such devices with a lava lamp being the most interesting. uses weather as their source of random numbers. The downside to this is that readings from natural sources of randomness can only be taken at a slow rate which can be problematic when you need a lot of random numbers and need them quickly as is the case with games. Another issue is that in order to repeat a sequence of random numbers, as you would want to do for testing purposes, you need to save all the bits that you generate.

A pseudo-random system is a deterministic system that produces results that resemble true randomness but can be recreated if the state of the system is determined. This also means that the same starting value will produce same sequence of numbers so testing can be done by using the same seed value. There are a number of different algorithms that can be used to generate random numbers with the methods available covering a variety of needs.  Not all generators are good! So how can we tell what generators are good?

Randograms are an informal way of looking at random number generators. They are generated by creating a 256x256 grid and then taking 32768 pairs of numbers,  plotting them on the grid. The more times a grid is hit with a number, the darker that representation becomes. Here is some Kotlin code that demonstrates how to generate a randogram.

fun generate() {
for (cntr in 0..32767) {
val x = rng.nextUByte()
val y = rng.nextUByte()

fun drawToCanvas(canvas:Canvas, scale:Double) {
val gc = canvas.graphicsContext2D
for (cntrX in 0..255)
for (cntrY in 0..255) {
gc.fill = Color.gray(1.0 / (grid[cntrX][cntrY].toDouble() + 1.0))
gc.fillRect(cntrX.toDouble()*scale, cntrY.toDouble()*scale, scale, scale)

The image below are some images that I generated.

With simple random number generators, it is easy to see that they are not random. Better generators, such as Java's random number generator, are not a clear. Java's random number generator is actually not that great but simply looking at the randograms would not reaveal this so better ways of testing are required.

Saturday, December 8, 2018

No post this fortnight

As I expected, between exams, marking/invigilating, and writing research papers I simply don't have the time to create a new post, which probably have been an article on 8-bit multiplying. After I finish my detailed overview of the PCG paper I plan on going over 8 and 16 bit math on the 6502 before creating a LCG, XORShift, XORShift*, and a simple PCG random number generators which will be more useful for NES game development but will still work with the 2600.

Sorry for the lack of content and see you next fortnight.

Saturday, November 24, 2018


One of the courses that we are required to take for the Masters of Science in Computer Science program that I am undertaking is called Seminar. It is a single credit course that needs to be taken three times. Each time the student, with the supervision of his instructor, picks a research paper which the student then researches, attempting to confirm or explore the results of the paper, and then presents the paper to the class. In addition to the presentation is a written report on the subject outlining what they have learned from the paper. As anybody reading this blog already knows, I am developing an Atari 2600 emulator. The sound emulation, to do properly, requires something called a Looped-Feedback Shift Register (LFSR) which is similar to a poly-counter which is one of the earliest ways of generating pseudo-random numbers. This had me looking into random number generation in general. One of the papers I was very impressed with is “PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation.” The link to this paper is and while it is a very lengthy paper, it is not too technical and goes over a lot of details about random number generation in general in addition to presenting a new family of random number generators.

I am planning on going over this paper in three separate articles. First, a look at what a random number is and what makes a good random number generator. Next, how the generators are tested as well as a less-robust testing method that would work for the type of things I need to do. Finally, we will look at the PCG algorithm.

I am going to start this series of articles in four weeks with next fortnight being a week “off” though I may post something if I have the time. The reason for the delay is something known as self-plagiarism. To me this is a really strange concept to me, but some Universities will actually consider using your own material in a paper to be a form of plagiarism. To get around this possibility I am going to wait until my report has been submitted and  use that report as the reference to the blog so it is important that the report be submitted before I start posting the blog. This is probably me being way over-paranoid but it is better to be safe than sorry.

In the meantime, I urge you to get a copy of the paper and read it. See you in a fortnight or two.