Bill's Homebrew and Game Jam Blog: June 2019

Saturday, June 22, 2019

IT'S TIA TIME!

We have a processor so now we want a way of displaying information to the user. The Atari 2600 used a custom chip called the Television Interface Adapter or TIA for displaying things. I have programmed graphics for a variety of consoles and computers but the TIA is nothing like those other devices. The NES, GameBoy, and Commodore 64 used a tile set approach to graphics where you had 256 8x8 characters that you would write to a grid that formed the screen display. The Amiga and Macintosh used bitmap graphics where you had a grid of pixels (sometimes broken into play-fields) that you would draw to. The PC had both depending on which graphics mode you had the graphics card in. The Atari 2600 does none of this!

The TIA has no video memory. The closest it comes to video memory is the play-field bits and sprite bits. In fact, the 128 bytes of RAM you have are fewer than the 160 pixels that can be displayed on a single line of the display. So, you are probably wondering, how do you display a screen on the 2600? You draw it in real time as the display is being generated. You have 76 processor cycles per line to set up the TIA registers to get it to display what you want. Some programmers have described programming the TIA as “racing the beam”, which also happens to be the name of the book that got me interested in programming the 2600.

In North America, the television standard back then was NTSC. Other countries used different formats, which meant they had different versions of the TIA and different restrictions. The programmer was responsible for generating the whole display, which started with a 3 scan-line syncing signal followed by 37 vertical blank lines. These lines gave the programmer time to do some non-display work. The 192 lines that followed were the visible display. An additional 30 overscan lines were then drawn. These lines could appear on some TVs, but generally you were to make them blank for compatibility, which provided a bit more time for non-display work.
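To put numbers on that frame layout, here is a small Python sketch that adds up the line counts above and multiplies by the 76 cycles per line. The constant and function names are my own for illustration and are not part of any emulator.

```python
# NTSC frame layout as described above; all names here are illustrative.
VSYNC_LINES = 3
VBLANK_LINES = 37
VISIBLE_LINES = 192
OVERSCAN_LINES = 30
CYCLES_PER_LINE = 76  # processor cycles available on every scan-line


def frame_budget():
    """Return (total lines, total CPU cycles, cycles outside the visible area)."""
    total_lines = VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES
    total_cycles = total_lines * CYCLES_PER_LINE
    # Vertical blank and overscan are where non-display work can happen.
    free_cycles = (VBLANK_LINES + OVERSCAN_LINES) * CYCLES_PER_LINE
    return total_lines, total_cycles, free_cycles


lines, cycles, free = frame_budget()
print(lines, cycles, free)  # 262 19912 5092
```

So roughly a quarter of each frame's cycles fall outside the visible lines, which is where game logic has to live.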

If things sound tricky, it gets worse. At the time, chips were expensive to manufacture, so the designers were focused on keeping the size of the chip and the number of transistors as small as possible to maximize the yield during manufacturing. This meant that niceties, like keeping the bits in the play-field registers in the same orientation as their position on the display, were sacrificed to save transistors. Even with these concessions, the chip had a huge transistor count for the time. Thanks to Moore’s Law, transistor counts grew, allowing later consoles and computers to have graphics chips that were much nicer to program.

The TIA had 128 colors that were specified using a 4 bit hue and a 3 bit luminance. This is quite a bit different from the RGB format modern graphics use. The palette was tied to the televisions of the time, so here I am talking about NTSC televisions. Overseas versions of the console had different palettes for PAL and for SECAM televisions, but for now I am only worrying about the NTSC version of the chip. Each line could have 4 colors: the background color, the play-field color, the player 0 color, and the player 1 color. These colors could be changed in the middle of drawing a scan-line, but with only 76 cycles per line and a STA taking at least 3 cycles, the number of colors on a line was drastically limited.
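As a rough illustration of the color format, the Python sketch below packs a hue and luminance into a single byte. The layout it assumes (hue in bits 7-4, luminance in bits 3-1, bit 0 unused) comes from the TIA hardware documentation rather than from anything earlier in this post, and the function names are made up for the example.

```python
def tia_color(hue, luminance):
    """Pack a 4-bit hue and 3-bit luminance into a TIA color byte.

    Assumed layout: hue in bits 7-4, luminance in bits 3-1, bit 0 unused.
    """
    if not 0 <= hue <= 15:
        raise ValueError("hue must be 0..15")
    if not 0 <= luminance <= 7:
        raise ValueError("luminance must be 0..7")
    return (hue << 4) | (luminance << 1)


def max_color_writes(cycles_per_line=76, cycles_per_store=3):
    """Upper bound on register stores per line, ignoring the loads they need."""
    return cycles_per_line // cycles_per_store


palette = {tia_color(h, l) for h in range(16) for l in range(8)}
print(len(palette))        # 128
print(max_color_writes())  # 25
```

The 25 stores per line is only a ceiling. In practice each store also needs a value loaded first, so the real number of mid-line color changes is much lower.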

The play-field is a 20 bit image that is spread across half of the screen (4 pixels per play-field bit) and can then be repeated or flipped for the other half of the screen. The data making up the 20 bits can be changed while the line is being drawn, allowing for a 40 bit asymmetric play-field.
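Here is a minimal Python sketch of how those 20 bits could be expanded into a 160 pixel line. The three register names (PF0, PF1, PF2) and their odd bit orders come from the TIA hardware documentation, not from this post, and the function names are hypothetical.

```python
def playfield_bits(pf0, pf1, pf2):
    """Return the 20 play-field bits in left-to-right screen order.

    Assumed bit orders: PF0 uses bits 4..7 (bit 4 leftmost), PF1 uses
    bits 7..0 (bit 7 leftmost), PF2 uses bits 0..7 (bit 0 leftmost).
    """
    bits = [(pf0 >> b) & 1 for b in range(4, 8)]
    bits += [(pf1 >> b) & 1 for b in range(7, -1, -1)]
    bits += [(pf2 >> b) & 1 for b in range(0, 8)]
    return bits


def playfield_line(pf0, pf1, pf2, reflect=False):
    """Expand the play-field into 160 pixels (4 pixels per bit)."""
    left = playfield_bits(pf0, pf1, pf2)
    # The right half either repeats the left half or mirrors it.
    right = left[::-1] if reflect else left
    return [bit for bit in left + right for _ in range(4)]


line = playfield_line(0x10, 0x00, 0x00, reflect=True)
print(len(line), line[:4], line[-4:])  # 160 [1, 1, 1, 1] [1, 1, 1, 1]
```

With the reflect flag set, a single bit at the left edge also shows up at the right edge, which is how a border can be drawn from one bit of data.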

In addition to the play-field, the programmer had 5 sprites to work with. These sprites were intended for specific uses, so you had two player sprites, two missile sprites, and a ball sprite. The missile and ball sprites were simply lines that could be 1, 2, 4, or 8 pixels wide. The two player sprites were much more flexible, as each was backed by a byte of image data, so you had 8 pixels to play with. You could also scale or repeat the byte. With some trickery, it was possible to have each player sprite repeat itself 3 times, with the sprite data changed between repeats, allowing 6 player sprites to appear on a scan line.
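To make the sprite description concrete, here is a small Python sketch that overlays one byte of player data onto a 160 pixel line. It assumes bit 7 of the byte is the leftmost pixel and clips at the screen edge instead of wrapping, which is a simplification. Both function names are hypothetical.

```python
def draw_player(line, graphic, x, scale=1):
    """Overlay an 8-bit player graphic onto a list of pixels.

    Assumes bit 7 is the leftmost pixel. Pixels falling off the line are
    clipped, which is a simplification for this sketch.
    """
    out = list(line)
    for i in range(8):
        if (graphic >> (7 - i)) & 1:
            for s in range(scale):
                px = x + i * scale + s
                if 0 <= px < len(out):
                    out[px] = 1
    return out


def missile_width(size_bits):
    """Missile and ball widths are powers of two: 1, 2, 4, or 8 pixels."""
    return 1 << size_bits


blank = [0] * 160
single = draw_player(blank, 0b10000001, 10)
print([i for i, p in enumerate(single) if p])  # [10, 17]
print([missile_width(n) for n in range(4)])    # [1, 2, 4, 8]
```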

Because each scan line is separate, the height of the sprites is up to the programmer but is effectively the height of the screen with the programmer only enabling the sprite for the rows of the display he or she wants the sprite to appear on.

My implementation plan (I am writing this before I started working on it but am posting this much later) is to first get the colors working, then get the play-field working, then get the sprites working at which point I start synchronizing the TIA emulator with the 6502 emulator and have a working machine less input and sound. Once I have a working display, the input system will be worked on and finally some sound.

Saturday, June 8, 2019

A Permuted Congruential Generator for the 6502

The Permuted Congruential Generator is a family of random number generators which I covered last year when I reviewed the paper in which they were introduced. A quick summary is that PCG does for Linear Congruential Generators (LCG) what XORShift* does for XORShift. It takes the LCG generated value and applies one or more permutation functions to this value to come up with a more random number, based on the hypothesis that the fewer bits that are required to pass the TestU01 suite of tests, the better the generator. There are a variety of permutations that can be combined, but for the 6502 I am just going to go with shifting.

The basic idea behind shifting is that we use the upper two bits (the most random bits) to pick which set of the remaining 14 bits will be used for the final 8 bits that are returned to the user. For LCG, we normally just return the highest eight bits. By having a pseudo-randomly selected shift we can get better results.

The code for this is just the LCG code we created in an earlier post with the shifting code added to the end of the routine. It needs the 16-bit seed value from the LCG as well as an additional two byte temp value for performing the shifting, as it is easier and faster to do this work in memory.

PCG uses the multiply16x8 routine that we developed in an earlier post to do the multiplication work. We multiply the seed by the A variable and then we add the C variable to it. We are using 141 and 3 for these values, but other carefully selected values could be used. Now we have effectively performed an LCG function, so we then apply the permutation.

We first save the new seed and copy it to our temp variable. The upper two bits are needed, so we transfer the high byte of the seed into the accumulator and then take advantage of the ROL instruction to move the top two bits into the bottom two bits of the accumulator. This bit of code may seem confusing as 3 ROL instructions are used. This is necessary because ROL rotates through the carry flag: the first one shifts the high order bit into the carry flag, the second one rotates this bit into the low order bit, and the third one completes the transfer of two bits from the top of the byte to the bottom. We use an AND instruction to clear out the rest of the byte and then add 3 to this result to end up with a value between 3 and 6, which we use to determine how many times to shift the results.
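If the three ROL instructions still seem like magic, the Python sketch below models ROL as a 9-bit rotate through the carry flag and checks that the result does not depend on whatever the carry held beforehand. The helper names are invented for the example.

```python
def rol(a, carry):
    """Model 6502 ROL A: rotate left through carry.

    Returns (new accumulator, new carry).
    """
    result = ((a << 1) | carry) & 0xFF
    return result, (a >> 7) & 1


def shift_count(seed_hi, carry_in=0):
    """Mirror the ROL, ROL, ROL, AND #3, ADC #3 sequence from the routine."""
    a, c = seed_hi, carry_in
    for _ in range(3):
        a, c = rol(a, c)
    return (a & 3) + 3


# The stale carry ends up in bit 2, which the AND #3 throws away.
print(shift_count(0b11000000, carry_in=1))  # 6
print(shift_count(0b01000000, carry_in=0))  # 4
```

Two ROLs would only bring one of the wanted bits into the accumulator, with the other still sitting in the carry flag, which is why the third is needed.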

The shifting is done by using the combination of LSR and ROR to perform a 16 bit shift on the temp value. This is done in a loop so it is done 3 to 6 times. The low order temp byte is then returned as the result of this operation.
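Before looking at the assembly, it may help to see the whole routine as a few lines of Python. This is only a reference model for checking results, and it assumes that multiply16x8 keeps just the low 16 bits of the product, which is how the routine below uses it.

```python
LCG_A = 141
LCG_C = 3


def pcg16_next(seed):
    """Return (new_seed, output_byte) for one step of the generator.

    Assumes the 16x8 multiply keeps only the low 16 bits of the product.
    """
    seed = (seed * LCG_A + LCG_C) & 0xFFFF  # LCG step, truncated to 16 bits
    shift = (seed >> 14) + 3                # top two bits pick a shift of 3..6
    return seed, (seed >> shift) & 0xFF     # low byte after the 16 bit shift


seed = 1
seed, value = pcg16_next(seed)
print(seed, value)  # 144 18
```

Feeding the same seed to this model and to the 6502 routine should give matching bytes, which makes it a handy cross-check when testing the emulator.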

JMP test

; set up our constants for A and C
.EQU LCG_A 141
.EQU LCG_C 3

; create a placeholder for the multiplicand
multiplicandLo: .BYTE 0
multiplicandHi: .BYTE 0
multiplierLo: .BYTE 0
; create placeholder for RNG seed
seedLo: .BYTE 0
seedHi: .BYTE 0
; create temporary int 16
tempLo: .BYTE 0
tempHi: .BYTE 0

;
; multiply16x8 goes here (code in earlier post)
;

pcg16bit:
; load in the current value of the seed into the multiply registers
LDA seedLo
LDY seedHi
; now multiply by the A constant
LDX #LCG_A
JSR multiply16x8
; add the C constant to the result
CLC
ADC #LCG_C
BCC pcg16bit_update_seed
INY
pcg16bit_update_seed:
; save the results in the seed
STA seedLo
STY seedHi

; store the temp result bytes
STA tempLo
STY tempHi

; build the shift value to determine how much to shift
TYA
ROL A
ROL A
ROL A
AND #3
CLC
ADC #3

; perform the shift
TAX
pcg16bit_shifting_loop:
LSR tempHi
ROR tempLo
DEX
BNE pcg16bit_shifting_loop

; put results in accumulator for return value
LDA tempLo
RTS

test:
;
; testing code from last post goes here
; jsr changed to pcg16bit
;

If you take a look at the randograms from the LCG and the PCG, you can easily see that the PCG version appears to be more random than the LCG version.

Before (LCG randogram)

After (PCG randogram)

Now that we have our random number generator we can create a version of Snake for the 2600. But before we can start on the snake game, we need to have a 2600 emulator, so next fortnight it is TIA time!