Bill's Homebrew and Game Jam Blog: Homebrew

Wednesday, January 3, 2018

Assembling the Tokens

Assemblers are easier to write than compilers because the structure of the code is a lot simpler. Even so, there are enough similarities between the two that some of the common techniques used to write a compiler are also useful for writing an assembler. A compiler first tokenizes the source code (lexical analysis), parses the tokens into something meaningful (syntax and semantic analysis), and generates some intermediate code, from which it finally generates the machine code. With the assembler, I am taking a similar approach: tokenizing the source code, parsing it into intermediate machine language, then generating the final machine language.

Java does have some tokenizer classes that will do part of what I want, but while it is possible to use Java classes in Kotlin, I am not sure how well those will work if I try to compile my Kotlin code to JavaScript, which is something I do want to do eventually. For this reason, and with my NIH (not invented here) syndrome kicking in, I opted to write my own tokenizer for the assembler. I created a simple enumeration to hold the different types of tokens that my assembler will use, though I suspect that I will be adding some new ones later.

enum class AssemblerTokenTypes {
    DIRECTIVE, ERROR, IMMEDIATE,
    INDEX_X, INDEX_Y,
    INDIRECT_START, INDIRECT_END,
    LABEL_DECLARATION, LABEL_LINK,
    NUMBER, OPCODE, WHITESPACE
}

A DIRECTIVE is a command for the assembler. There are a number of different ways that these can be handled but I am going to start my directives with a dot. I have not worked out all the directives that I plan on supporting but at a minimum I will need .ORG, .BANK, .BYTE and .DATA with .INCLUDE and .CONST being nice to have. More on these when I actually get to the directives portion of my assembler.

ERROR is an indication of a tokenization error which kind of breaks the assembling of the file. Invalid characters used would be the likely culprit.

Some of the 6502 instructions have an immediate mode that lets the programmer specify a constant value to use in the next byte. This is indicated by prefacing the constant with the hash (#) symbol. The tokenizer simply indicates that an immediate mode value is going to be next by having an IMMEDIATE token.

The 6502 supports offsets of an address using “,X” or “,Y” so the tokens indicate such an index is being used. These indexes are used for zero page indexing, indirect indexing, as well as your normal indexing which is officially called absolute indexing. The particular type of indexing address mode that will be used will be determined by the parser which will be covered later.

Indirect addressing modes prefix the address with an open parenthesis and postfix the address with a close parenthesis. To indicate this, the INDIRECT_START and INDIRECT_END tokens are used.

It is certainly possible to write an assembler that does not track addresses for you and instead requires you to know every address that you are using, but one of the reasons that assemblers were invented was to remove this busywork. This means that we need some type of support for labels in our assembler. Most 6502 assemblers mark a location within the code with a label followed by a colon at the beginning of the line. This is indicated by the LABEL_DECLARATION token, with LABEL_LINK tokens being used for references to labels within the code.

As assembly language revolves around numbers, we obviously need a NUMBER token. This token needs special processing as I am supporting binary, decimal, and hexadecimal formats for numbers. My Machine Architecture teacher will probably be upset that I am not including support for octal numbers, but I never use that format in code so I didn’t see the point in adding it. I am using the pretty standard 6502 convention of prefixing hex numbers with a $ and binary numbers with a % symbol. Supporting binary is not vital but very handy to have, especially for a machine like the 2600 where you are doing a lot of bit manipulation.
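The prefix convention above can be sketched in a few lines of Kotlin. This is only an illustration, not the code from my assembler: the function name parseNumber and the choice to return null for invalid text are assumptions made for the example.

```kotlin
// Hypothetical helper: converts the text of a number into an Int.
// "$" marks hexadecimal, "%" marks binary, anything else is decimal.
// Returns null when the text is not a valid number in that base.
fun parseNumber(text: String): Int? {
    val (digits, radix) = when {
        text.startsWith('$') -> text.drop(1) to 16
        text.startsWith('%') -> text.drop(1) to 2
        else -> text to 10
    }
    // Reject empty text and signs so that something like "$-1" is not accepted.
    if (digits.isEmpty() || !digits.all { it.isLetterOrDigit() }) return null
    return digits.toIntOrNull(radix)
}
```

With this sketch, the hex literal $FF gives 255, the binary literal %1010 gives 10, and anything with a digit that does not belong to its base gives null.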

While I probably should have used the term MNEMONIC instead of OPCODE for the enumeration, I often call the mnemonic an op code even though technically the op code is the actual numeric value that the assembler ultimately converts the mnemonic into. Should I change this in my code? Probably. Will I?

Finally, WHITESPACE is the spaces, tabs, and comments. In most assemblers comments are designated with a ; so that works fine for me. Most of the time the whitespace characters will be ignored, so I could arguably not have a token for whitespace and simply ignore it.
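To make the token descriptions concrete, here is a rough sketch of how one piece of source text could be mapped to a token type. It is not the tokenizer itself: it assumes the line has already been cut into chunks such as "#", "$10" or ",X", and the names classify and sampleMnemonics, along with the four-mnemonic list, are made up for the example.

```kotlin
enum class AssemblerTokenTypes {
    DIRECTIVE, ERROR, IMMEDIATE, INDEX_X, INDEX_Y, INDIRECT_START, INDIRECT_END,
    LABEL_DECLARATION, LABEL_LINK, NUMBER, OPCODE, WHITESPACE
}

// Hypothetical subset of mnemonics, just enough for the sketch.
val sampleMnemonics = setOf("LDA", "STA", "JMP", "BNE")

// Matches $hex, %binary or plain decimal numbers.
val numberPattern = Regex("""\$[0-9A-Fa-f]+|%[01]+|[0-9]+""")

// Assumes the input is a single, already separated chunk of source text.
fun classify(chunk: String): AssemblerTokenTypes {
    val upper = chunk.uppercase()
    return when {
        chunk.isBlank() || chunk.startsWith(';') -> AssemblerTokenTypes.WHITESPACE
        chunk == "#" -> AssemblerTokenTypes.IMMEDIATE
        chunk == "(" -> AssemblerTokenTypes.INDIRECT_START
        chunk == ")" -> AssemblerTokenTypes.INDIRECT_END
        upper == ",X" -> AssemblerTokenTypes.INDEX_X
        upper == ",Y" -> AssemblerTokenTypes.INDEX_Y
        chunk.startsWith('.') -> AssemblerTokenTypes.DIRECTIVE
        chunk.endsWith(':') -> AssemblerTokenTypes.LABEL_DECLARATION
        numberPattern.matches(chunk) -> AssemblerTokenTypes.NUMBER
        upper in sampleMnemonics -> AssemblerTokenTypes.OPCODE
        // Any other name is treated as a reference to a label.
        chunk[0].isLetter() || chunk[0] == '_' -> AssemblerTokenTypes.LABEL_LINK
        else -> AssemblerTokenTypes.ERROR
    }
}
```

The order of the checks matters: mnemonics are tested before label references so that LDA is not mistaken for a label, and anything left over falls through to ERROR.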

Now that we have the tokens out of the way, we need to write the tokenizer.

Wednesday, November 15, 2017

Disassembling the Disassembler

Writing the disassembler turned out to be even simpler than I expected. I had expected the work to be a bit on the time-consuming side, as no matter which route I took I would need to deal with 56 different instructions, many of them supporting several address modes. There are various approaches that can be taken for disassembling instructions. For processor architectures such as the SPARC, there are very specific bit patterns that make up the instructions. A look over the instructions suggests that this is probably true of the 6502 as well, but with 56 valid instructions and only 256 possible values a simple table approach seemed to be the way to go.

The table approach sets up all the information about each instruction as a table. By having a function pointer or lambda function in the table, you can also set it up to do the interpretation as well. This isn’t really inefficient either, as it is a simple table lookup which then calls a function that does the interpretation work. The bit approach would be a lot messier, and with so few possible values the table is not overly cumbersome to create. A more complex processor would be a different story but for this project I will go with the table. Here is the format of the table:

Op Code	The number assigned to this operation. While not technically needed here, it is a good idea to have it to make sure the table is complete, and it will be needed if an assembler is desired in the future.
Op String	The mnemonic or three-letter word used to describe the instruction.
Size	How many bytes (1 to 3) the instruction uses.
Address Mode	How memory is addressed.
Cycles	The base number of cycles for the instruction. Things such as crossing page boundaries or whether a branch is taken will add to this value.
Command	The code that handles the interpretation of this instruction.

Disassembling then becomes simply the matter of looking up the instruction then based on the address mode printing out the value or address that it is working with. There are 14 address modes that I came up with as follows:

enum class AddressMode {ABSOLUTE, ABSOLUTE_X, ABSOLUTE_Y, ACCUMULATOR, FUTURE_EXPANSION, IMMEDIATE, IMPLIED, INDIRECT, INDIRECT_X, INDIRECT_Y, RELATIVE, ZERO_PAGE, ZERO_PAGE_X, ZERO_PAGE_Y}

The meanings of the individual values in the enumeration are outlined in the following table. This will become important when the interpreter portion of our emulator starts getting implemented.

ABSOLUTE	Specifies the address that will be accessed directly.
ABSOLUTE_X	The address specified with an offset of the value in the X register.
ABSOLUTE_Y	The address specified with an offset of the value in the Y register.
ACCUMULATOR	The value in the Accumulator is used for the value.
FUTURE_EXPANSION	Unknown address mode as instruction not official. For the instructions that I end up having to implement, this will be changed as necessary.
IMMEDIATE	The value to be used is the next byte.
IMPLIED	The instruction tells you what register(s) it uses and those are what get used.
INDIRECT	Use the address located in the address this points to. So if this was JMP (1234) then the value at 1234 and 1235 would be the address to jump to.
INDIRECT_X	The next byte is a zero page address. The X register is added to this value. The byte at the resulting zero page location and the one following it are then used to form the address to access.
INDIRECT_Y	The next byte is a zero page address. It is the low byte and the following zero page byte is the high byte to form the address. The value in the Y register is then added to this address.
RELATIVE	An offset to jump to (relative to the next instruction) if the branch is taken.
ZERO_PAGE	Use a zero page address (0 to 255 so only one byte is needed).
ZERO_PAGE_X	Zero page address with the value of the X register added to it.
ZERO_PAGE_Y	Zero page address with the value of the Y register added to it.
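As a rough sketch of how the table and the address modes fit together, the Kotlin below looks up an op code and prints the operand based on the address mode. The names Instruction, sampleTable, hex and disassemble are hypothetical, the Command column is left out, and only five well-known op codes are filled in rather than all 256 values.

```kotlin
enum class AddressMode {
    ABSOLUTE, ABSOLUTE_X, ABSOLUTE_Y, ACCUMULATOR, FUTURE_EXPANSION,
    IMMEDIATE, IMPLIED, INDIRECT, INDIRECT_X, INDIRECT_Y, RELATIVE,
    ZERO_PAGE, ZERO_PAGE_X, ZERO_PAGE_Y
}

// Hypothetical row type following the table format described earlier.
data class Instruction(
    val opCode: Int, val opString: String, val size: Int,
    val addressMode: AddressMode, val cycles: Int
)

// A few sample entries keyed by op code; a real table would be complete.
val sampleTable: Map<Int, Instruction> = listOf(
    Instruction(0xA9, "LDA", 2, AddressMode.IMMEDIATE, 2),
    Instruction(0xBD, "LDA", 3, AddressMode.ABSOLUTE_X, 4),
    Instruction(0xB1, "LDA", 2, AddressMode.INDIRECT_Y, 5),
    Instruction(0x6C, "JMP", 3, AddressMode.INDIRECT, 5),
    Instruction(0xEA, "NOP", 1, AddressMode.IMPLIED, 2)
).associateBy { it.opCode }

// Formats a value as upper-case hex padded to the given number of digits.
fun hex(value: Int, digits: Int): String =
    value.toString(16).uppercase().padStart(digits, '0')

// Disassembles the single instruction at pc. Memory holds byte values 0..255.
fun disassemble(memory: IntArray, pc: Int): String {
    val op = sampleTable[memory[pc]] ?: return ".BYTE \$" + hex(memory[pc], 2)
    val byte = if (op.size > 1) hex(memory[pc + 1], 2) else ""
    // Two-byte operands are stored low byte first.
    val word = if (op.size > 2) hex(memory[pc + 1] + 256 * memory[pc + 2], 4) else ""
    val operand = when (op.addressMode) {
        AddressMode.IMPLIED -> ""
        AddressMode.ACCUMULATOR -> "A"
        AddressMode.IMMEDIATE -> "#\$$byte"
        AddressMode.ZERO_PAGE -> "\$$byte"
        AddressMode.ZERO_PAGE_X -> "\$$byte,X"
        AddressMode.ZERO_PAGE_Y -> "\$$byte,Y"
        AddressMode.ABSOLUTE -> "\$$word"
        AddressMode.ABSOLUTE_X -> "\$$word,X"
        AddressMode.ABSOLUTE_Y -> "\$$word,Y"
        AddressMode.INDIRECT -> "(\$$word)"
        AddressMode.INDIRECT_X -> "(\$$byte,X)"
        AddressMode.INDIRECT_Y -> "(\$$byte),Y"
        // Shows the raw offset byte only; the branch target is not resolved here.
        AddressMode.RELATIVE -> "\$$byte"
        AddressMode.FUTURE_EXPANSION -> "???"
    }
    return "${op.opString} $operand".trimEnd()
}
```

Calling disassemble on memory holding A9 10 produces LDA #$10, and an op code missing from the sample table is emitted as a .BYTE line so the output still accounts for every byte.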

Calculating the addresses is easy but may seem strange to people used to big-endian architectures. For addresses, the first byte is the low-order byte, followed by the high-order byte. This means that the address is first + 256 * second. For branching (relative), the address is the start of the next instruction plus the signed value passed (-128 to 127).
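The two calculations can be written out as small Kotlin functions. The names absoluteAddress and branchTarget are made up for this sketch, and it assumes bytes are held as Int values from 0 to 255 and that a branch instruction is two bytes long.

```kotlin
// Little-endian: the first byte is the low byte, the second is the high byte.
fun absoluteAddress(first: Int, second: Int): Int = first + 256 * second

// Branch target: start of the next instruction plus a signed 8-bit offset.
// branchAddress is the address of the branch op code itself.
fun branchTarget(branchAddress: Int, offsetByte: Int): Int {
    // Values 128..255 represent the negative offsets -128..-1.
    val signed = if (offsetByte >= 128) offsetByte - 256 else offsetByte
    // Keep the result within the 16-bit address space.
    return (branchAddress + 2 + signed) and 0xFFFF
}
```

For example, the bytes 34 12 form the address $1234, and a branch at $1000 with an offset byte of FE (-2) lands back on $1000, which is the classic branch-to-self loop.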

Next week will be a look at my assembler decision with some hindsight about the process as I am nearly finished the assembler.

Wednesday, September 20, 2017

Emulator Project Starting

I have decided that I am going to attempt to allocate a few hours every Wednesday to work on a homebrew project. While going back to my NES RPG would probably be a popular choice, the 2600Dragons.com project that I did for university has me interested in creating my own emulator. I know that there are many emulators available for pretty much any old system that you can think of so the work here is not really needed. Moreover, the emulators available tend to be pretty well written. Still, creating an emulator from scratch would be a very entertaining project.

My choices for target platforms would obviously be the Atari 2600, the NES, or a custom 8-bit console that I created just for the sake of creating an original emulator. Depending on what happens with my Masters degree, a variant of the third choice may be what ultimately happens but for now I am thinking of writing a JavaScript 2600 emulator. I have already created a rough simulator for the TIA chip, though it does need more work. It is the easiest of the three options to work on, and much of the work here can be translated to the other two ideas if it ever proves successful.

JavaScript is not the best choice of languages for creating an emulator but does have the advantage that it works on the internet. For this reason I am also considering writing the code in C and using an asm.js compiler. Writing the emulator core library in C and compiling to JavaScript would allow me to use the core library in other C projects if I decided to go that route. I haven't used asm.js yet so this would be an interesting experiment.

The project would then be broken down into getting a memory subsystem for loading cartridges working and getting a disassembler working that turns the 6502 code on the cartridge into readable assembly language. Once I can get code disassembling, I can implement the emulation of the processor. Get the TIA chip emulated and add some interface code and I will have a rudimentary emulator. This sounds easy but I suspect the path will be a lot harder than I anticipate.

And yes, I do plan on creating a CoffeeQuest2600 game for the 2600 which would run in my emulator.

Friday, August 29, 2014

Hello Again World.

Welcome to my new blog. This is a replacement for my Blazing Games Development blog, which is no longer appropriate because Blazing Games Inc. is closing down. This will be fairly similar to the Blazing Games blog in that the focus will be on game development and programming. Unlike the old blog, I will not even pretend to keep to a regular schedule for updating this one. As I do plan on continuing my 1GAM (One Game A Month) challenge, I will be posting here at least once a month.

So, at least once a month I will have a postmortem of whatever game I released for that month. Some of those games will be part of my NES RPG project. For those who haven't followed my development blog, this is a project where I am slowly creating a quality role-playing game for the Nintendo Entertainment System (or at least emulators as most people don't have flash cartridges for their NES). This is being done by creating a number of mini-games that explore different facets of the NES and slowly lead to the parts necessary for a full RPG. This is combined with articles explaining the learning and thought process behind the code.

The progress of the NES RPG project will be slow as I am heading back to university to upgrade my credentials to a Bachelor's degree. This is also the reason why I am not going with a weekly format. More to the point, I am hoping to get part-time work while going to school so my bank account isn't drained too much. So, if you know a Canadian company looking for some programming work (not necessarily game development, though obviously that would be great as well), let them know about me or me about them. Thanks and I hope people actually read this blog in the future!