Bill's Homebrew and Game Jam Blog: disassembler

Wednesday, November 22, 2017

Why Write an Assembler?

With the disassembler coming together so easy and a nice table of op codes and their associated mnemonics, it seemed to me that I had everything to write an assembler as well. This was not part of my original plans for an emulator but the idea of having an emulator that was able to take source code and let you edit it while running the program would be nice. When I implement the interpreter I am going to need to write a lot of small chunks of assembly language and step over it to make sure things are working. A built-in assembler would greatly aid this work and would even be very handy for game development.

Looking at the structure for my table, I realized that looking up mnemonics and finding the associated OP Codes would be trivial to implement. A map of mnemonics can be created with each mnemonic having a list of the different variants of the instructions and the associated OP code. If you know the mnemonic and the address mode, the OP code would simply be a traversal of the list. Here is the code for generating the map.

for (inst in m6502.commands) {
    if (mapOfOpCodes.containsKey(inst.OPString))
        mapOfOpCodes[inst.OPString]!!.add(inst)
    else {
        mapOfOpCodes.put(inst.OPString, arrayListOf<M6502Instruction>(inst))
    }
}

Traditional 6502 assembly language is not that difficult to parse. Or at least it does not appear to be. It should be possible to write a simple tokenizer and a simple parser to break down the assembly language and determine which address mode it is. Of course, there are other aspects to assembling such as directives and labels but how hard can they be? As it turns out a bit harder than I expected but not really that hard.

While the assembler was not in my original design goals, with the above code coming into my head while I was still getting the disassembler written, I managed to convince myself that this would be an interesting path to explore. As I write this I have the assembler functioning and a partially implemented solution for labels so it is well on it’s way. Development on it is now slow as I am using most of my spare time to finish my Santa trilogy. The first of the three games will be posted this Friday since even in Canada Black Friday is when the Christmas season begins. Or at least the shopping part.

Next week will be a look at the game which was developed without Adobe Animate CC, though still using the Create.js libraries. If you don’t have money to spend on Creative Cloud but have Action Script or Flash experience then this is actually a viable path. Even without Flash experience, the Create.js library is a good library to use for JavaScript games. More on that next week.

Wednesday, November 15, 2017

Disassembling the Disassembler

Writing the disassembler turned out to be even simpler than I expected. I had expected the work to be a bit on the time-consuming part as no matter which route I went with to write this I would need to deal with in 56 different instruction with many of them supporting several address modes. There are various approaches that can be taken for disassembling instructions. For processor architectures such as the Sparc, there are very specific bit patterns that make up the instructions. A look over the instructions clearly shows that this is probably true of the 6502 but with 56 valid instructions and only 256 possible values a simple table approach seemed to be the way to go.

The table approach sets up all the information as a table. By having a function pointer or lambda function in the table, you could also set it up to be able to do the interpretation as well. This isn’t really that inefficient either as it is a simple table lookup which then calls a function that does the interpretation work. The bit approach would be a lot messier and with so few possible outcomes it is not overly cumbersome to create. A more complex processor would be a different story but for this project I will go with the table. Here is the format of the table:

OP Code	The number assigned to this operation. While not technically needed here, it is a good idea to have to make sure the table is complete and it will be needed if an assembler is desired in the future.
Op String	The mnemonic or 3 letter word used to describe the instruction.
Size	How many bytes (1 to 3) the instruction uses.
Address Mode	How memory is addressed.
Cycles	The base number of cycles for the instruction. Things such as crossing page boundaries or whether a branch is taken will add to this value.
Command	The code that handles the interpretation of this instruction.

Disassembling then becomes simply the matter of looking up the instruction then based on the address mode printing out the value or address that it is working with. There are 14 address modes that I came up with as follows:

enum class AddressMode {ABSOLUTE, ABSOLUTE_X, ABSOLUTE_Y, ACCUMULATOR, FUTURE_EXPANSION, IMMEDIATE, IMPLIED, INDIRECT, INDIRECT_X, INDIRECT_Y, RELATIVE, ZERO_PAGE, ZERO_PAGE_X, ZERO_PAGE_Y}

The meaning of the individual values in the enumeration are outlined in the following table. This will become important when the interpretor portion of our emulator starts getting implemented.

ABSOLUTE	Specifies the address that will be accessed directly.
ABSOLUTE_X	The address specified with an offset of the value in the X register.
ABSOLUTE_Y	The address specified with an offset of the value in the Y register.
ACCUMULATOR	The value in the Accumulator is used for the value.
FUTURE_EXPANSION	Unknown address mode as instruction not official. For the instructions that I end up having to implement, this will be changed as necessary.
IMMEDIATE	The value to be used is the next byte.
IMPLIED	The instruction tells you what register(s) it uses and those are what get used.
INDIRECT	Use the address located in the address this points to. So if this was JMP (1234) then the value at 1234 and 1235 would be the address to jump to.
INDIRECT_X	The next byte is a zero page address. The X register is added to this value. That byte and the one following it are then used to form the address to jump to.
INDIRECT_Y	The next byte is a zero page address. It is the low byte and the following zero page byte is the high byte to form the address. The value in the Y register is then added to this address.
RELATIVE	An offset to jump to (relative to the next instruction) if the branch is taken.
ZERO_PAGE	Use a zero page address (0 to 255 so only one byte is needed).
ZERO_PAGE_X	Zero page address with the value of the X register added to it.
ZERO_PAGE_Y	Zero page address with the value of the Y register added to it.

Calculating the addresses is easy but for people use to big endian architectures may be strange. For addresses the first byte is the low order byte followed by the high order byte. This means that the address is first + 256 * second. For branching (relative) the address is the start of the next instruction plus the value passed (-128 to 127).

Next week will be a look at my assembler decision with some hindsight about the process as I am nearly finished the assembler.