Wednesday, November 15, 2017

Disassembling the Disassembler

Writing the disassembler turned out to be even simpler than I expected. I had expected the work to be a bit on the time-consuming part as no matter which route I went with to write this I would need to deal with in 56 different instruction with many of them supporting several address modes. There are various approaches that can be taken for disassembling instructions.  For processor architectures such as the Sparc, there are very specific bit patterns that make up the instructions. A look over the instructions clearly shows that this is probably true of the 6502 but with 56 valid instructions and only 256 possible values a simple table approach seemed to be the way to go.

The table approach sets up all the information as a table. By having a function pointer or lambda function in the table, you could also set it up to be able to do the interpretation as well. This isn’t really that inefficient either as it is a simple table lookup which then calls a function that does the interpretation work. The bit approach would be a lot messier and with so few possible outcomes it is not overly cumbersome to create. A more complex processor would be a different story but for this project I will go with the table. Here is the format of the table:

OP Code
The number assigned to this operation. While not technically needed here, it is a good idea to have to make sure the table is complete and it will be needed if an assembler is desired in the future.
Op String
The mnemonic or 3 letter word used to describe the instruction.
How many bytes (1 to 3) the instruction uses.
Address Mode
How memory is addressed.
The base number of cycles for the instruction. Things such as crossing page boundaries or whether a branch is taken will add to this value.
The code that handles the interpretation of this instruction.

Disassembling then becomes simply the matter of looking up the instruction then based on the address mode printing out the value or address that it is working with. There are 14 address modes that I came up with as follows:


The meaning of the individual values in the enumeration are outlined in the following table. This will become important when the interpretor portion of our emulator starts getting implemented.
Specifies the address that will be accessed directly.
The address specified with an offset of the value in the X register.
The address specified with an offset of the value in the Y register.
The value in the Accumulator is used for the value.
Unknown address mode as instruction not official. For the instructions that I end up having to implement, this will be changed as necessary.
The value to be used is the next byte.
The instruction tells you what register(s) it uses and those are what get used.
Use the address located in the address this points to. So if this was JMP (1234) then the value at 1234 and 1235 would be the address to jump to.
The next byte is a zero page address. The X register is added to this value. That byte and the one following it are then used to form the address to jump to.
The next byte is a zero page address. It is the low byte and the following zero page byte is the high byte to form the address. The value in the Y register is then added to this address.
An offset to jump to (relative to the next instruction) if the branch is taken.
Use a zero page address (0 to 255 so only one byte is needed).
Zero page address with the value of the X register added to it.
Zero page address with the value of the Y register added to it.

Calculating the addresses is easy but for people use to big endian architectures may be strange. For addresses the first byte is the low order byte followed by the high order byte. This means that the address is first + 256 * second. For branching (relative) the address is the start of the next instruction plus the value passed (-128 to 127).

Next week will be a look at my assembler decision with some hindsight about the process as I am nearly finished the assembler. 

No comments:

Post a Comment