Wednesday, January 24, 2018

Generating Machine Language and the Label Problem

Once you know the mnemonic, the value or address to use, and the addressing mode, generating the machine language for a line is trivial. You find the mnemonic and look up the addressing mode being used, which gives you the op code. The op code is the first byte written, followed by the value or address. What gets written after the op code depends on the addressing mode, but there is an interesting factor here.

Implied and accumulator addressing modes do not add any bytes after the op code, as the value the CPU uses is already in the register(s) being manipulated. I count the processor status flags and the stack pointer as registers here as well.

Immediate mode, relative mode, and the zero page variants all use a single byte. For relative mode, you subtract the address of the next instruction from the target address to find the offset, which is stored as a signed byte, so a branch can reach at most 128 bytes back or 127 bytes forward. The remaining modes use a two-byte address, stored with the low byte first followed by the high byte. This is known as little endian, while storing the bytes in reading order (high byte first) is big endian.
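As a rough illustration of the operand sizes and the two calculations described above, here is a small Kotlin sketch. The names `AddrMode`, `branchOffset` and `littleEndian` are hypothetical and not part of the assembler described in this post. The example addresses assume the loop example later in this post is assembled starting at $0200, which would put `BNE loop` at $0206 and `loop` at $0202.

```kotlin
// Hypothetical enum: operand bytes that follow the op code in each mode.
enum class AddrMode(val operandBytes: Int) {
    IMPLIED(0), ACCUMULATOR(0),
    IMMEDIATE(1), RELATIVE(1), ZERO_PAGE(1), ZERO_PAGE_X(1), ZERO_PAGE_Y(1),
    INDEXED_INDIRECT(1),   // (zp,X)
    INDIRECT_INDEXED(1),   // (zp),Y
    ABSOLUTE(2), ABSOLUTE_X(2), ABSOLUTE_Y(2), INDIRECT(2)
}

// Relative branches are two bytes long, so the offset is measured from
// the address of the instruction that follows the branch.
fun branchOffset(branchAddress: Int, targetAddress: Int): Int {
    val offset = targetAddress - (branchAddress + 2)
    require(offset in -128..127) { "Branch target out of range: $offset" }
    return offset and 0xFF   // store as a two's-complement byte
}

// Split a 16-bit address into little-endian order: low byte first.
fun littleEndian(address: Int): List<Int> =
    listOf(address and 0xFF, (address shr 8) and 0xFF)

fun main() {
    // BNE at $0206 back to $0202: $0202 - ($0206 + 2) = -6, stored as $FA
    println(branchOffset(0x0206, 0x0202).toString(16))     // fa
    println(littleEndian(0x1234).map { it.toString(16) })  // [34, 12]
}
```

A backward branch always produces an offset byte of $80 or higher, since the sign bit is set for negative values.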

So, essentially, all you need to know is the length of the instruction to determine which bytes get written out to the byte stream. This is what my machine language generator does:

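    // Build the machine-language bytes for one instruction: the op code
    // followed by 0, 1, or 2 operand bytes, depending on the instruction size.
    fun createAssemblyInstruction(opString: String, mode: AddressMode, target: Int): Array<Int> {
        val opCode = getOpcodeWithAddressMode(opString, mode)
        if (opCode == -1)
            return arrayOf()    // no such mnemonic/addressing-mode combination
        val instructionSize = m6502.commands[opCode].size
        return when (instructionSize) {
            1 -> arrayOf(opCode)                    // implied or accumulator
            2 -> arrayOf(opCode, target and 255)    // single operand byte
            else -> {
                // two-byte address stored little endian: low byte, then high byte
                val targetLow = target and 255
                val targetHigh = (target shr 8) and 255
                arrayOf(opCode, targetLow, targetHigh)
            }
        }
    }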
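To try the size-driven logic on its own, here is a self-contained sketch. Since `getOpcodeWithAddressMode` and `m6502.commands` are not shown in this post, the sketch swaps in a hypothetical `opTable` that holds only the few instructions used in this post's example; the op codes are the standard 6502 values.

```kotlin
// Hypothetical stand-ins for getOpcodeWithAddressMode and m6502.commands.
enum class Mode { IMPLIED, IMMEDIATE, RELATIVE, ABSOLUTE }

data class OpInfo(val opCode: Int, val size: Int)

// Keyed by (mnemonic, addressing mode); only the example's instructions.
val opTable: Map<Pair<String, Mode>, OpInfo> = mapOf(
    ("LDX" to Mode.IMMEDIATE) to OpInfo(0xA2, 2),
    ("JSR" to Mode.ABSOLUTE) to OpInfo(0x20, 3),
    ("DEX" to Mode.IMPLIED) to OpInfo(0xCA, 1),
    ("BNE" to Mode.RELATIVE) to OpInfo(0xD0, 2),
    ("RTS" to Mode.IMPLIED) to OpInfo(0x60, 1)
)

// Same size-driven logic as createAssemblyInstruction.
fun encode(mnemonic: String, mode: Mode, target: Int = 0): List<Int> {
    val info = opTable[mnemonic to mode] ?: return emptyList()
    return when (info.size) {
        1 -> listOf(info.opCode)
        2 -> listOf(info.opCode, target and 0xFF)
        else -> listOf(info.opCode, target and 0xFF, (target shr 8) and 0xFF)
    }
}

fun hex(bytes: List<Int>) = bytes.joinToString(" ") { "%02X".format(it) }

fun main() {
    println(hex(encode("LDX", Mode.IMMEDIATE, 10)))     // A2 0A
    println(hex(encode("JSR", Mode.ABSOLUTE, 0x1234)))  // 20 34 12
    println(hex(encode("DEX", Mode.IMPLIED)))           // CA
}
```

An unknown mnemonic/mode pair yields an empty list, mirroring the empty array returned when the op code lookup gives -1.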

This is fine when you know the address that a branch or jump is going to access, but what about people who want to use labels so they don’t have to work out the address of everything by hand? We already said that a label at the start of a line gets the current address of the machine language being written, so we simply need to use that. The problem is that we don’t always know a label’s address before it is used, as the following example demonstrates:

               LDX #10
    loop:      JSR doSomething
               DEX
               BNE loop
    ...
    doSomething:
    ...
               RTS

This code clearly does something ten times. However, while the BNE at the end of the loop knows the address of loop, the jump to subroutine (JSR) instruction does not know where it is supposed to jump to. You could put the subroutine earlier in the code, but since you don’t want it to run early you would need to jump over it, and that forward jump would still require an address that has not yet been discovered.
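A minimal sketch of the problem: a single pass that records each label’s address only when it reaches that label. `SourceLine` and `forwardReferences` are hypothetical names, instruction sizes are assumed to be already known, the elided `...` parts of the example are left out, and the origin of $0200 is an assumption.

```kotlin
// Hypothetical source line: optional label, mnemonic, optional operand,
// and the instruction's size in bytes (assumed already known).
data class SourceLine(val label: String?, val mnemonic: String, val operand: String?, val size: Int)

// Walk the program once, recording each label's address when reached.
// Any operand naming a label not yet seen is a forward reference.
fun forwardReferences(program: List<SourceLine>, origin: Int): List<String> {
    val labels = mutableMapOf<String, Int>()
    val unresolved = mutableListOf<String>()
    var address = origin
    for (line in program) {
        line.label?.let { labels[it] = address }
        val operand = line.operand
        // Operands starting with '#' are immediate values, not labels.
        if (operand != null && !operand.startsWith("#") && operand !in labels) {
            unresolved.add(operand)
        }
        address += line.size
    }
    return unresolved
}

fun main() {
    val program = listOf(
        SourceLine(null, "LDX", "#10", 2),
        SourceLine("loop", "JSR", "doSomething", 3),
        SourceLine(null, "DEX", null, 1),
        SourceLine(null, "BNE", "loop", 2),
        SourceLine("doSomething", "RTS", null, 1)
    )
    println(forwardReferences(program, 0x0200))  // [doSomething]
}
```

The backward `BNE loop` resolves fine because `loop` was recorded two lines earlier; only the `JSR` target is still unknown when it is needed.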


So, while we can generate code for known addresses, we have a problem with any code that refers to a label further ahead. The traditional solution is a two-pass approach, but for my assembler I am going with a one-and-a-half-pass approach, which I will explain next time.
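For comparison, here is a bare-bones sketch of the traditional two-pass approach (not the one-and-a-half-pass approach planned for this assembler). The first pass only assigns an address to every label; the second pass emits bytes once all addresses are known. It uses the loop example with an assumed origin of $0200, and `Src`, `Op`, `ops`, `assemble` and `hex` are hypothetical names; the opcode table only knows the five instructions involved.

```kotlin
// Hypothetical source line: optional label, mnemonic, optional operand.
data class Src(val label: String?, val mnemonic: String, val operand: String? = null)

// Illustrative opcode table; a real one is keyed by mnemonic AND mode.
data class Op(val code: Int, val size: Int, val relative: Boolean = false)

val ops = mapOf(
    "LDX" to Op(0xA2, 2),                   // LDX #imm
    "JSR" to Op(0x20, 3),                   // JSR abs
    "DEX" to Op(0xCA, 1),                   // implied
    "BNE" to Op(0xD0, 2, relative = true),  // relative branch
    "RTS" to Op(0x60, 1)                    // implied
)

fun assemble(program: List<Src>, origin: Int): List<Int> {
    // Pass one: give every label an address; nothing is emitted yet.
    val labels = mutableMapOf<String, Int>()
    var address = origin
    for (line in program) {
        line.label?.let { labels[it] = address }
        address += ops.getValue(line.mnemonic).size
    }
    // Pass two: all labels are known, so forward references resolve.
    val out = mutableListOf<Int>()
    address = origin
    for (line in program) {
        val op = ops.getValue(line.mnemonic)
        out.add(op.code)
        val operand = line.operand
        if (operand != null) {
            // "#n" is an immediate decimal value; anything else is a label.
            val value = if (operand.startsWith("#")) operand.drop(1).toInt()
                        else labels.getValue(operand)
            if (op.relative) {
                val offset = value - (address + op.size)
                require(offset in -128..127) { "branch to $operand out of range" }
                out.add(offset and 0xFF)
            } else {
                out.add(value and 0xFF)                             // low byte
                if (op.size == 3) out.add((value shr 8) and 0xFF)   // high byte
            }
        }
        address += op.size
    }
    return out
}

fun hex(bytes: List<Int>) = bytes.joinToString(" ") { "%02X".format(it) }

val example = listOf(
    Src(null, "LDX", "#10"),
    Src("loop", "JSR", "doSomething"),
    Src(null, "DEX"),
    Src(null, "BNE", "loop"),
    Src("doSomething", "RTS")
)

fun main() {
    println(hex(assemble(example, 0x0200)))  // A2 0A 20 08 02 CA D0 FA 60
}
```

The cost of this design is reading the source twice, and pass one must be able to compute every instruction's size without knowing label values.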
