Once you know the mnemonic, the value or address to use, and the addressing mode, generating the machine language for a line is trivial. You simply find the mnemonic and look up the addressing mode being used, which gives you the op code. The op code is the first byte to write, followed by the value or address. What gets written for the value depends on the addressing mode, but there is an interesting factor here.

Implied and accumulator addressing modes do not have any additional bytes added, as the value that the CPU uses is already in the register(s) being manipulated. (I consider the flags to be a register, and the stack pointer as well.)

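As a rough sketch of the lookup step, assume a small table keyed by mnemonic and addressing mode. The names `Mode`, `Entry`, `opCodeTable`, and `lookup` are hypothetical and are not the assembler's own; the op codes are standard 6502 values, and only a handful are listed. Note that the implied instructions have a size of one byte, since nothing follows the op code.

```kotlin
// Hypothetical names for illustration; not the assembler's real tables.
enum class Mode { IMPLIED, IMMEDIATE, ZERO_PAGE, ABSOLUTE, RELATIVE }

// Op code plus the total instruction length in bytes (op code included).
data class Entry(val opCode: Int, val size: Int)

// A few standard 6502 op codes, keyed by mnemonic and addressing mode.
val opCodeTable: Map<Pair<String, Mode>, Entry> = mapOf(
    ("LDA" to Mode.IMMEDIATE) to Entry(0xA9, 2),
    ("LDA" to Mode.ZERO_PAGE) to Entry(0xA5, 2),
    ("LDA" to Mode.ABSOLUTE) to Entry(0xAD, 3),
    ("JSR" to Mode.ABSOLUTE) to Entry(0x20, 3),
    ("BNE" to Mode.RELATIVE) to Entry(0xD0, 2),
    ("DEX" to Mode.IMPLIED) to Entry(0xCA, 1),
    ("RTS" to Mode.IMPLIED) to Entry(0x60, 1)
)

// Returns null when the mnemonic does not support the requested mode.
fun lookup(mnemonic: String, mode: Mode): Entry? =
    opCodeTable[mnemonic.uppercase() to mode]
```

This sketch returns null for an unsupported combination, where a sentinel value such as -1 would serve equally well.
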
Immediate mode, relative mode, and the zero page variants all use a single byte. For relative mode you subtract the address of the next instruction from the target address to find the offset. For the other modes, a two-byte address is used, stored with the low byte first followed by the high byte. This is known as little endian, while storing the bytes in reading order is big endian.

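The two calculations above can be sketched as follows. The function names `relativeOffset` and `littleEndian` are hypothetical, and the range check is an addition of this sketch (a 6502 branch offset is a signed byte, so it must fit in -128..127).

```kotlin
// Offset for a relative branch: target minus the address of the next
// instruction. A branch is two bytes long, so the next instruction
// starts at branchAddress + 2.
fun relativeOffset(branchAddress: Int, target: Int): Int {
    val offset = target - (branchAddress + 2)
    require(offset in -128..127) { "Branch target out of range: $offset" }
    return offset and 0xFF // keep the two's complement byte
}

// Splits a 16-bit address into low byte first, then high byte.
fun littleEndian(address: Int): List<Int> =
    listOf(address and 0xFF, (address shr 8) and 0xFF)
```

For example, assuming a branch located at $0606 that targets $0602, the next instruction is at $0608, so the offset is -6, which is written as the byte $FA. The address $1234 is written as $34 followed by $12.
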
So, essentially, you just need to know the length of the instruction to determine the bytes that get written out to the byte stream. This is what my machine language generator does:

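    fun createAssemblyInstruction(opString: String, mode: AddressMode, target: Int): Array<Int> {
        val opCode = getOpcodeWithAddressMode(opString, mode)
        // -1 means the mnemonic does not support this addressing mode
        if (opCode == -1)
            return arrayOf()

        val instructionSize = m6502.commands[opCode].size
        if (instructionSize == 1) {
            // Implied or accumulator: the op code is the whole instruction
            return arrayOf(opCode)
        } else if (instructionSize == 2) {
            // Single-byte operand (immediate, relative, zero page)
            return arrayOf(opCode, target and 255)
        } else {
            // Two-byte address, written low byte first (little endian)
            val targetHigh: Int = (target / 256) and 255
            val targetLow = target and 255
            return arrayOf(opCode, targetLow, targetHigh)
        }
    }
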
This is fine when you know the address that a branch or jump is going to be accessing, but what about people who want to use labels so they don’t have to manually figure out the address of everything? We already said that a label at the start of a line gets the current address of the machine language instructions being written, so we simply need to use that. The problem is that we don’t always know the address before it is used, as the following example demonstrates:

            LDX #10
    loop:   JSR doSomething
            DEX
            BNE loop
            ...
    doSomething:
            ...
            RTS

This code clearly does something ten times. However, while the branch for the loop knows where the loop address is, the jump to subroutine (JSR) instruction does not know where it is supposed to jump to. You could put the subroutine earlier in the code, but you don’t want to run it early, so you would need to jump over that code, and jumping forward would still require knowledge of an address that has not yet been discovered.

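To make the problem concrete, here is a minimal sketch of a single pass over the example above. The `Line` class, the `findForwardReferences` function, and the starting address of $0600 are all assumptions made for illustration; each line is reduced to an optional label, its size in bytes, and an optional label operand, and the elided parts of the listing are left out.

```kotlin
// Hypothetical simplified line: optional label, instruction size in bytes,
// and an optional label used as the operand.
data class Line(val label: String?, val size: Int, val operandLabel: String?)

// Walks the lines once, recording each label's address as it is defined,
// and returns the operands that named a label not yet seen at that point.
fun findForwardReferences(lines: List<Line>, origin: Int): List<String> {
    val labels = mutableMapOf<String, Int>()
    val unresolved = mutableListOf<String>()
    var address = origin
    for (line in lines) {
        if (line.label != null) labels[line.label] = address
        if (line.operandLabel != null && line.operandLabel !in labels) {
            unresolved.add(line.operandLabel)
        }
        address += line.size
    }
    return unresolved
}

val example = listOf(
    Line(null, 2, null),            // LDX #10
    Line("loop", 3, "doSomething"), // loop: JSR doSomething
    Line(null, 1, null),            // DEX
    Line(null, 2, "loop"),          // BNE loop
    Line("doSomething", 1, null)    // doSomething: RTS
)
```

Calling `findForwardReferences(example, 0x0600)` reports `doSomething` as unresolved, while `loop` is fine because its label was recorded before the BNE that uses it.
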
So, while we can generate code for known addresses, we have a problem with any code that has forward branches in it. The traditional approach is a two-pass approach, but for my assembler I am going with a one-and-a-half-pass approach, which will be explained next time.