Wednesday, February 14, 2018

Assembling the Assembler

At this point it is possible to create a functional assembler that takes some assembly language and converts it into a machine language program. This is not too difficult, so I went ahead and wrote a quick function to do just that. I was not happy with the result, so next week we will refactor the code to better suit my needs.

It is no secret that I am going to be writing Atari 2600 games using the assembler I am developing, but I also want to support NES programs, since after I finish my Coffee Quest 2600 project I will likely return to my NES RPG project. Both systems have games that use bank switching to access more memory than the system can address directly. Making this more complex, banks are not necessarily the same size on different systems, or even on different cartridges for the same system. I have opted to pass the number of banks and the size of a bank to my assembleProgram function, but I expect to change this once I start writing the directive handler.

    fun assembleProgram(source:ArrayList<String>, numBanks:Int = 1, bankSize:Int = 4096):Int {
        this.bankSize = bankSize;
        for (cntr in 0..(numBanks-1)) {
            banks.add( Array(bankSize, {_-> 0}))
        }

First the banks are set up based on the provided size. Right now banks are all the same size and the size is set right away, but this will change when I refactor the assembler. Once we have the storage for holding the resulting machine language, we need to reset the current bank, line number, and assembly address to 0. This is something else that will have to change, but for now it will be fine.

        currentBank = 0;
        assemblyLine = 0;
        assemblyAddress = 0;
        var errorCode = 0;

Looping through the source code is where the action takes place. We only need to do this once: each line is sent to the tokenizer, and the tokenized line is sent to the parser, which generates the machine language. This gets added to the bank memory. If there is a problem while assembling, an exception is thrown, which we catch and turn into an error code to be returned.

        for (line in source) {
            verbose("$assemblyLine $line -> ${assemblyAddress.toString(16)}: ", false)
            var tokens = tokenize(line)
            try {
                var ml = parse(tokens)
                if (ml.size > 0)
                    for (data in ml) {
                        banks[currentBank][assemblyAddress++] = data
                    }
            } catch (iae:InvalidArgumentException) {
                errorCode = 2;
            }
        }

Once we have finished looping through the source code, it is time for our half pass, where we put proper addresses in place of the zero placeholders. Remember that the list of labels was generated during our sole pass over the source code.

        var errors = linkLabelsInMemory()
        if (errors.size > 0) {
            errorCode = 1;
            for (errmessage in errors)
                println(errmessage)
        }

        return errorCode
    }
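The internals of linkLabelsInMemory are not shown here, so the following is only a rough sketch of what such a half pass could look like. It assumes every reference is a 16-bit absolute address stored low byte first (as the 6502 expects) and that the first pass recorded where each placeholder sits. The names LabelReference and linkLabels are hypothetical, and relative branches or zero page references would need separate handling.

```kotlin
// Hypothetical record of a spot in memory that still holds a zero placeholder.
data class LabelReference(val label: String, val address: Int)

// Patch each placeholder with the address of its label, low byte first.
// Returns one error message per reference to a label that was never defined.
fun linkLabels(
    memory: IntArray,
    labels: Map<String, Int>,
    references: List<LabelReference>
): List<String> {
    val errors = mutableListOf<String>()
    for (ref in references) {
        val target = labels[ref.label]
        if (target == null) {
            errors.add("Undefined label: ${ref.label}")
            continue
        }
        memory[ref.address] = target and 0xFF              // low byte
        memory[ref.address + 1] = (target shr 8) and 0xFF  // high byte
    }
    return errors
}
```

Returning the list of messages rather than throwing lets the caller report every unresolved label in one run, which matches the way the error list is used above.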

That is all there is to writing a basic assembler. With that said, you would want to wrap the assembler in some type of command line or GUI, which I have done. The command line I wrote is rather simple at this time: you specify a source file, which gets loaded and converted into a ROM file that is written to disk. The loading code is very simple:

        val assemblyFile = File(fileToAssemble)
        var assemblyList:ArrayList<String> = ArrayList(assemblyFile.readLines())

At the moment, I am only supporting the output of a single bank, which is also very simple:

        var byteData =  ByteArray(assembler.bankSize)
        for (cntrRom in 0..assembler.bankSize-1) {
            byteData[cntrRom] = assembler.banks[0][cntrRom].toByte()
        }
        val romFile = File(fileToWrite)
        romFile.writeBytes(byteData)
With this we have a functional assembler, but for it to be used for creating 2600 or NES software we need some directives to set the bank origins and to write data. I was going to take a break from the assembler and start working on the emulation of 6502 instructions but decided that directives were needed, if only to make testing easier.

At this point I figured it was time to do a quick refactoring pass over the assembler. I really did not like the way banking was handled. The key, then, was to come up with a better bank system. The quick and dirty system that I cobbled together needs replacing, but with what? I need banks into which the assembler writes the generated machine language. Even though banks may be stored consecutively, their origins may not be consecutive, which is almost certainly the case with any bank-switching scheme. The size of the banks differs between bank-switching schemes. There may even be situations where you have different sized banks within a cartridge.

The solution to this is to simply let the assembly source specify the number, size, and origin of whatever bank it is creating, defaulting to 4K and an origin of 0. When the assembler hits a .BANK directive it will switch the current bank to the indicated bank (with optional parameters for the size of the bank and the origin of the bank). As writing the assembled code now becomes a bit trickier, I am going to have a bank class that holds the bank data and tracks where the assembler should be writing its next instructions. Writing code that exceeds the bank now becomes a bit of an issue, which is easily solved by throwing an exception.
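The directive handler has not been written yet, so the following is only a guess at how the bank switching behind a .BANK directive might be organised. It assumes the directive's arguments have already been parsed into numbers. BankInfo, BankTable, and selectBank are hypothetical names, and the record here leaves out the actual byte storage.

```kotlin
// Hypothetical, cut-down bank record; a real bank would also hold the bytes.
class BankInfo(val number: Int, var size: Int = 4096, var origin: Int = 0)

class BankTable {
    private val banks = mutableMapOf<Int, BankInfo>()

    // Bank 0 always exists, with the default 4K size and an origin of 0.
    var current: BankInfo = banks.getOrPut(0) { BankInfo(0) }
        private set

    // Switch to the given bank, creating it on first use.
    // Size and origin are only changed when they are supplied.
    fun selectBank(number: Int, size: Int? = null, origin: Int? = null): BankInfo {
        val bank = banks.getOrPut(number) { BankInfo(number) }
        if (size != null) bank.size = size
        if (origin != null) bank.origin = origin
        current = bank
        return bank
    }
}
```

Under these assumptions, a line such as a .BANK directive selecting bank 1 with a 2K size and an origin of $F000 would end up as `selectBank(1, 2048, 0xF000)`, while switching back with just a bank number leaves that bank's size and origin untouched.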

I have been trying to work on my NIH Syndrome by trying to use existing classes when possible. I was looking over the existing exception classes and noticed that the exceptions I have been using were Java exceptions. This is not necessarily a bad thing, but as I eventually want to have the emulator and support tools run on a web page it is important that I use Kotlin libraries so they will work in all the different targets that I can use. I decided to quickly write my own exception which in Java was easy and is even easier in Kotlin if you are not doing anything more than creating a new exception type. While a bit lengthy, here is the exception code:

class AssemblyException(s:String) : Throwable(s)

All exceptions in Kotlin inherit from the Throwable class, which has a constructor that takes a string that becomes the message for the exception. I suppose I could get fancy with my exception and carry additional information, such as the tokens being generated and the source that caused the issue, but that is not really needed given the way I am handling the assembling of code.

Replacing all the exceptions I used earlier with the new one was quick. The bank class does require two different addresses for each memory location: the BankAddress is the index within the bank's storage, and the CPUAddress is the address that the CPU would use to access that piece of memory. Converting between the two is pretty easy. The current address is tracked using CPU addressing, as that is easier for the assembler: when it needs an address it is always the CPU address, and where that falls within the bank is not relevant. The Bank class is for the most part straightforward, so I will omit most of the code and focus only on the key functions.
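To make the two kinds of address concrete, here is a small sketch of the conversion, assuming the origin is the CPU address at which the first byte of the bank appears. The function names are hypothetical and exist only for this illustration.

```kotlin
// Hypothetical helpers: origin is the CPU address of the bank's first byte.
fun cpuToBankAddress(cpuAddress: Int, origin: Int): Int = cpuAddress - origin
fun bankToCpuAddress(bankAddress: Int, origin: Int): Int = bankAddress + origin
```

For a 4K bank with an origin of $F000, CPU address $F123 is bank offset $123, and the last byte of the bank (offset $0FFF) is seen by the CPU at $FFFF.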

class AssemblyBank(val number:Int, var size:Int = 4096, var bankOrigin:Int = 0) {
    private var storage:ByteArray = ByteArray(size)
    var curAddress:Int = bankOrigin

    fun curBankAddress():Int { return curAddress - bankOrigin}

    fun readBankAddress(offset:Int):Int {…}

Writing data is the core function for this class. It is a matter of writing the byte to the current address in the bank and then increasing that address. It is possible for the user to change the address to a value below the start of the bank as well as past the end of it, and long stretches of code can end up overflowing a bank. I am contemplating automatic switching to the next bank, but due to the nature of the different bank-switching schemes I feel that it is not really a good idea. Raising an assembly error and making the programmer fix the bank issue themselves would be the best approach.

    fun writeNextByte(data:Int):Int {
        val offset = curBankAddress()
        if ((offset < 0) || (offset >= size))
            throw AssemblyException("Writing outside of assembly bank")
        storage.set(offset, data.toByte())
        ++curAddress
        return curAddress
    }

    fun writeBankByte(offset:Int, data:Int) {…}
    fun writeCPUByte(address:Int, data:Int) {…}

Having the ability to resize a bank is arguably not necessary as banks should be created when they are needed. There is one exception to this general rule that necessitates having a resize function. Assemblers always need to have a bank 0. Because the size of a bank defaults to 4K and that can change once assembly starts, the size of bank 0 needs to be changeable. There could be a restriction limiting this to just bank 0 or added complexity that assembly can’t start until a bank is specified, but it is easy enough to resize a bank so why not support it.

    fun resize(bankSize:Int) {
        if (size == bankSize)
            return
        val oldSize = size
        val oldStorage = storage
        storage = ByteArray(bankSize)
        size = bankSize
        val bytesToCopy = Math.min(oldSize, size);
        for (offset in 0..(bytesToCopy-1))
            storage[offset] = oldStorage[offset]
    }
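For comparison, the Kotlin standard library's copyOf does the same truncate-or-pad copy in a single call. The sketch below is a hypothetical free function rather than a member of the bank class, so it only shows the storage side of a resize.

```kotlin
// Hypothetical helper: returns a copy of the storage at the new size.
// copyOf truncates when shrinking and pads the tail with zero bytes when growing.
fun resizedStorage(storage: ByteArray, newSize: Int): ByteArray =
    storage.copyOf(newSize)
```

Since copyOf is part of the Kotlin standard library, this route also avoids the dependency on Java's Math class.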

Because bytes are signed in Java, I originally opted to use integers and treat them like bytes, which is actually faster for many processors due to the way they deal with memory alignment. I am switching to byte arrays for this class but am still using integers for getting and setting, as working with them is much easier and converting between the two shouldn’t add too much overhead. Still, for my testing classes I need an array of integers to compare to the expected results, so the class can convert its contents:

    fun bankToIntArray():Array<Int> {
        val list = Array<Int> (size, {cntr -> storage[cntr].toInt() and 255})
        return list;
    }
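The `and 255` in that conversion matters because a stored byte comes back signed. A minimal sketch, using the hypothetical helper name unsignedValue:

```kotlin
// Convert a stored (signed) byte back to the 0..255 value that was written.
fun unsignedValue(b: Byte): Int = b.toInt() and 0xFF
```

For example, the 6502 NOP opcode $EA (234) is stored as the byte -22. Calling toInt() on it alone gives -22 again, while masking with 0xFF restores 234.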

With that complete, there is just the matter of replacing the code that references the old bank array and assembly address variables with code that accesses the bank class. The currentBank variable now becomes the holder of information about where the next assembly instruction is going to be written and what has been stored. Directives will switch out the current bank when we are switching banks and will alter the origin as well as address within the bank.

With this change done, we are now ready to start adding the directives. But what directives?
