Wednesday, February 28, 2018

Variable Directives

I was going to post Video Poker on Spelchan.com on Saturday but ended up too busy with other things, so I decided to post it today. It is a pretty close port of the original, but in the future I may do an enhanced version of the game. It is actually possible to use Create.JS with Kotlin, though it takes a bit of work and has a few quirks, so I may do an HD version of this game in Kotlin.



When I talk about variables for the assembler, I am not talking about reserving storage for the assembly language program. Instead, I mean special labels that get replaced with other values defined within the assembly language file. These tend to be constants, but having the ability to change the value of an assembly variable on the fly doesn't hurt. Such variables are often set up in an include file (something my assembler will not have) that defines the special addresses used by the machine you are assembling code for. For instance, you would have variables that define the different TIA register addresses, so that when you want to access a TIA register you simply use the label for that register instead of remembering a more obscure number.

The existing label mechanism goes a long way towards supporting variables, so all we would need is a directive that adds a label with the assigned value. I opted not to do this, preferring to insert the value right into the code, because the first target machine is the 2600. If I just used labels, the assembler would assume a full (two-byte) address rather than a zero page address, yet on the 2600 all the TIA register addresses are zero page addresses, as is all of the RAM (the 2600 only has 128 bytes of RAM). By having the variable handling check a variable list and replace the label token with a number token, the assembler can optimize instructions that are able to use zero page addressing. When the processDirective function runs into a label, it simply checks whether it is a declared variable and, if so, replaces that label token with the stored token as follows:

// Replace a label that names a declared variable with the token stored for it
val replacement = variableList[token.contents]
if (replacement != null) {
     tokens[indx] = replacement
}
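Here is a minimal sketch of the decision that substitution enables, to make the zero page argument concrete. AddrMode, chooseMode, and instructionLength are hypothetical names, not functions from the VM2600 repository. In the .EQU test file below, STA THREE becomes STA with the number 3. That value is known when the line is parsed and fits in one byte, so the instruction can use the two-byte zero page form. An unresolved label would force the three-byte absolute form.

```kotlin
// Hypothetical sketch: why a number token beats an unresolved label on the 2600
enum class AddrMode { ZERO_PAGE, ABSOLUTE }

// A value known at parse time that fits in one byte can use the shorter zero page form;
// an unresolved label must reserve a full two-byte address for the linker to fill in later
fun chooseMode(valueKnown: Boolean, value: Int): AddrMode =
    if (valueKnown && value in 0..255) AddrMode.ZERO_PAGE else AddrMode.ABSOLUTE

// Opcode byte plus one (zero page) or two (absolute) operand bytes
fun instructionLength(mode: AddrMode): Int = if (mode == AddrMode.ZERO_PAGE) 2 else 3
```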

Processing variables is one thing, but they need to be declared before they can be used. This is very simple to do: we just make sure that the directive is in the format .EQU label token, with the token being what replaces the label when it is encountered in our code. For extra safety I am thinking that this should be limited to a number token, but for now I will leave it open-ended. Restricting it to just a number requires only one extra check, while leaving it open lets you use labels as replacements, which may be a desired feature.

"EQU" -> {
     // .EQU needs two tokens after the directive: the variable name and its replacement
     if (indx + 1 >= tokens.size) {
           throw AssemblyException("Missing parameters for .EQU statement on line $assemblyLine")
     }
     if (tokens[indx].type != AssemblerTokenTypes.LABEL_LINK) {
           throw AssemblyException("Missing EQU variable name on line $assemblyLine")
     }
     // Remove the name token, then store and remove the replacement token that followed it
     val varName = tokens[indx].contents
     tokens.removeAt(indx)
     variableList[varName] = tokens[indx]
     tokens.removeAt(indx)
}

This is tested easily enough with the following assembly language file:

.EQU ONE 1
.EQU TWO $2
.EQU THREE %11
     LDA #ONE
     LDA #TWO
     STA THREE
     BRK
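This self-contained sketch shows how .EQU declaration and label substitution fit together. Tok, TokType, EquSketch, defineEqu, and substitute are hypothetical simplifications, not the assembler's actual AssemblerToken classes. The sketch uses require() instead of AssemblyException so that it stands alone. As in the article, indx points at the token immediately after the already-removed .EQU directive.

```kotlin
// Hypothetical, simplified token model for illustration only
enum class TokType { LABEL_LINK, NUMBER }

data class Tok(val type: TokType, val contents: String, val num: Int = 0)

class EquSketch {
    // Maps variable names to the token that replaces them
    val variables = HashMap<String, Tok>()

    // Handles ".EQU name value" where tokens[indx] is the name token
    fun defineEqu(tokens: MutableList<Tok>, indx: Int) {
        require(indx + 1 < tokens.size) { "Missing parameters for .EQU" }
        require(tokens[indx].type == TokType.LABEL_LINK) { "Missing EQU variable name" }
        val name = tokens.removeAt(indx).contents
        variables[name] = tokens.removeAt(indx)
    }

    // Replaces every label token that names a declared variable
    fun substitute(tokens: MutableList<Tok>) {
        for (i in tokens.indices) {
            val replacement = variables[tokens[i].contents]
            if (tokens[i].type == TokType.LABEL_LINK && replacement != null) {
                tokens[i] = replacement
            }
        }
    }
}
```

Usage, mirroring .EQU THREE %11 followed by STA THREE:

```kotlin
val s = EquSketch()
s.defineEqu(mutableListOf(Tok(TokType.LABEL_LINK, "THREE"), Tok(TokType.NUMBER, "%11", 3)), 0)
val line = mutableListOf(Tok(TokType.LABEL_LINK, "THREE"))
s.substitute(line)   // line[0] is now the NUMBER token with num == 3
```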

Related to variables are the .HIGH and .LOW directives, which are used for getting the high-order or low-order byte of a variable or label. These come in handy if you are setting up an address in zero page (such as a vector for an indirect jump) and need to store it there: you would load the low byte of the address and store it, then load the high byte of the address and store it. The following test file does the LDA portion of this for both a variable and a label.

.BANK 0 $1000 1024
.EQU hilow $1234
LDA #.HIGH hilow
hllabel:   LDA #.LOW hilow
     LDA #.HIGH hllabel
     LDA #.LOW hllabel
     BRK

Actually implementing the .HIGH directive is simply a matter of determining whether the token after the directive is a number, label, or variable. For numbers and variables we replace the token with the high byte of that number. For labels we set it to the number 0 and add a label link telling the linker to put the high byte of the label's address at this location. The code may seem strange to those of you not used to Kotlin, but one of the nice features of Kotlin is that if and when are expressions that return a value, which can then be assigned to a variable. This is quite convenient.

"HIGH" -> {
     if (indx >= tokens.size) {
           throw AssemblyException("Missing parameters for .HIGH statement on line $assemblyLine")
     }
     // when is an expression here: each branch yields the value whose high byte we want
     val num = when (tokens[indx].type) {
           AssemblerTokenTypes.NUMBER -> tokens[indx].num
           AssemblerTokenTypes.LABEL_LINK -> {
                val rep = variableList[tokens[indx].contents]
                if (rep == null) {
                     // Not a variable, so treat it as an address label: emit 0 and have the linker
                     // patch the label's high byte into the operand byte after the opcode
                     addLabel(AssemblerLabel(tokens[indx].contents, AssemblerLabelTypes.HIGH_BYTE, currentBank.curAddress+1, currentBank.number))
                     0
                } else {
                     rep.num
                }
           }
           else -> throw AssemblyException("Missing HIGH variable name on line $assemblyLine")
     }
     tokens[indx] = AssemblerToken(AssemblerTokenTypes.NUMBER, "high", (num / 256) and 255)
}

The implementation of .LOW is very similar, except that it uses a low-byte label link and stores the low-order byte of the number (the number ANDed with 255).
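The byte arithmetic behind the two directives is small enough to check directly. highByte and lowByte are hypothetical helper names. They apply the same expressions as the .HIGH code above, (num / 256) and 255, and the number ANDed with 255 as described for .LOW. For the test file's .EQU hilow $1234, .HIGH hilow should give $12 and .LOW hilow should give $34.

```kotlin
// Hypothetical helpers mirroring the arithmetic used by .HIGH and .LOW
fun highByte(value: Int): Int = (value / 256) and 255
fun lowByte(value: Int): Int = value and 255
```

For non-negative values, (value shr 8) and 255 gives the same high byte. The division form simply matches the article's code.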

This covers most of our variable handling needs, with the only thing missing being a way of putting data into the machine language file. That will be covered next time.

Wednesday, February 21, 2018

Memory Directives

My original thought about assembler directives was that they would either start a line or come right after a label. After thinking about how I want to use directives in my assembler, I realized that they could also be used for variables and special label handling. So I am writing a process directives function that is called just after the parser determines whether there is a label on the line. It takes the tokens and can alter the list of tokens if necessary to implement features like variables. For example:

LDA #.HIGH label

Would remove the directive and label tokens, replacing them with a single number token. If the label is an address, the number would be 0 and the directive processor would add the appropriate link to the label list; if the label is a variable, the number would simply be the high-order byte of the variable's value. This will be discussed in more detail in a future article.

The code for processing directives simply loops through the token list looking for directive and label tokens. Label tokens are handled by the variable system, which will be covered next week. Directives are run through a when block (the Kotlin equivalent of a switch statement) that processes each directive's command. There are a number of directives that we will be supporting; my current list is .BANK, .ORG, .EQU, .HIGH, .LOW, .BYTE, .WORD, .JUMPTABLE, .STARTMACRO, .ENDMACRO, and .MACRO, with additional directives added as necessary.
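The following sketch shows the shape of that loop, assuming tokens are plain strings and that a directive is any token starting with a period. The real assembler uses typed tokens. processDirectivesSketch is a hypothetical name, and its when branches only record which directive was seen rather than implementing it. Each directive token is removed from the list, so a handler would find its parameters starting at the same index. Upper-casing the name allows both .BANK and .bank to match.

```kotlin
// Hypothetical sketch: walk a token list and dispatch directives with a when block
fun processDirectivesSketch(tokens: MutableList<String>): List<String> {
    val handled = ArrayList<String>()
    var indx = 0
    while (indx < tokens.size) {
        val token = tokens[indx]
        if (token.startsWith(".")) {
            // Remove the directive token; a real handler would then read its parameters at indx
            tokens.removeAt(indx)
            when (val name = token.substring(1).uppercase()) {
                "BANK", "ORG", "EQU", "HIGH", "LOW" -> handled.add(name)
                else -> throw IllegalArgumentException("Unknown directive $token")
            }
        } else {
            ++indx
        }
    }
    return handled
}
```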

The .BANK directive has the format .BANK number [origin [size]], with bank numbering starting at 0 and going as high as the cartridge's memory manager allows. The following test code shows how the bank directive would be used. Notice that bank 1 and bank 2 occupy the same memory region.

.BANK 0 $1000 1024
     JSR bank1
; would have bank switching code here
     JSR bank2
     BRK
.bank 1 $2000 1024
bank1:     LDA #1
     RTS
.BANK 2 $2000 1024
bank2:
     LDA #2
     RTS

The bank directive is simple to implement, though a bit longer than one would expect. To begin, the number of the bank is required so if the next token isn’t a number then there is an assembly problem.

"BANK" -> {
     // || short-circuits, so tokens[indx] is only read when indx is in range
     if ((indx >= tokens.size) || (tokens[indx].type != AssemblerTokenTypes.NUMBER)) {
           throw AssemblyException("Invalid bank specified $assemblyLine")
     }

Getting the parameters for the bank turned out to be trickier than it should have been. With the version of the Kotlin compiler I was using, my AND conditionals did not seem to short-circuit when the first condition failed, so the code went on to attempt an invalid array index that the first part of the if statement was supposed to guard against. The cause is that Kotlin's infix and and or functions always evaluate both operands; only the && and || operators short-circuit. Because I wrote the checks with and, my parameter checking code is nested deeply. The basic idea here is that we don't want to alter the size and origin unless those parameters are provided, so we flag those changes as false. We then check whether the next parameters are numbers and, if so, assign them to the appropriate variables. This gives us the number of the bank, the origin of the bank, and the size of the bank. Default values are provided for cases where we are accessing a bank that has not been created yet.

     val bankID = tokens[indx].num
     tokens.removeAt(indx)
     var bankOrg = 0
     var orginParamSet = false
     var bankSize = 4096
     var sizeParamSet = false
     if (indx < tokens.size)
           if (tokens[indx].type == AssemblerTokenTypes.NUMBER) {
                bankOrg = tokens[indx].num
                orginParamSet = true
                tokens.removeAt(indx)
                if (indx < tokens.size)
                     if (tokens[indx].type == AssemblerTokenTypes.NUMBER) {
                           bankSize = tokens[indx].num
                           sizeParamSet = true
                           tokens.removeAt(indx)
                     }
           }
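The short-circuit difference described above is easy to demonstrate. Both hypothetical functions below perform the same bounds-plus-value check on a list of integers, which stands in for the token list. The && version never touches tokens[indx] when indx is out of range. The infix and version always evaluates its right-hand side, so it throws an IndexOutOfBoundsException on an empty list.

```kotlin
// Bounds check written with the short-circuiting && operator:
// tokens[indx] is only evaluated when indx is in range
fun hasPositiveAtLazy(tokens: List<Int>, indx: Int): Boolean =
    indx < tokens.size && tokens[indx] > 0

// The same check written with the infix `and` function:
// both operands are always evaluated, so an out-of-range indx throws
fun hasPositiveAtEager(tokens: List<Int>, indx: Int): Boolean =
    (indx < tokens.size) and (tokens[indx] > 0)
```

With &&, each level of the nested parameter checks could become a single condition such as if (indx < tokens.size && tokens[indx].type == AssemblerTokenTypes.NUMBER).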

Once we have the desired parameters for the bank, we need to apply them by adjusting the existing bank if it exists, or by creating a new bank at the appropriate slot in the list if it doesn't. Kotlin's ArrayList has an ensureCapacity function, which sounds ideal for growing the list, but it only reserves internal storage without adding any elements, so the list's size (and therefore which indices are valid) does not change. Creating the missing banks manually was done instead.

     // apply bank directive
     if (banks.size > bankID) {
           // the bank already exists: switch to it, changing origin and size only if they were given
           currentBank = banks[bankID]
           if (orginParamSet && (currentBank.bankOrigin != bankOrg)) {
                currentBank.bankOrigin = bankOrg
                currentBank.curAddress = bankOrg
           }
           if (sizeParamSet && (currentBank.size != bankSize))
                currentBank.resize(bankSize)
     } else {
           // create default banks for any skipped bank numbers, then the requested bank
           while (banks.size < bankID) {
                val skippedBank = banks.size
                banks.add(AssemblyBank(skippedBank))
           }
           banks.add(AssemblyBank(bankID, bankSize, bankOrg))
           currentBank = banks[bankID]
     }
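The following sketch checks the ensureCapacity behaviour and shows a generic version of the manual padding. padTo is a hypothetical extension function, not part of the standard library. Unlike the bank code above, it creates every entry up to and including the target index through a single factory. The bank code, by contrast, gives only the requested bank its parsed size and origin.

```kotlin
// Hypothetical helper: grow a list with placeholder entries until `index` is a valid position
fun <T> MutableList<T>.padTo(index: Int, makeEntry: (Int) -> T) {
    // `size` is read before each add, so each new entry receives its own index
    while (size <= index) add(makeEntry(size))
}
```

Usage:

```kotlin
val banks = ArrayList<String>()
banks.ensureCapacity(10)       // reserves room only; banks.size is still 0
banks.padTo(2) { "bank$it" }   // banks is now [bank0, bank1, bank2]
```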

It is possible to switch back and forth between banks with this technique, but good assembly language code shouldn't do that. Ideally the source code should have the banks and their instructions laid out consecutively in the source file. The flexibility to switch between banks may be useful in some situations, but it is not likely something I will ever take advantage of.

Having code appear at specific locations within a bank is also possible, and for some bank switching schemes may be required. The more common reason for needing to put the generated machine language in specific locations is for data and the vector table. Still, a .ORG directive is necessary and easy to test.

.BANK 0 $1000
                JMP ten
.ORG 4106
ten: LDA #10
                BRK

The implementation of .ORG simply gets the next parameter and makes sure that the address specified is within the range of the current bank. If it is, that address becomes the bank's current address. I don't think there is anything special about this, so I am not going to show the code, but if you want to look at it, feel free to grab it from the GitHub repository https://github.com/BillySpelchan/VM2600 . Next week we will look at variable directives.
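The repository has the real .ORG code. What follows is only a hypothetical sketch of the range check described above. OrgBank and applyOrg are illustrative names, and an IllegalArgumentException stands in for the assembler's own exception. With the test above, .BANK 0 $1000 creates a 4K bank covering $1000 to $1FFF, so .ORG 4106 ($100A) is accepted.

```kotlin
// Hypothetical, minimal stand-in for a bank: only the fields .ORG needs
class OrgBank(val bankOrigin: Int, val size: Int) {
    var curAddress = bankOrigin
}

// Sketch of .ORG: accept the address only if it falls inside the current bank
fun applyOrg(bank: OrgBank, address: Int) {
    if (address < bank.bankOrigin || address >= bank.bankOrigin + bank.size)
        throw IllegalArgumentException("ORG address $address is outside the current bank")
    bank.curAddress = address
}
```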

Wednesday, February 14, 2018

Assembling the Assembler

At this point it is possible to create a functional assembler that takes some assembly language and converts it into a machine language program. This is not too difficult, so I went ahead and wrote a quick function to do just this. I was not happy with the result, so we will be refactoring the code to make it better suited to my needs.

It is no secret that I am going to be writing Atari 2600 games using the assembler I am developing, but I also want to support NES programs, since after I finish my Coffee Quest 2600 project I will likely return to my NES RPG project. Both of these systems have games that perform bank switching to allow for more memory than the processor can address directly. Making this more complex, banks are not necessarily the same size on different systems, or even with different cartridges on the same system. I have opted to provide the number of banks and the size of a bank as parameters to my assembleProgram function, but I expect to change this once I start writing the directive handler.

    fun assembleProgram(source:ArrayList<String>, numBanks:Int = 1, bankSize:Int = 4096):Int {
        this.bankSize = bankSize;
        banks.clear()
        for (cntr in 0..(numBanks-1)) {
            banks.add( Array(bankSize, {_-> 0}))
        }

First the banks are set up based on the provided size. Right now banks are all the same size and the size is set right away, but this will change when I refactor the assembler. Once we have storage for holding the resulting machine language, we need to reset the current bank, the line counter, and the assembly address to 0. This is something else that will have to change, but for now it will be fine.

        currentBank = 0;
        assemblyLine = 0;
        assemblyAddress = 0;
        var errorCode = 0;

Looping through the source code is where the action takes place. We only need to do this once, with each line being sent to the tokenizer and the tokenized line sent to the parser, which generates the machine language. This gets added to the bank memory. If there is a problem with the assembling, an exception is thrown, which we catch and turn into an error code that will be returned.

        for (line in source) {
            ++assemblyLine
            verbose("$assemblyLine $line -> ${assemblyAddress.toString(16)}: ", false)
            var tokens = tokenize(line)
            try {
                var ml = parse(tokens)
                if (ml.size > 0)
                    for (data in ml) {
                        banks[currentBank][assemblyAddress++] = data
                        addressInMemory++
                    }
            } catch (iae:InvalidArgumentException) {
                errorCode = 2;
            }
        }

Once we have finished looping through the source code, it is time for our half pass, where we put proper addresses in place of the zero placeholders. Remember that the list of labels was generated during our single pass over the source code.

        var errors = linkLabelsInMemory()
        if (errors.size > 0) {
            errorCode = 1;
            for (errmessage in errors)
                println(errmessage)
        }

        return errorCode
    }

That is all there is to writing a basic assembler. That said, you would want to wrap the assembler in some type of command-line or GUI front end, which I have done. The command-line tool I wrote is rather simple at this time, with the ability to specify a source file that gets loaded and then converted into a ROM file, which is written to disk. The loading code is very simple:

        val assemblyFile = File(fileToAssemble)
        var assemblyList:ArrayList<String> = ArrayList(assemblyFile.readLines())
        assembler.assembleProgram(assemblyList)

At the moment, I am only supporting the output of a single bank which is also very simple.

        // Size the ROM image to match the bank so larger banks don't overflow the array
        val byteData = ByteArray(assembler.bankSize)
        for (cntrRom in 0 until assembler.bankSize) {
            byteData[cntrRom] = assembler.banks[0][cntrRom].toByte()
        }
        val romFile = File(fileToWrite)
        romFile.writeBytes(byteData)

With this we have a functional assembler, but for it to be used for creating 2600 or NES software we need some directives to set the bank origins and to write data. I was going to take a break from the assembler and start working on the emulation of 6502 instructions but decided that directives were needed, if only to make testing easier.

At this point I figured it was time to do a quick refactoring pass over the assembler. I really did not like the way banking was handled, so the key was to come up with a better bank system. The quick and dirty system that I cobbled together needs replacement, but with what? I need banks that the assembler writes the generated machine language into. Even though banks may be stored consecutively, their origins may not be consecutive, which is almost certainly the case with any bank-switching scheme. Bank sizes also differ between bank-switching schemes, and there may even be situations where a single cartridge has banks of different sizes.

The solution is to simply let the assembly source specify the bank, size, and origin of whatever bank it is creating, defaulting to a 4K bank with an origin of 0. When the assembler hits a .BANK directive, it switches the current bank to the indicated bank (with optional parameters for the size and origin of the bank). As writing the assembled code now becomes a bit trickier, I am going to have a bank class that holds the bank data and tracks where the assembler should write its next instructions. Writing code that exceeds the bank now becomes an issue, which is easily solved by throwing an exception.

I have been trying to work on my NIH Syndrome by using existing classes when possible. While looking over the existing exception classes, I noticed that the exceptions I had been using were Java exceptions. This is not necessarily a bad thing, but as I eventually want the emulator and support tools to run on a web page, it is important that I use Kotlin libraries so they will work on all the different targets that Kotlin supports. I decided to quickly write my own exception, which would be easy in Java and is even easier in Kotlin if you are not doing anything more than creating a new exception type. While a bit lengthy, here is the exception code:

class AssemblyException(s:String) : Throwable(s)

All exceptions in Kotlin inherit from the Throwable class, which has a constructor that takes a string that becomes the exception's message. I suppose I could get fancy with my exception and include additional information, such as the tokens being generated and the source that caused the issue, but that is not really needed given the way I am handling the assembling of code.

Replacing all the exceptions I used earlier with the new one was quick. The bank class needs two different addresses for each memory location: the bank address, which is the actual index into the bank's storage, and the CPU address, which is the address the CPU would use to access that piece of memory. Converting between the two is easy, as the CPU address is just the bank address plus the bank's origin. The current address is tracked using CPU addressing, as that makes things easier for the assembler: whenever it needs an address it is always the CPU address, and where that falls within the bank is not relevant. The bank class is mostly straightforward, so I will omit most of the code and focus only on the key functions.

class AssemblyBank(val number:Int, var size:Int = 4096, var bankOrigin:Int = 0) {
    private var storage:ByteArray = ByteArray(size)
    var curAddress:Int = bankOrigin

    fun curBankAddress():Int { return curAddress - bankOrigin}

    fun readBankAddress(offset:Int):Int {…}

Writing data is the core function of this class. It is a matter of writing the byte to the current address in the bank, then increasing that address. It is possible for the user to set the address to a value below the start of the bank or past its end, and long stretches of code can end up overflowing a bank. I have contemplated automatically switching to the next bank, but due to the nature of the different bank-switching schemes I feel that is not really a good idea; an assembly error that makes the programmer fix the bank issue themselves is the best approach.

    fun writeNextByte(data:Int):Int {
        val offset = curBankAddress()
        if ((offset < 0) || (offset >= size))
            throw AssemblyException("Writing outside of assembly bank")
        storage.set(offset, data.toByte())
        ++curAddress
        return curAddress
    }
    fun writeBankByte(offset:Int, data:Int) {…}
    fun writeCPUByte(address:Int, data:Int) {…}
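The trimmed-down, runnable bank below shows CPU addressing and the overflow check working together. MiniBank and readCPUByte are hypothetical names, and IllegalStateException stands in for AssemblyException. It does not reconstruct the elided functions above. It only shows that the bank address is the CPU address minus the origin, and that the byte after the last one in the bank is rejected.

```kotlin
// Hypothetical, trimmed-down bank used to illustrate CPU vs bank addressing
class MiniBank(val size: Int = 4096, val bankOrigin: Int = 0) {
    private val storage = ByteArray(size)
    var curAddress = bankOrigin            // tracked as a CPU address

    // Bank address = CPU address minus the bank's origin
    fun curBankAddress() = curAddress - bankOrigin

    // Write at the current address, then advance; returns the new CPU address
    fun writeNextByte(data: Int): Int {
        val offset = curBankAddress()
        if (offset < 0 || offset >= size) throw IllegalStateException("Writing outside of assembly bank")
        storage[offset] = data.toByte()
        return ++curAddress
    }

    // Convert the stored signed byte back into an unsigned 0..255 value
    fun readCPUByte(address: Int): Int = storage[address - bankOrigin].toInt() and 255
}
```

Usage, with a 2-byte bank at $F000:

```kotlin
val b = MiniBank(2, 0xF000)
b.writeNextByte(0xA9)   // returns 0xF001
b.writeNextByte(0x01)   // returns 0xF002
// a third write would throw, since offset 2 is past the end of the bank
```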

Having the ability to resize a bank is arguably not necessary as banks should be created when they are needed. There is one exception to this general rule that necessitates having a resize function. Assemblers always need to have a bank 0. Because the size of a bank defaults to 4K and that can change once assembly starts, the size of bank 0 needs to be changeable. There could be a restriction limiting this to just bank 0 or added complexity that assembly can’t start until a bank is specified, but it is easy enough to resize a bank so why not support it.

    fun resize(bankSize:Int) {
        if (size == bankSize)
            return;
        val oldSize = size
        val oldStorage = storage
        storage = ByteArray(bankSize)
        size = bankSize
        val bytesToCopy = minOf(oldSize, size)   // minOf comes from the Kotlin standard library
        for (offset in 0..(bytesToCopy-1))
            storage[offset] = oldStorage[offset]
    }

Because bytes are signed in Java (and Kotlin), I originally opted to use integers and treat them like bytes, which can actually be faster on many processors due to the way they deal with memory alignment. I am switching to byte arrays for this class but am still using integers for getting and setting, as they are much easier to work with and converting between the two shouldn't add much overhead. Still, my testing classes need an array of integers to compare to the expected results, so the class has a function that produces one:

    fun bankToIntArray():Array<Int> {
        val list = Array<Int> (size, {cntr -> storage[cntr].toInt() and 255})
        return list;
    }
}

With that complete, there is just the matter of replacing the code that references the old bank array and assembly address variables with code that accesses the bank class. The currentBank variable now becomes the holder of information about where the next assembly instruction is going to be written and what has been stored. Directives will switch out the current bank when we are switching banks and will alter the origin as well as address within the bank.

With this change done, we are now ready to start adding the directives. But what directives?

Wednesday, February 7, 2018

Linking Labels


As the assembler stands after last week, we have a list of labels, but once the first pass over the assembly code is processed we still need to replace the placeholder branch addresses with the actual addresses that the labels point to. I have created the function linkLabelsInMemory to handle this. It is a fairly large function, so I will just highlight the important parts of how it works. For those of you who wish to browse the full function, it is part of the Assembler.kt file. The source is located at https://github.com/BillySpelchan/VM2600 . The Git repository is not in sync with this blog. In fact, at the time I am posting this entry I have finished all but 3 of the instructions in the 6502 assembler and am ahead by a couple dozen or so articles. I am writing the articles as I finish the relevant work, so they are written as things happen, errors and all. While I am way ahead right now, things have a way of suddenly getting busy, so being ahead is a good thing, as that way there won't be a lull in posts.

The first issue I needed to deal with when reaching this stage was how to handle errors. If this point has been reached, there is an assembled file in the assembler's banks, but any branches that are based on labels have 0 as their target address. There is a possibility that some links are missing, and I need a way to handle this. There is also the possibility of duplicate targets, which might be okay but should issue a warning. I decided to simply create a list of error messages that mixes warnings and errors. All the labels that can be processed are, so in theory the resulting binary could be run even if there were some warning messages, but the warnings are likely an indication of problems with the code.

Kotlin has the rather nice feature of protecting the programmer from null pointer issues. It does this by making variables non-nullable by default, so they must always hold a value. It is possible to get around this requirement by adding a question mark to the end of the variable's type. The compiler then won't let you use these variables directly unless the programmer has checked for null or uses the !! (not-null assertion) operator to tell the compiler that the programmer is positive it is not null at this time. Of course, if the programmer is wrong, you will get a null pointer exception when you run the program, but that bug is on the programmer, not the compiler.

The processing of links is one of the cases where we need a null. This is because it is possible for an assembly language file to have a label link token without a matching label declaration token. If a label only has link-type nodes, then there is an error. This is why our first step in linking the labels is to find the target address for that label.

        for ( (label, links) in labelList) {
            var linkTarget:AssemblerLabel? = null
            for (asmlink in links) {
                if (asmlink.typeOfLabel == AssemblerLabelTypes.TARGET_VALUE) {
                    if (linkTarget != null)
                        errorList.add("Warning: ${asmlink.labelName} declared multiple times!")
                    linkTarget = asmlink;
                }
            }

Notice that there is another issue that this pass through the list will detect. What if more than one address is the target? This is a problem as a branch can only go to one address. The solution here is to use the last provided address while adding a warning to the list of errors. Even though I am calling this a warning, because the resulting code is potentially valid, there is a very strong possibility that this was a mistake on the programmer’s part.
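The target search can be sketched on its own, assuming a simplified label list that maps each label name to its entries. LinkType, LinkEntry, and findTargets are hypothetical names, not the real AssemblerLabel types. As in the loop above, a second TARGET_VALUE adds a warning and the last declaration wins. The sketch also records an error when a label never gets a target, which is the case the next paragraph deals with.

```kotlin
// Hypothetical, simplified link model for illustration only
enum class LinkType { TARGET_VALUE, RELATIVE_ADDRESS, FULL_ADDRESS }
data class LinkEntry(val type: LinkType, val address: Int)

// Find each label's target, warning on duplicates and reporting labels with no target
fun findTargets(labels: Map<String, List<LinkEntry>>, messages: MutableList<String>): Map<String, Int> {
    val targets = HashMap<String, Int>()
    for ((label, links) in labels) {
        var target: LinkEntry? = null
        for (link in links) {
            if (link.type == LinkType.TARGET_VALUE) {
                if (target != null) messages.add("Warning: $label declared multiple times!")
                target = link   // the last declaration wins
            }
        }
        if (target == null) {
            messages.add("Error: $label has no target")
        } else {
            targets[label] = target.address
        }
    }
    return targets
}
```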

If at the end of this process no target address is found, then clearly there is an error. Once we have a target address, which should be the normal result, we can go through the label's list of links again and process each one according to its link type. For the most part this is straightforward, with the only non-trivial case being the relative link. This simply uses the address of the instruction after the relative branching instruction as a base address, which is subtracted from the target address.

AssemblerLabelTypes.RELATIVE_ADDRESS -> {
                            val baseAddress = asmlink.addressOrValue + 1
                            val offset = (linkTarget.addressOrValue - baseAddress) and 255
                            banks[asmlink.bank][asmlink.addressOrValue] = offset
                        }
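The relative calculation above, plus one possible range check, can be packaged as a small helper. relativeOffset is a hypothetical name, and require() is just one way to flag an out-of-range branch. The article stores a warning instead. As in the code above, the stored address is the operand byte, and the branch is measured from the byte after it. A 6502 branch can only reach -128 to +127 bytes from there. For example, a branch opcode at $1000 whose target is itself has its operand at $1001, giving an offset of -2, which is stored as $FE.

```kotlin
// Hypothetical helper: compute the 6502 relative-branch operand byte
// operandAddress is where the offset byte is stored; the branch is measured from the next instruction
fun relativeOffset(operandAddress: Int, target: Int): Int {
    val baseAddress = operandAddress + 1
    val delta = target - baseAddress
    // A relative branch can only reach -128..+127 bytes from the next instruction
    require(delta in -128..127) { "Relative branch out of range: $delta" }
    return delta and 255
}
```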

Zero page addresses and low bytes are calculated the same way, while a high byte is simply the page to use. These are calculated from the provided target value as follows:

val targetHigh:Int = (linkTarget.addressOrValue / 256) and 255
val targetLow = linkTarget.addressOrValue and 255

A full address is two bytes and is stored with the low byte first followed by the high byte. The code is not quite ready to handle banks properly so I am going to have to revisit it. I am also thinking of adding a warning for cases where relative links are out of range.
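The sketch below writes a full address in the 6502's low-byte-first order, using the same byte expressions as targetLow and targetHigh above. writeFullAddress is a hypothetical helper operating on a plain IntArray, standing in for a bank. So $1234 is stored as $34 followed by $12.

```kotlin
// Hypothetical helper: store a 16-bit address low byte first, as the 6502 expects
fun writeFullAddress(memory: IntArray, offset: Int, address: Int) {
    memory[offset] = address and 255             // low byte
    memory[offset + 1] = (address / 256) and 255 // high byte
}
```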

Some of you may have noticed that there is no check for the case where there is a label declaration but no label links to that declaration. While this may be a mistake where the programmer forgot to use a label, it is also possible that the label is there for clarification or for potential use in the future.

At this point everything we need to assemble a program exists; we just need to put it all together, which will be next week.