Wednesday, January 31, 2018

Two Pass Versus One-and-a-half Pass Assembly

The label problem discussed last time is the reason that early assemblers were two pass assemblers. The first pass of assembly simply figures out what address each statement will be on while created a table of labels indicating what address each label referred to. The second pass generates the machine language instructions and replaces the labels with the addresses that were worked out during the first pass. The downside of this approach is you need to process the file twice or generate some type of intermediate assembly language file.

As we process an assembly language file, we know which statements need an address and which label they require. This means that we know where in the machine language output address information needs to be added. Keeping this information in memory is not that hard to do, especially for today’s machines which hold millions of times more memory than the resulting machine language file that we are trying to generate. This approach will require only a single pass over the source assembly language file but will require two passes over the machine language file that is being output.

This means that we would need some type of label class which would hold the label, what the actual address for the label is, and a list of links that use that label. As relative links need to be handled different from a regular link so this starts becoming a bit complicated. If we consider variables, a feature I would like in the future, then things get even more complex. The following enumeration covers the different types of labels that I have come up with.

enum class AssemblerLabelTypes {
    TARGET_VALUE,      // Holds the address or value that the label represents
    HIGH_BYTE,         // treat label as variable and use high byte of it
    LOW_BYTE,          // treat label as variable and use low byte of it
    RELATIVE_ADDRESS,  // for relative addressing mode so distance from target
    ZERO_PAGE_ADDRESS, // low byte of address as there should be no high byte
    ADDRESS            // treat as absolute address
}

This got me thinking that if I stored the label in a hash map then all I would need to do is have a list with a simple structure holding the information. Something like the following:

data class AssemblerLabel(val labelName:String,
                          val typeOfLabel:AssemblerLabelTypes,
                          val addressOrValue: Int,
                          val bank:Int = 0)

This, of course, has a redundant label which is not really necessary and is really just a waste of memory but I just couldn’t bring myself around to removing it at this point. I suppose it is something that should be done, but a label that doesn’t have a name does not seem right. If I refactor this code in the future, I would probably rename this as AssemblerLabelNode so the lack of a label string won’t bother me. That is something for the future, however.

Adding a label now simply becomes finding the map entry with the label name in it, creating an entry if one does not exist and adding the AssemblerLabel onto the list associated with that entry. I am using an ArrayList for my list though a linked list would probably be the more correct structure from a data structures viewpoint.  While ArrayLists seem to perform better, for the usage that the assembler does, the linked list would be as efficient

   fun addLabel(label:AssemblerLabel) {
        // see if already a label with this name
        if ( !labelList.containsKey(label.labelName)) {
            val resultList = ArrayList<AssemblerLabel>();
            resultList.add(label)
            labelList.put(label.labelName, resultList)
        } else {
            val resultList:ArrayList<AssemblerLabel> =  labelList[label.labelName]!!
            resultList.add(label)
        }
    }

This stores the labels during the pass over the source assembly file but still requires a second pass over the list, which will be covered next time.

No comments:

Post a Comment