
ENCM 369: Computer Organization
Class handout: Example of how an assembler works

Author: Steve Norman
Paper copies handed out: Wed Feb 11
Last modified: Mon Feb 9 13:00:33 MST 2004

Introduction

This handout reviews an example that was presented in lectures on Monday, Feb. 9. It traces the execution of the assembler as it processes an example source code file.

It assumes that the output of the assembler is an object file.



Source code

This is the source code for the example:
        .data                         # line  1
foo:    .word   10                    # line  2
        .text                         # line  3
        .globl  bar                   # line  4
bar:    la      $t0, foo              # line  5
        lw      $t1, ($t0)            # line  6
        add     $v0, $zero, $zero     # line  7
loop:   beq     $t1, $zero, quit      # line  8
        add     $v0, $v0, $a0         # line  9
        addi    $t1, $t1, -1          # line 10
        j       loop                  # line 11
quit:   jr      $ra                   # line 12
        .data                         # line 13
        .globl  quux                  # line 14
quux:   .word   42                    # line 15



Line-by-line description of assembler actions

Line 1 is
       .data
This tells the assembler that the upcoming line(s) will describe data segment contents. No updates to the data structures are needed.

Line 2 is

foo:    .word   10
This causes the assembler to add entries to its list of symbols and to the data segment, as follows:
Assembler data structures after processing Line 2 ...

TEXT SEGMENT        DATA SEGMENT           SYMBOLS
offset  word        offset  word           name    address   global
   0                   0    0x0000_000a    foo     data+0     no
                       4

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
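
The three tables and the record of incomplete information can be modelled with ordinary containers. The following Python sketch is only an illustration under that assumption, not the design of any particular assembler; the names Symbol, AssemblerState, and add_data_word are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    segment: Optional[str] = None   # "text" or "data"; None until the label is seen
    offset: Optional[int] = None    # byte offset within that segment
    is_global: bool = False

@dataclass
class AssemblerState:
    text: list = field(default_factory=list)      # text segment words
    data: list = field(default_factory=list)      # data segment words
    symbols: dict = field(default_factory=dict)   # name -> Symbol
    fixups: list = field(default_factory=list)    # record of incomplete information

def add_data_word(state, label, value):
    """Handle a line of the form `label: .word value` in the data segment."""
    offset = 4 * len(state.data)                  # next free byte offset
    sym = state.symbols.setdefault(label, Symbol())
    sym.segment, sym.offset = "data", offset
    state.data.append(value & 0xFFFFFFFF)

state = AssemblerState()
add_data_word(state, "foo", 10)                   # source line 2
print(f"{state.data[0]:#010x}", state.symbols["foo"])
```

After the call, the data segment holds one word and foo is recorded at data+0 as a non-global symbol, matching the tables above.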

Line 3 is

       .text
This tells the assembler that the upcoming line(s) will describe text segment contents. No updates to the data structures are needed.

Line 4 is

       .globl  bar
A new entry is needed in the list of symbols. All that is known so far about bar is that it is global.
Assembler data structures after processing Line 4 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0                    0    0x0000_000a    foo     data+0     no
                        4                   bar                yes

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
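
A .globl directive can name a label before the label itself has been seen, so the list of symbols has to allow entries with no address yet. Here is one possible way to handle that in Python; declare_global and define_label are hypothetical helper names, and the dictionary layout is an assumption made for this sketch.

```python
# Each symbol maps to a dict; "address" stays None until the label is seen.
symbols = {"foo": {"address": ("data", 0), "global": False}}

def declare_global(symbols, name):
    """Handle `.globl name`; works whether or not name is already defined."""
    entry = symbols.setdefault(name, {"address": None, "global": False})
    entry["global"] = True

def define_label(symbols, name, segment, offset):
    """Handle `name:`; keeps the global flag set by an earlier .globl."""
    entry = symbols.setdefault(name, {"address": None, "global": False})
    entry["address"] = (segment, offset)

declare_global(symbols, "bar")            # source line 4
after_line_4 = dict(symbols["bar"])       # snapshot: address still unknown
define_label(symbols, "bar", "text", 0)   # the label on source line 5
```

Because both helpers use setdefault, the directive and the label can arrive in either order and the entry ends up the same.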

Line 5 is

bar:    la      $t0, foo
Now the assembler can fill in an address for bar in the list of symbols. la is a pseudoinstruction that needs two real instructions: a lui that puts bits 31-16 of the address of foo into $at, and an ori that combines $at with bits 15-0 of the address and puts the result in $t0. These two instructions get added to the text segment. But it isn't known what the address of foo will be in an executable, so bits 15-0 of each of the two instructions are left as zero and notes are placed in the incomplete information data structure.
Assembler data structures after processing Line 5 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
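
The two words at text+0 and text+4 can be reproduced by packing the I-format fields by hand. This is a minimal sketch: the opcodes and register numbers are the standard MIPS ones, but encode_i, expand_la, and the "hi16"/"lo16" tags are names made up for this example.

```python
AT, T0 = 1, 8                 # register numbers of $at and $t0
OP_LUI, OP_ORI = 0x0F, 0x0D   # MIPS opcodes

def encode_i(opcode, rs, rt, imm):
    """Pack an I-format instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def expand_la(rt, symbol, text, fixups):
    """Append lui/ori with zero immediates and record two fix-up notes."""
    here = 4 * len(text)
    text.append(encode_i(OP_LUI, 0, AT, 0))     # lui $at, <upper half>
    text.append(encode_i(OP_ORI, AT, rt, 0))    # ori rt, $at, <lower half>
    fixups.append((here, "hi16", symbol))       # needs bits 31-16 of the address
    fixups.append((here + 4, "lo16", symbol))   # needs bits 15-0 of the address

text, fixups = [], []
expand_la(T0, "foo", text, fixups)              # source line 5
print([f"{w:#010x}" for w in text])   # ['0x3c010000', '0x34280000']
```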

Lines 6 and 7 are

        lw      $t1, ($t0)
        add     $v0, $zero, $zero
The assembler can completely translate these two instructions.
Assembler data structures after processing Lines 6 and 7 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000
  12    0x0000_1020
  16

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
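
These two instructions need no notes because every field is known from the source line alone. As a rough check of the words at text+8 and text+12, the fields can be packed as follows; encode_r and encode_i are hypothetical helpers, while the field layouts, opcode, and funct value are standard MIPS.

```python
ZERO, V0, T0, T1 = 0, 2, 8, 9    # register numbers
OP_LW, FUNCT_ADD = 0x23, 0x20

def encode_r(rs, rt, rd, shamt, funct):
    """Pack an R-format instruction (the opcode field is 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """Pack an I-format instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

lw_word = encode_i(OP_LW, T0, T1, 0)                 # lw  $t1, ($t0)
add_word = encode_r(ZERO, ZERO, V0, 0, FUNCT_ADD)    # add $v0, $zero, $zero
print(f"{lw_word:#010x} {add_word:#010x}")   # 0x8d090000 0x00001020
```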

Line 8 is
loop:   beq     $t1, $zero, quit
The assembler adds loop and quit to the list of symbols. The instruction can't be completely translated because the assembler doesn't yet know where quit is. The assembler leaves bits 15-0 of the instruction as zero and puts a note in the incomplete information section.
Assembler data structures after processing Line 8 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000                         loop    text+16    no
  12    0x0000_1020                         quit               no
  16    0x1120_0000
  20

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit

Lines 9 and 10 are

        add     $v0, $v0, $a0
        addi    $t1, $t1, -1
The assembler can completely translate these two instructions.
Assembler data structures after processing Lines 9 and 10 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000                         loop    text+16    no
  12    0x0000_1020                         quit               no
  16    0x1120_0000
  20    0x0044_1020
  24    0x2129_ffff
  28

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit
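
The word 0x2129_ffff at text+24 shows how the constant -1 in the addi instruction ends up as 0xffff in bits 15-0. The short sketch below illustrates that 16-bit two's-complement step; to_imm16 and from_imm16 are names invented here, and the range check is an assumption about what a careful assembler would do.

```python
T1 = 9            # register number of $t1
OP_ADDI = 0x08    # MIPS opcode for addi

def to_imm16(value):
    """Return the 16-bit two's-complement bit pattern for value."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("immediate does not fit in 16 signed bits")
    return value & 0xFFFF

def from_imm16(bits):
    """Sign-extend a 16-bit field back to a Python int."""
    return bits - 0x10000 if bits & 0x8000 else bits

# addi $t1, $t1, -1
addi_word = (OP_ADDI << 26) | (T1 << 21) | (T1 << 16) | to_imm16(-1)
print(f"{addi_word:#010x}")   # 0x2129ffff
```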

Line 11 is
        j       loop
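This instruction can't be completely translated. The assembler knows that loop is at text+16, but a j instruction encodes part of an absolute address, and it isn't known where the text segment will be placed in an executable. So bits 25-0 of the instruction are left as zero and a note is placed in the incomplete information data structure.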
Assembler data structures after processing Line 11 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000                         loop    text+16    no
  12    0x0000_1020                         quit               no
  16    0x1120_0000
  20    0x0044_1020
  24    0x2129_ffff
  28    0x0800_0000
  32

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit
text+28    replace bits 25-0 of insn with bits 27-2 of addr of loop

Line 12 is
quit:   jr      $ra
The instruction can be translated completely. Because the line has a label, there is an update to the list of symbols.
Assembler data structures after processing Line 12 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000                         loop    text+16    no
  12    0x0000_1020                         quit    text+32    no
  16    0x1120_0000
  20    0x0044_1020
  24    0x2129_ffff
  28    0x0800_0000
  32    0x03e0_0008
  36

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit
text+28    replace bits 25-0 of insn with bits 27-2 of addr of loop

Line 13 is
        .data
This tells the assembler that the upcoming line(s) will describe data segment contents. No updates to the data structures are needed.

Line 14 is
        .globl  quux
The assembler adds quux to the list of symbols. All that is known so far about quux is that it is global.
Assembler data structures after processing Line 14 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4                   bar     text+0     yes
   8    0x8d09_0000                         loop    text+16    no
  12    0x0000_1020                         quit    text+32    no
  16    0x1120_0000                         quux               yes
  20    0x0044_1020
  24    0x2129_ffff
  28    0x0800_0000
  32    0x03e0_0008
  36

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit
text+28    replace bits 25-0 of insn with bits 27-2 of addr of loop

Line 15, the last line, is
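quux:   .word   42
The assembler adds a word to the data segment and can now fill in an address for quux in the list of symbols.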
Assembler data structures after processing Line 15 ...

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4    0x0000_002a    bar     text+0     yes
   8    0x8d09_0000     8                   loop    text+16    no
  12    0x0000_1020                         quit    text+32    no
  16    0x1120_0000                         quux    data+4     yes
  20    0x0044_1020
  24    0x2129_ffff
  28    0x0800_0000
  32    0x03e0_0008
  36

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+16    replace bits 15-0 of insn with offset from text+20 to quit
text+28    replace bits 25-0 of insn with bits 27-2 of addr of loop



Setting up the branch offset

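After reading the entire source file, the assembler passes through the record of incomplete information to see whether any incomplete instructions can be completed. In this example, the only possible completion is filling in the offset in the branch instruction. The offset is the distance in words from text+20, the instruction after the branch, to quit at text+32, which is (32 - 20) / 4 = 3. The other three notes depend on addresses that won't be known until an executable is built, so they have to stay. The appropriate word in the text segment is updated and a note is removed from the record of incomplete information.
Final state of assembler data structures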

TEXT SEGMENT         DATA SEGMENT           SYMBOLS
offset  word         offset  word           name    address   global
   0    0x3c01_0000     0    0x0000_000a    foo     data+0     no
   4    0x3428_0000     4    0x0000_002a    bar     text+0     yes
   8    0x8d09_0000     8                   loop    text+16    no
  12    0x0000_1020                         quit    text+32    no
  16    0x1120_0003                         quux    data+4     yes
  20    0x0044_1020
  24    0x2129_ffff
  28    0x0800_0000
  32    0x03e0_0008
  36

RECORD OF INCOMPLETE INFORMATION
address    what needs to be done at that address
text+0     replace bits 15-0 of insn with bits 31-16 of addr of foo
text+4     replace bits 15-0 of insn with bits 15-0 of addr of foo
text+28    replace bits 25-0 of insn with bits 27-2 of addr of loop
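
The pass over the record of incomplete information can be sketched in Python as follows. This is an illustration only: resolve_branches and the tuple layout of the notes are assumptions made for this example, and the starting values are copied from the tables after Line 15.

```python
text = [0x3C010000, 0x34280000, 0x8D090000, 0x00001020, 0x11200000,
        0x00441020, 0x2129FFFF, 0x08000000, 0x03E00008]
symbols = {"foo": ("data", 0), "bar": ("text", 0), "loop": ("text", 16),
           "quit": ("text", 32), "quux": ("data", 4)}
# Each note: (offset in text, kind of fix-up, symbol)
fixups = [(0, "hi16", "foo"), (4, "lo16", "foo"),
          (16, "branch", "quit"), (28, "jump", "loop")]

def resolve_branches(text, symbols, fixups):
    """Complete PC-relative branches; return the notes that must remain."""
    remaining = []
    for offset, kind, name in fixups:
        segment, target = symbols[name]
        if kind == "branch" and segment == "text":
            # The offset counts words from the instruction after the branch.
            words = (target - (offset + 4)) // 4
            text[offset // 4] |= words & 0xFFFF   # field was left as zero
        else:
            remaining.append((offset, kind, name))
    return remaining

fixups = resolve_branches(text, symbols, fixups)
print(f"{text[4]:#010x}", len(fixups))   # 0x11200003 3
```

Only the branch can be finished here because its offset is a distance between two places in the text segment, and that distance stays the same wherever the segment is eventually placed.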



Creating an object file

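Once the assembler has finished processing its input, it can generate an object file. For our example, the object file would contain the following information, but in a binary file format, not as text. Note that the symbol table lists only the global symbols, and that the relocation information refers to segment offsets rather than to the local names foo and loop.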
TEXT SEGMENT (36 bytes)
   0x3c01_0000
   0x3428_0000
   0x8d09_0000
   0x0000_1020
   0x1120_0003
   0x0044_1020
   0x2129_ffff
   0x0800_0000
   0x03e0_0008
DATA SEGMENT (8 bytes)
   0x0000_000a
   0x0000_002a
RELOCATION INFORMATION
   replace bits 15-0 of word at text+0 with bits 31-16 of addr of data+0
   replace bits 15-0 of word at text+4 with bits 15-0 of addr of data+0
   replace bits 25-0 of word at text+28 with bits 27-2 of addr of text+16
SYMBOL TABLE
   bar     text+0
   quux    data+4
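
To see how the relocation information might be used later, here is a small Python sketch of the step that fills in the three remaining fields once segment addresses have been chosen. The base addresses 0x00400000 and 0x10010000 are assumed purely for illustration and do not come from this handout; relocate and the "hi16", "lo16", and "jump26" tags are likewise invented names.

```python
# Assumed base addresses, chosen only for illustration.
TEXT_BASE, DATA_BASE = 0x00400000, 0x10010000

text = [0x3C010000, 0x34280000, 0x8D090000, 0x00001020, 0x11200003,
        0x00441020, 0x2129FFFF, 0x08000000, 0x03E00008]
# Each entry: (offset in text, kind, target segment, target offset)
relocations = [(0, "hi16", "data", 0), (4, "lo16", "data", 0),
               (28, "jump26", "text", 16)]

def relocate(text, relocations, text_base, data_base):
    """Return a copy of text with every relocation entry applied.

    Relies on the affected fields having been left as zero.
    """
    bases = {"text": text_base, "data": data_base}
    out = list(text)
    for offset, kind, segment, target in relocations:
        addr = bases[segment] + target
        i = offset // 4
        if kind == "hi16":
            out[i] |= (addr >> 16) & 0xFFFF        # bits 31-16 of the address
        elif kind == "lo16":
            out[i] |= addr & 0xFFFF                # bits 15-0 of the address
        elif kind == "jump26":
            out[i] |= (addr >> 2) & 0x03FFFFFF     # bits 27-2 of the address
    return out

final = relocate(text, relocations, TEXT_BASE, DATA_BASE)
print(f"{final[0]:#010x} {final[1]:#010x} {final[7]:#010x}")
# 0x3c011001 0x34280000 0x08100004
```

With different base addresses the same object file would produce different final words, which is why the assembler has to leave these fields for a later step.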
