CS61C Spring 2017 Project 2-1

Due Sunday, Feb 19, 2017 @ 11:59pm

Updates (Check here if there are updates in the future, announced on Piazza first)

Update 1 (2/9, 11:31 PM): li is only required to handle 32-bit signed (not unsigned) integers
Update 2 (2/11, 4:09 PM): Updated name of variable on line 155 of assembler.c
Update 3 (2/13, 9:09 PM): Updated spec's description of neg. The spec now says that we store the negation of $rs in $rt (the opposite was said before). Also, explicitly stated that the function free_table() must be completed src/translate.c.
Update 4 (2/15 9:53 PM): Added new tests and automated diff testing via the Makefile
Update 5 (2/15 11:00 PM): Fixed issue with 'exit' label in negate_array_ref.out

To apply updates, please enter the following:

        git fetch proj2-starter
        git merge proj2-starter/proj2-1 -m "merge proj2 update"

So What Is This About?

In this part of the project, we will be writing an assembler that translates a subset of the MIPS instruction set to machine code. Our assembler is a two-pass assembler similar to the one described in lecture. However, we will only assemble the .text segment. At a high level, the functionality of our assembler can be divided as follows:

Pass 1: Reads the input (.s) file. Comments are stripped, pseudoinstructions are expanded, and the address of each label is recorded into the symbol table. Input validation of the labels and pseudoinstructions is performed here. The output is written to an intermediate (.int) file .

Pass 2: Reads the intermediate file and translates each instruction to machine code. Instruction syntax and arguments are validated at this step. The relocation table is generated, and the instructions, symbol table, and relocation table are written to an object (.out) file.

The Instruction Set

Please consult the MIPS Green Sheet for register numbers, instruction opcodes, and bitwise formats. Our assembler will support the following registers: $zero, $at, $v0, $a0 - $a3, $t0 - $t3, $s0 - $s3, $sp, and $ra. The name $0 can be used in lieu of $zero. Other register numbers (eg. $1, $2, etc.) are not supported.

We will have 22 instructions and 4 pseudoinstructions to assemble, two that are not real pseudoinstructions. The instructions are:

Instruction	Format
Add Unsigned	addu $rd, $rs, $rt
Or	or $rd, $rs, $rt
Xor	xor $rd, $rs, $rt
Set Less Than	slt $rd, $rs, $rt
Set Less Than Unsigned	sltu $rd, $rs, $rt
Jump Register	jr $rs
Shift Left Logical	sll $rd, $rt, shamt
Add Immediate Unsigned	addiu $rt, $rs, immediate
Or Immediate	ori $rt, $rs, immediate
Xor Immediate	xori $rt, $rs, immediate
Load Upper Immediate	lui $rt, immediate
Load Byte	lb $rt, offset($rs)
Load Byte Unsigned	lbu $rt, offset($rs)
Load Word	lw $rt, offset($rs)
Store Byte	sb $rt, offset($rs)
Store Word	sw $rt, offset($rs)
Branch on Equal	beq $rs, $rt, label
Branch on Not Equal	bne $rs, $rt, label
Jump	j label
Jump and Link	jal label
Mult	mult $rs, $rt
Div	div $rs, $rt
Move from $hi	mfhi $rd
Move from $lo	mflo $rd

The pseudoinstructions are:

Pseudoinstruction	Format
Load Immediate	li $rt, immediate
Negate a the value of the register	neg $rt, $rs
Moves the $hi and $lo registers into the arguments	mfhilo $rs, $rt
Loads the first 3 digits of pi into the registers	pi $rs, $rt, $rd

How li works: stores the immediate value (up to 32 bits long, signed) into $rt
How neg works: sets $rt equal to the two's complement negation of $rs
How mfhilo works: sets $rs equal to the contents of $hi and $rt to the contents of $lo
How pi works: sets $rs equal the first digit of pi, $rt to the second, and $rd to the third

Implementation Steps

Please note that your project will be graded on the Raspberry Pi. While you are free to develop on other machines, you need to make sure that your code compiles and runs without errors on the Raspberry Pi before submitting. If you do not, you run the risk of turning in non-compiling code and getting a ZERO on the entire project. To test on Raspberry Pi, run the command
ssh cs61c-xxx@raspi[1-24].cs.berkeley.edu and then test your code on this machine.

Step 0: Obtaining the Files

To make this process go as smoothly as possible, make sure you:

Use the proj2-xxx repositories for this project. PLEASE MAKE SURE THAT YOUR REPO IS PRIVATE!! If you do not set this to PRIVATE, horrible things will happen to you and you will be severely punished. So before continuing, make sure your repo is PRIVATE. Not setting your repo to private, even if by mistake will be seen as an intention to cheat.

Check that you have 'cs61c-staff' as admin in your access management settings.

Enter into the directory of your class account that you would like to hold your proj2-xxx repository.
Once in this directory, run the following:

git clone https://bitbucket.org/mybitbucketusername/proj2-xxx.git
cd proj2-xxx
git remote add proj2-starter https://github.com/61c-teach/sp17-proj2-starter.git
git fetch proj2-starter
git merge proj2-starter/proj2-1 -m "merge proj2 skeleton code"

You can compile you code by typing make. At first, you will get a bunch of -Wunused-variable and -Wunused-function warnings. The warnings tell you that variables/functions were declared, but were not used in your code. Don't worry, as you complete the assigment the warnings will go away.

Step 1: Building Blocks

Finish the implementation of translate_num() in src/translate_utils.c. You should use the library function strtol() (see documentation here). translate_num() should translate a numerical string (either decimal or hexadecimal) into a signed number, and then check to make sure that the result is within the bounds specified. If the string is invalid or outside of the bounds, return -1.

Step 2: SymbolTable

Use the given SymbolTable data structure to store symbol name-to-address mappings in src/tables.c. Multiple SymbolTables may be created at the same time, and each must resize to fit an arbitrary number of entries (so you should use dynamic memory allocation). A SymbolTable struct has been defined in src/tables.h, and you should use the existing implementation. NO CHANGE SHOULD BE MADE TO src/table.h, or you will throw the autograder off. If you would like to add helper functions, please declare the function prototypes at the top of the src/tables.c file.

In add_to_table, you cannot simply store the character pointer that was given, as it could point to a temporary array. You must store a copy of that string instead. You should use the helper functions defined in src/tables.c whenever appropriate. The provided SymbolTable also comes with an INITIAL_SIZE and SCALING_FACTOR to appropriately create a table and also increase the size of the table when trying to add to a table currently at full capacity.

You must make sure to free all memory that you allocate. Please fill out the function free_table() in src/tables.c to implement the functionality to free the memory of an entire table. See the Valgrind section under testing for more information on detecting memory leaks.

At this point, you should take a look at the "Testing" section at the bottom to test your code thus far.

Step 3: Pseudoinstruction expansion

Implement write_pass_one() in src/translate.c, which should perform pseudoinstruction expansion on the load immediate (li), negate (neg), move from hi and lo (mfhilo), get digits of pi (pi) instructions.

The li instruction normally gets expanded into an lui-ori pair. However, an optimization can be made when the immediate is small. If the immediate can fit inside the imm field of an addiu instruction, we will use an addiu instruction instead. Other assemblers may implement additional optimizations, but ours will not.

For the remaining pseudoinstructions, determine a set of TAL MIPS instructions that will carry out the desired tasks without causing any side-effects.

The function write_pass_one() should write all TAL MIPS instructions to the intermediate file (passed in as a parameter). If there is an error, do NOT write anything to the intermediate file, and return 0 to indicate that 0 lines have been written; otherwise, return the number of instructions that have been written.

Friendly reminder: when you are expanding pseudoinstructions to multiple instructions, make sure each line in the .int file only contains one instruction!

Step 4: Instruction Translation

In this step, you will be translating MIPS instructions into machine code. Firstly, implement the functions create_instr_r_type() and create_instr_i_type() located at the bottom of src/translate.c. These functions will return the 32-bit machine code instructions for R-Type and I-Type TAL MIPS instructions respectively.

Next, you will complete the functions write_*(). There are 12 of these functions in total. Luckily, most of the work for each one has been already completed, with the exception of write_branch() (more on this later). For each of these functions, add in error checking, as well as complete the actual 32-bit instruction. For error checking, make sure that you have the correct number of inputs, the registers are valid, and all immediate values are properly translated. If there is an error, then the function should return -1. You may also find the create_instr_r_type() and create_instr_i_type() particularly useful when creating the instructions.

The write_branch() instruction is slightly less complete than the other functions. Firstly, fill in the can_branch_to() function, which takes in the address of the branch instruction, as well as the address of the label being branched to. It should return whether or not the label address can be reached from the the current instruction address. Then, in the write_branch() function, perform error checking, fill in the offset between the the address we branch from and the label address, and complete the actual instruction.

Step 5: Putting It All Together

Implement pass_one() and pass_two() in assembler.c. You may find the use of parse_args() very helpful in this task. Please keep in mind that if you use parse_args(), it MUST be called after the provided line "char* token = strtok(buf, IGNORE_CHARS);" in pass_one() because parse_args() heavily depends on strtok(). This has to do with the way strtok() works and how passing in NULL as an argument for strtok() is dependent on the last successful function call of strtok(). In the first pass, the assembler will strip comments, add labels to the symbol table, perform pseudoinstruction expansion, and write assembly code into an intermediate file. The second pass will read the intermediate file, translate the instructions into machine code using the symbol table, and write it to an output file. Afterwards, the symbol table and relocation table will be written to the output file as well, but that has been handled for you.

Before you begin, make sure you understand the documentation of fgets() and strtok(). It will be easier to implement pass_two() first. The comments in the function will give a more detailed outline of what to do, as well as what assumptions you may make. Your program should not exit if a line contains an error. Instead, keep track of whether any errors have occured, and if so, return -1 at the end. pass_one() should be structured similarly to pass_two(), except that you will also need to parse out comments and labels. The comments have already been skipped for you, and you will find the add_if_label() function useful to handle labels.

As an aside, our parser is much more lenient than an actual MIPS parser. Building a good parser is outside the scope of this course, but we encourage you to learn about finite state automata if you are interested.

Line Numbers and Byte Offsets

When parsing, you will need to keep track of two numbers, the line number of the input file and the byte offset of the current instruction. Line numbers start at 1, and include whitespace. The byte offset refers to how far away the current instruction is from the first instruction, and does NOT include whitespace. You can think of the byte offset as where each instruction will be if the instructions were loaded into memory starting at address 0. See below for an example.

The address of a label is the byte offset of the next instruction. In the example below, L1 has an address of 4 (since the next instruction is lw, whose address is 4) and L2 has an address of 8 (since the next instruction is ori, whose address is 8).

Line #	Input File
1	addiu $t0 $a0 0
2	L1: lw $t1 0($t0)
3	# This is a comment
4	L2:
5	ori $t1 $t1 0xABCD
6	addiu $t1 $t1 3
7
8	bne $t1 $a2 L2

Output File	Byte Offset
addiu $t0 $a0 0	0
lw $t1 0($t0)	4
ori $t1 $t1 0xABCD	8
addiu $t1 $t1 3	12
bne $t1 $a2 label_2	16

Error Handling

If an input file contains an error, we only require that your program print the correct error messages. The contents of your .int and .out files do not matter.

There are two kinds of errors you can get: errors with instructions and errors with labels. Error checking of labels is done for you by add_if_label(). However, you will still need to record that an error has occurred so that pass_one() can return -1.

In pass_one(), errors with instructions can be raised by 1) write_pass_one() or 2) the instruction having too many arguments. In pass_two(), errors with instructions will only be raised by translate_inst(). Both write_pass_one() and translate_inst() should return a special value (0 and -1 respectively) in the event of an error. You will need to detect whether an instuction has too many arguments yourself in pass_one().

Whenever an error is encountered in either pass_one() or pass_two(), record that there is an error and move on. Do not exit the function prematurely. When the function exits, return -1.

For information about testing error message, please see the "Error Message Testing" section under "Running the Assembler".

Step 6: Testing (NO, DO NOT JUST SKIP THIS SECTION.)

You are responsible for testing your code. While we have provided a few test cases, they are by no means comprehensive. In the previous semester's rendition of the project, it was heavily reflected in student grades who and who did not test their code. Fortunately, you have a variety of testing tools at your service.

CUnit

Note: CUnit tests must be compiled on either the hive or the Soda 2nd floor machines, or you will get compilation errors. We have set up some unit tests in test_assembler.c. You can run them by typing:

make test-assembler

You are encouraged to write more tests than the ones that we have given.

Unit testing in C

In order to make automated testing easier for you, we've hooked up a framework called CUnit. You can learn more about CUnit here.

All of your unit tests will live in test_assembler.c. We've provided two example test suites, each containing two tests, along with test suite initialization functions to make your life easier.

What is a "suite"?

A test suite is essentially a collection of tests. To add test suites, you can use the following boilerplate code in test_assembler.c. Two examples of this are already provided.

Creating the Test Suite

You'll need to add the following code in main() once per test suite:

[... inside of main() in test_assembler.c  ...]
[ Replace pSuiteN below with a suitable name, like pSuite5 ]

CU_pSuite pSuiteN = NULL; // replace N with the test number
/* add a suite to the registry */

/* You don't necessarily have to use the same init and clean functions for
 * each suite. You can specify the function names in the next line:
 */
pSuiteN = CU_add_suite("Suite_N", NULL, NULL);
if (NULL == pSuiteN) {
   CU_cleanup_registry();
   return CU_get_error();
}

This is the doucmentation for add_suite:
CU_pSuite CU_add_suite(const char* strName, CU_InitializeFunc pInit, CU_CleanupFunc pClean)
pUnit and pClean are optional. In the 2nd suite that we provided, pUnit is set to init_log_file because you are able to save to a log file (see init_log_file() for details).

Adding Tests to the Suite

You'll need to add the lines below for each test function that you want to add to the suite. In the example below, we are adding the function simple_sample_test to the suite.

[... also inside of main() in test-assembler.c ...]
/* Add test named simple_sample_test to Suite #N */
if (NULL == CU_add_test(pSuiteN, "Simple Test #N", simple_sample_test))
{
   CU_cleanup_registry();
   return CU_get_error();
}

How Tests Are Run

CUnit performs the following actions when running a test suite:

Runs the suite initialization function. In the above code, this function is called init_suite.
Runs all of the tests you added to the suite. In the above example, this runs only the function named simple_sample_test.
Runs the suite cleanup function. In the above code, this function is called clean_suite.

Valgrind

You should use Valgrind to check whether your code has any memory leaks. We have included a file, run-valgrind, which will run Valgrind on any executable of your choosing. If you get a permission denied error, try changing adding the execute permission to the file:

chmod u+x run-valgrind

Then you can run by typing:

./run-valgrind <whatever program you want to run>

For example, you wanted to see whether running ./assembler -p1 input/simple.s out/simple.int would cause any memory leaks, you should run ./run-valgrind ./assembler -p1 input/simple.s out/simple.int.

MARS

Since you're writing an assembler, why not refer to an existing assembler? MARS is a powerful reference for you to use, and you are encouraged to write your own MIPS files and assemble them using MARS.

Warning: in some cases the output of MARS will differ from the specifications of this project. You should always follow the specs. This is because MARS 1) supports more pseudoinstructions, 2) has slightly different pseudoinstruction expansion rules, and 3) acts as an assembler and linker. For example, MARS may expand an addiu with a 32-bit immediate into li followed by an addiu instruction. Not only will the machine code be different, but the addresses of any labels will also be different. Therefore, you should always examine the assembled instructions carefully when testing with MARS.

MARS also supports an assemble-only mode via the command-line (see documentation here). To save assembled output to a text file, run:

mars a dump .text HexText <output location> <input MIPS file>

Diff (this should be your best friend for this project)

diff is a utility for comparing the contents of files. Running the following command will print out the differences between file1 and file2:

diff <file1> <file2>

To see how to interpret diff results, click here. We have provided some sample input-output pairs (again, these are not comprehensive tests) located in the input and out/ref directories respectively. For example, to check the output of running simple.s on your assembler against the expected output, run:

./assembler input/simple.s out/simple.int out/simple.out 
diff out/simple.out out/ref/simple_ref.out

Running the Assembler

First, make sure your assembler executable is up to date by running make.

By default, the assembler runs two passes. The first pass reads an input file and translates it into an intermediate file. The second pass reads the intermediate file and translates it into an output file. To run both passes, type:

./assembler <input file> <intermediate file> <output file>

Alternatively, you can run only a single pass, which may be helpful while debugging. To run only the first pass, use the -p1 flag:

./assembler <-p1> <input file> <intermediate file>

To run only the second pass, use the -p2 flag. Note that when running pass two only, your symbol and relocation table will be empty since labels were stripped in pass_one(), so it may affect your branch instructions.

./assembler <-p2> <intermediate file> <output file>

When testing cases that should produce error messages, you may want to use the -log flag to log error messages to a text file. The -log flag should be followed with the location of the output file (WARNING: old contents will be overwritten!), and it can be used with any of the three modes above.

Error Message Testing

We have provided two tests for error messages, one for errors that should be raised during pass_one(), and one for errors that should be raised during pass_two(). To test for pass_one() errors, assemble input/p1_errors.s with the -p1 flag and verify that your output matches the expected output:

./assembler -p1 input/p1_errors.s out/p1_errors.int -log log/p1_errors.txt
diff log/p1_errors.txt log/ref/p1_errors_ref.txt

To test for pass_two() errors, assemble input/p2_errors.s running both passes:

./assembler input/p2_errors.s out/p2_errors.int out/p2_errors.out -log log/p2_errors.txt
diff log/p2_errors.txt log/ref/p2_errors_ref.txt

Your intermediate and output files (.int and .out files) do NOT need to match the reference output if the input file contains an error.

We have provided a few sample inputs (in the inputs directory), as well as sample outputs (in the out/ref directory). Please note that these tests are by no means comprehensive--they are simply meant to give you an idea as to whether or not your assembler is at least functioning properly.

If you would like to run gdb on the assembler, use the following commands.

        gdb assembler
        run input/[filename].s out/[filename].int out/[filename].out

Automated Diff Testing

For your convenience, we have automated the running of the diff tests. To run your assembler on all of the input files provided, you should use the command:

          make test-diff

If a test fails, all subsequent tests will not be run. To run a specific test, you may run the following command:

          make test-[filename]

For example, if you would like to run your assembler on the file simple.s, you would do

          make test-simple

Just to note, all underscores have been replaced by hyphens in the Makefile. For example, to run the test on pseudo_2.s, the command would be

          make test-pseudo-2

For further debugging, the output files for each test are stored as "out/[filename].int" and "out/[filename].out".

Submission

Did you test thoroughly on the Raspberry PI? If you do not, you risk getting a ZERO on the entire project.

There are two steps required to submit proj2-1. Failure to perform both steps will result in loss of credit:

First, you must submit using the standard unix submit program on the instructional servers. This assumes that you followed the earlier instructions and did all of your work inside of your git repository. To submit, follow these instructions after logging into your cs61c-xxx class account:
```
cd ~/proj2-xxx                            # Or where your shared git repo is
submit proj2-1
```
Once you type submit proj2-1, follow the prompts generated by the submission system. It will tell you when your submission has been successful and you can confirm this by looking at the output of glookup -t.

Additionally, you must submit proj2-1 to your shared BitBucket repository:

cd ~/proj2-xxx                             # Or where your shared git repo is
git add -u                           
git commit -m "project 2-1 submission"  
git tag "proj2-1-sub"                        # The tag MUST be "proj2-1-sub". Failure to do so will result in loss of credit.
git push origin proj2-1-sub                  # This tells git to push the commit tagged proj2-1-sub

Resubmitting

If you need to re-submit, you can follow the same set of steps that you would if you were submitting for the first time, but you will need to use the -f flag to tag and push to BitBucket:

# Do everything as above until you get to tagging
git tag -f "proj2-1-sub"
git push -f origin proj2-1-sub

Note that in general, force pushes should be used with caution. They will overwrite your remote repository with information from your local copy. As long as you have not damaged your local copy in any way, this will be fine.

CS61C Spring 2017 Project 2-1: MIPS Assembler