CS61C Project 2

Sprintf

Due:2006-2-21 @ 23:59
Project TA:Udam <cs61c-td AT imail dot eecs dot berkeley dot edu>
Last Updated:2006-2-10 @ 16:54

Changes

Background

Goals

The purpose of this project is to familiarize you with the details of the MIPS calling convention and enhance your assembly programming skills.

MIPS Stack Management

By now, you should know how to manage the stack when writing functions in MIPS. Let's assume you have a function light_blue, which calls a function dark_blue. The very first thing that light_blue does is to allocate space in the stack for its variables (prolog). In Figure 1 (middle) we can see that it allocates space for 4 words. light_blue eventually calls dark_blue, which allocates space for its own variables. Figure 1 (right) shows that dark_blue has allocated space for 2 words.Appendix A.6 in the P & H book talks about procedure calling in MIPS as well. For sprintf, we will be passing all the variables on the stack as well (and not just the spilled arguments).

stack.0.jpg
stack.1.1.jpg
stack.1.2.jpg
Figure 1: Typical Stack

MIPS Argument Passing

The MIPS function calling convention uses registers $a0-$a3 for passing arguments down to functions. If there are more than four, the remaining arguments are passed on the stack. How is this mixed with the normal stack managing? Each argument gets one word of stack space (Why one word?). Suppose we are trying to write in MIPS assembler a program like this:

int dark_blue (int lvl, int hp, int mp, int str, int agl, int spd, int wit, int vit, int ftg) {
        int a;
        ...
        a = spd;
        ...
}

int light_blue () {
        int c, d, e;
        ...
        dark_blue (c, d, 43, 62, 25, 99, 47, 29, 1);
        ...
}

Procedure dark_blue has nine integer arguments. The first four arguments are placed on $a0-$a3 as expected. The remaining arguments are "spilled" onto the stack as part of the caller's (light_blue) stack frame. In other words, light_blue, not dark_blue, must allocate the space needed for the remaining arguments.

Figure 2 shows this stack management method. light_blue now allocates space in the stack for its variables and the arguments that it will use to call dark_blue. We can see in Figure 2 (middle) that it allocates space for 9 words: 4 for its own consumption, and 5 more for the spilled arguments in calling dark_blue

light_blue eventually calls dark_blue that then allocates space for its own variables. Figure 2 (right) shows that dark_blue has allocated space for 2 words. Note that dark_blue can access to the 5 arguments by accessing to the beginning of the stack reserved by its caller process (light_blue).

stack.0.jpg
stack.2.1.jpg
stack.2.2.jpg
Figure 1: Stack with spilled arguments

The 5 spilled arguments (patterned light-blue stack) are located at the lower addresses of light_blue's stack frame.

That is, light_blue will use 0($sp) to store the argument agl, 4($sp) to store the argument spdy, and so on. Therefore, dark_blue will use 8($sp) to access to the argument agl, 12($sp) to access to the argument spd, and so on.

Note that the first spilled argument is always at the caller's stack frame lowest address, and the last spilled argument at the highest address. You have to be consistent about this so that dark_blue knows which argument is which.:

light_blue:
        # prolog
        addi $sp, $sp, -16  # allocate stack space for 4 words:
                            # $ra, c, d, and e
        sw   $ra, 12($sp)   # save $ra
                            # use 0($sp) to save c ($t0) when needed
                            # use 4($sp) to save d ($t1) when needed
                            # use 8($sp) to save e ($t2) when needed
        # body
        ...

        addi $t0, $0, 3     # assume this holds the value of c
        addi $t1, $0, 4     # assume this holds the value of d
        addi $t2, $0, 5     # assume this holds the value of e

        ...

        # this is the call to dark_blue
        addi $sp, $sp, -20  # allocate stack space for 5 "spilled" words (patterned
                            # light-blue): the args agl, spd, intel, vit, fatigue
        sw   $t0, 20($sp)   # save c before calling dark_blue
        sw   $t1, 24($sp)   # save d before calling dark_blue
        sw   $t2, 28($sp)   # save e before calling dark_blue

        add  $a0, $0, $t0   # arg lvl = c
        add  $a1, $0, $t1   # arg hp = d
        addi $a2, $0, 43    # arg mp = 43
        addi $a3, $0, 62    # arg str = 62
        addi $t0, $0, 25    # arg agl = 25, but no more registers
        sw   $t0, 0($sp)    # so pass on the stack
        addi $t0, $0, 99    # arg spd = 99, but no more registers
        sw   $t0, 4($sp)    # so pass on the stack
        addi $t0, $0, 47    # arg wit = 47, but no more registers
        sw   $t0, 8($sp)    # so pass on the stack
        addi $t0, $0, 29    # arg vit = 29, but no more registers
        sw   $t0, 12($sp)   # so pass on the stack
        addi $t0, $0, 1     # arg ftg = 1, but no more registers
        sw   $t0, 16($sp)   # so pass on the stack
        jal  dark_blue
        addi $sp, $sp, 20   # restore stack space used for calling dark_blue

        ...

        # epilog
        lw   $ra, 12($sp)   # reload return address
        addi $sp, $sp, 16   # restore stack space
        jr   $ra            # return to caller

        ...

dark_blue:
        # prolog
        addi $sp, $sp, -8   # allocate stack space for 2 words:
                            # $ra, a
        sw   $ra, 4($sp)    # save $ra
                            # use 0($sp) to save a when needed

        # body

        ...

        add  $t0, $0, $a1   # get argument hp
        lw   $t1, 20($sp)   # ### (see below)
                            # 8 (dark_blue's frame) + 12 = 20 up on stack
                            # fetched argument vit
        ...

        # epilog
        lw   $ra, 4($sp)    # reload return address
        addi $sp, $sp, 8    # restore stack space
        jr   $ra            # return to caller

The instruction indicated by "###" is the key to understanding the stack method of argument passing. Function dark_blue is referring to a word of stack memory that is from the caller's stack frame. Its own frame includes only the two words 0($sp) and 4($sp).

Your Task

Setup

Copy the contents of ~cs61c/assignments/proj/02 to a suitable location in your home directory.

% cd ~
% cp -r ~cs61c/projs/02 ~/proj2

You should have sprintf.s, test.sprintf.s, and Makefile in ~/proj2. Your implementations must go in sprintf.s. Test files are given to help you out, but they are by no means comprehensive tests. We strongly urge you to create test cases in addition to the ones provided here.

Sprintf

Your task is to implement the C sprintf function (see 'man sprintf') in MIPS, with extended support for Ropes (see below). This is the function prototype for sprintf:

int sprintf (char *outbuf, const char *format, ...)

sprintf works like printf, except that it writes to the string outbuf instead of to standard output. outbuf is assumed already to point to allocated memory sufficient to hold the generated characters. Your function must accept any number of arguments passed according to the MIPS standard convention.

The first argument is the address of a character array into which your procedure will put its results. The second argument is a format string in which each occurrence of a percent sign (%) indicates where one of the subsequent arguments is to be substituted and how it is to be formatted. The remaining arguments are values that are to be converted to printable character form according to the format instructions. sprintf returns the number of characters in its output string not including the null at the end. Each argument will be passed on the stack (including the non-spilled arguments) and the first four arguments (or up to four if there are less) will also be placed in the $a0- $a3 registers. Looking at the test.sprintf.s will help you understand how we will call your sprintf function.

You do not have to do any error checking (e.g. comparing the number of arguments to the number of % specifications). You also do not have to implement all of the formatting options of the real sprintf. Here are the ones you are to implement:

%d: signed integer argument to signed decimal (treat the next 
    argument as a signed integer and output in decimal)
%u: unsigned integer argument to unsigned decimal (treat the next 
    argument as a unsigned integer and output in decimal)
%b: unsigned integer argument to a binary number (treat the next argument
    as an unsigned integer and output in binary)
%c: character argument copied to output (treat the low byte of the 
    argument word as a character and copy it to the output)
%s: include a string of characters in output (treat the argument 
    as a pointer to a null-terminated string to be copied to the output)
%r: rope argument with the leaf strings copied to the output. A rope 
    is another way of representing a string. The exact format for the rope
    structure for this project is outlined below. 

Don't implement width or precision modifiers (e.g., %6d ).

The ' Flag

The ' flag can be inserted for the decimal and unsigned formatting options (%'d and %'u) such that the resulting signed or unsigned number will be printed out with the thousands' grouping character.

For the %d and %u formatting options, the ' flag is inserted between the % and d/u characters. The apostrophe will be used to format the number with thousands grouping characters. Thus, if the number is 10047 is printed out using the %d option, it should be formatted such that 10,047 will be printed out (including the comma) with the %'d formatting option. The files apostrophe_demo.c and apostrophe_demo in ~cs61c/projs/02 will help to explain the %'d and %'u modifier in more detail. The extra ' in between the % and the d and u options will determine if the number should be printed with the throusands grouping characters.

Ropes

Ropes are a different way to represent strings. They are mainly used so that concatenation of strings does not involve copying both sets of strings, and makes it so that concatenation can take constant time. For this project, we will represent ropes such that there are only concatenation nodes and the leaves are always normal C strings.

Each concatenation node will consist of an integer of 128 in the front of the node followed by two pointers that point to the left and right child of the node. If a pointer is null, that means there is no child in that direction. Thus, you should check that each pointer is not null before traversing the children. For this project, assume that each C string in the leaf will be a normal ascii character (and thus represented from 0 - 127) so that you can assume that if the first byte of a child is 128, then it must be a concatenation node and not a C string leaf. Each leaf will simply be a normal C string, and the C strings at the leaves of the tree should be moved to the output buffer. For each rope, you should traverse it from left to right and copy the C strings into the output buffer. For instance, in the image below: the output buffer will contain "The quick brown fox".

ropes.png

For example, the following code calls your sprintf function with the above rope:

        .data
__buffer:
        .space  200
__format:
        .asciiz "%r"
__rope_node1:
        .space 12
__rope_node2:
        .space 12
__rope_s1:
        .asciiz "The qui"
__rope_s2:
        .asciiz "ck brown"
__rope_s3:
        .asciiz " fox"
        ###

        .text

        addi $sp, $sp, -20          # space for the stack of parameters
        la   $a0, __buffer
        sw   $a0, 0($sp)            #  0($sp): __buffer
        la   $a1, __format
        sw   $a1, 4($sp)            #  4($sp): __format
        la   $a2, __rope_node1
        la   $t0, __rope_node2
        la   $t1, __rope_s1
        la   $t2, __rope_s2
        la   $t3, __rope_s3         #  getting the address for each rope node and string
        addi $t4. $0, 128
        sb   $t4, 0($a2)            #  store 128 into the first byte to show that this is a node
        sw   $t0, 4($a2)            #  store the left child (2nd node) into the 1st word of node1
        sw   $t3, 8($a2)            #  store the right child (third string) into the 3rd word of node1
        sb   $t4, 0($t0)            #  store 128 into the first byte of node2
        sw   $t1, 4($t0)            #  store the left child (__rope_s1) into node2's second word
        sw   $t2, 8($t0)            #  store the right child (__rope_s2) into node2's third word
        sw   $a2, 8($sp)            #  8($sp): __rope   (%r)
        jal  sprintf

The above should just place "The quick brown fox" into a0 once sprintf has ran.

Each concat nodes, as shown in the example above, is 12 bytes or three words long and consists of the integer 128, and two pointers pointing to the left child (as the second word) and the third child (as the third word). The leaves of this structure are simple C strings represented as ascii characters.

A more detailed analysis of ropes and their uses can be found here

Miscellaneous Requirements

  • Obey all register conventions in this project: return the number of characters in $v0 and do not use $sx registers without saving them first. In particular, remember that the $ax registers are temporary registers, so they will be clobbered by any function call. Points will be deducted for violations of these conventions. Your sprintf function must work with the main function supplied in test.sprintf.s
  • Addenum to the register conventions - $sp points to the lowest occupied space of the stack. This means you will need to move the stack pointer (i.e. addi $sp, $sp, 4) befor using the 0th offset of the stack pointer (i.e. sw $ra, 0($sp)).
  • You MUST comment your MIPS code. The readers will read it, and they need to be able to quickly understand what every section does. A common style for commenting assembly is to include a long introductory comment before every block of assembly statements, with a short comment after every line telling what it does. The introductory comment must describe the algorithm that the following block implements. The line-by-line comments must just be an easy-to-read version of the assembly code, using real variable names and perhaps more C-like constructs. If your code ends up being at least as much comment as code, this is probably a sign that you are doing a good job of commenting.
  • Put your login name, section, and TA information at the top of sprintf.s.

Submission Details

From within the directory containing your code, run submit proj2. You must include the following files: sprintf.s and README. Additionally, make sure no labels in your solutions begin with '__' (which means don't include a __start, too). The testing framework will use labels that begin with __.

Extra

Misc Notes

  • Minimum-Effort Digit Formatting: For all the number formatting cases, you must minimize the number of digits your sprintf prints. For example, if the user tries to format the number 19 (decimal), your sprintf must format it as "10011" in binary, and not as, say, "00010011".
  • Unsupported Format Specifiers: Your code does not need to handle any cases that are not mentioned here. For example, if you encounter a format string with the sequence %% or %t, you must copy to the output just the last character, "%" or "t" respectively. That is, an unsupported escape sequence must ONLY copy the character following the %. However, your code should not crash under these circumstances. It must continue processing the rest of the string.
  • Spaces between '%' and format char: Apparently, the original sprintf will interpret the format string "Hello % dn" the same as "Hello %dn". However, in this project you will assume that if the % sign is followed by a space, this is an unsupported escape sequence, and therefore you must print the space.
  • Calling Convention: Imagine that you have a subfunction that only your sprintf function calls. Could you violate the calling convention? For example, can you have the subfunction modify $t0 and leave a useful value in it for sprintf? Can you use $s6 knowing that it will not be used by other parts of sprintf? Certainly your code could work and still be violating the calling convention. However, suppose that I want to use your subfunction in my code. If I try this, it won't work. Similarly, suppose I want to insert a different version of the subfunction into your code. This will not work either because your sprintf is executing the subfunction to modify $t0 in a useful way. This may seem unlikely but as the code that you work with gets bigger and bigger, adhering to calling conventions becomes more and more important. So, when you write subfunctions, do as this spec says and OBSERVE THE CALLING CONVENTION. You will be graded on this!!
  • Character Passing: Characters (single characters, not characters in strings) are stored as if they are a 32-bit value instead of a 8-bit value. This means, if they are passed via $a0-$a3, they get their own registers; if they are passed via the stack, they get their own word (i.e. use lw to read a character from the stack).Characters in a string are still one byte each (8-bit value) and the pointer to the first character is what is passed on the stack for a string.