total day span:"0" .
2 Numbers and Strings in C
(4 activities)
2.1 Consider the representation of int values.(9 steps)
2.1.1 (Display page) Representation of integers

Representation of integers

We will now examine standard representations of nonnegative integers, to make clear the concept of a value "fitting in" a memory location. First, we distinguish between a value and the representation of that value, i.e. how it is communicated from one person to another. The integer value 27 may be represented using decimal digits, or Roman numerals (XXVII), or words ("twenty-seven"), or an arithmetic expression (6*3+9). We are accustomed to using the decimal system for representing numbers. In this system, we represent a value with a sequence of decimal digits, each being one of 0, ..., 9. The position of a digit in this sequence is important: 758 represents a different value from 857. In fact, each position is associated with a weight. The weights and the digits are combined arithmetically to form the represented value. Finally, it's a based number system, in that each weighting factor is a power of some base value (also referred to as the radix). In decimal, the base is 10. A decimal representation of an integer M is then a sequence of decimal digits
    dn-1 dn-2 ... d1 d0
such that
    M = dn-1 × 10n-1 + dn-2 × 10n-2 + ... + d1 × 101 + d0 × 100

For example, 758 represents 7 × 102 + 5 × 101 + 8 × 100. Inside a computer, it is more convenient to use a base that's a power of 2, for example,
binary base 2 "digits" are 0 and 1
octal base 8 "digits" are 0, ..., 7
hexadecimal base 16 "digits" are 0, ..., 9, A (representing 10), B (representing 11), C, D, E, F (representing 15)
For example, the value "forty five" is represented by 45 in base 10 (decimal), 101101 in base 2 (binary), 55 in base 8 (octal, and 2D in base 16 (hexadecimal).

45 base 10 = 1 × 25 + 0 × 24 + 1 × 23 + 1 × 22 + 0 × 21 + 1 × 20

    = 5 × 81 + 5 × 80
    = 2 × 161 + 13 × 160

2.1.2 (Self Test) Representing values in different bases
You have a value that is represented by A5 in hexadecimal. How is it represented in decimal?
What is the representation of the number from question 1 in ternary (base 3)?
2.1.3 (Display page) A joke that's part of the CS culture

A joke that's part of the CS culture

Click on this link for a number representation joke.
2.1.4 (Display page) Conversion between bases that are powers of 2

Conversion between bases that are powers of 2

Any representation using a base that's a power of 2 can easily be converted to binary, by writing out the individual digits in binary. For example:
      2 base 16 = 0010 base 2
      B base 16 = 1011 base 2
      3 base 16 = 0011 base 2
    2B3 base 16 = 0010 1011 0011 base 2
To convert from binary to a base that's 2a, just write the digits in groups of a and translate each group to its corresponding digit base 2a. For example, we can translate from binary to octal, base 23, as follows:
    001010110011 base 2 = 001 010 110 011 base 2 = 1263 base 8

Notice that:
    001 base 2 = 1 base 8
    010 base 2 = 2 base 8
    110 base 2 = 6 base 8
    011 base 2 = 3 base 8
2.1.5 (Display page) Why powers of 2 are interesting

Why powers of 2 are interesting

For reasons that you would learn about in an electronics course (e.g. EE 40 or EE 42), computer memories are organized into collections of circuits that are either "on" or "off". For storage of numeric information, this naturally suggests the use of base 2, with the binary digit (bit) 1 representing "on" and 0 representing "off". Typical units of computer memory are:
Bits Name* C99 Type ANSI C Type*
8 byte int8_t char
16 halfword* int16_t short
32 word* int32_t int
64 doubleword* int64_t long long
*: Note that a byte is always 8 bits, but the length of a word depends on the architecture. For the architecture we're studying, a word is 4 bytes long. Other architectures have different word sizes, so "int" "short" and "double" may have different lengths on those systems. C99 gives programmers a way to specify the exact size of variables rather than letting them change depending on the system.
With n bits, you can represent 2n different values. Thus with 8 bits, you can represent 256 = 28 different values; 10 bits lets you represent 1024 = 1 Kbi = 210 different values. An int32_t in C99 can take on only a limited number of different values. When we try to construct an integer value that's outside that range, and store it in an int32_t, surprises may result.
2.1.6 (Self Test) Binary fingers
All your life you have been counting in unary (base 1) on your fingers, so even with two hands you can only count to ten. How high can you count on one hand in binary? Can you count from zero to that number quickly? (Watch out for the number 4.)
How high can you count with two hands?
2.1.7 (Display page) Experimenting with conversion between bases

Experimenting with conversion between bases

Given below is a program that, given a binary representation typed by the user, prints the decimal equivalent. It's in the file ~cs61c/code/bin2dec.c. Copy the program to your lab2 directory. The program is missing one statement. Supply it, then test the program. It may help to work through how you would produce its output by hand, if given the digits of a binary representation left to right.
#include <stdio.h>

int main ( ) {
	int n;
	char c;
	while (1) {
		/* Translate a new line to decimal.
		   Assume that the only characters on the line are zeroes and ones. */
		n = 0;
		c = getchar ( );
		if (c == EOF) {
			break;
		}
		while (c == '1' || c == '0') {
			/* The missing statement goes here. */
			c = getchar ( );
		}
		printf ("in decimal is %d\n", n);
	}
	return 0;
}
2.1.8 (Display page) More debugging

More debugging

The program ~cs61c/code/checkPower2.c declares and initializes a variable n, then checks whether its value is a power of 2. Copy the program to your lab2 directory and run it to verify that 32768 is a power of 2 (it's 215). Then test the program on several other values:
  1. 65536 (a power of 2)
  2. 1 (another power of 2)
  3. 32767 (not a power of 2)
  4. 3 (also not a power of 2)
Finally, test the program on the value 2147483649, which is 1 plus 231, the largest power of two that fits in an integer. Single-step through the computation to find out what happens. In the next step, explain what happened and why.
What was the bug, and how did you find it?
student answers
2.2 Use program code to examine internal representations.(8 steps)
2.2.1 (Display page) Printing an int in base 2

Printing an int in base 2

The program below, also in ~cs61c/code/buggy.base2.print.c, is intended to print the base 2 representation of the unsigned value stored in the variable numToPrintInBase2. It doesn't work. Fix the problem.
	#include <stdio.h>
	int main ( ) {
		unsigned int numToPrintInBase2 = 2863311530;	/* alternating 1's and 0's */
		unsigned int exp = 1;
		int k;	
		/* Compute the highest storable power of 2 (2 to the 31th). */
		for (k=0; k<31; k++) {
			exp = exp * 2;
		}
		/* For each power of 2 from the highest to the lowest,
			print 1 if it occurs in the number, 0 otherwise. */
		for (k=31; !(k=0); k--) {
			if (numToPrintInBase2 >= exp) {
				putchar ('1');
				numToPrintInBase2 = numToPrintInBase2 - exp;
			} else {
				putchar ('0');
			}
			exp = exp / 2;
		}
		putchar ('\n');
		return 0;
	}
What were the bugs in buggy.base2.print.c, and how did you find them?
student answers2.2.3 (Display page) Printing with gdb

Printing with gdb

Of course, you don't need to write your own programs to print the contents of memory. For example, gdb includes a general printing facility. The command
    print /representation variable

prints the value of the given variable, using the given representation. Among the choices for representation are the following:
unsigned decimal integer
hexadecimal
octal
binary
For example,
    print /x n
would print the contents of n in hexadecimal. Experiment with this gdb feature, using the simple program below. Set a breakpoint at the return statement, then change values via the print feature you used in the last lab, for example:
    print n = 25022
Simple program:
    int main ( ) {
        int n = 70;
        return 0;
    }
2.2.4 (Display page) The char data type

The char data type

C's char data type is used to store a single byte of information. The name "char" comes from the word "character" since a common use of a char is to store a single ASCII character. This table shows the ASCII encoding scheme. A char is treated by C as just a small integer. One may do arithmetic on char values. When an int is used where a char is expected, the excess leftmost bits are discarded. Use the online ASCII table to answer the questions in the next step.
2.2.5 (Self Test) Experimenting with characters
What gets printed by the following statement?
    putchar (70);

What numerical argument (i.e. an argument without quotes) should be passed to putchar so that the program segment below prints a line containing a 0?
    putchar ( ___ );
    putchar ('\n');

2.2.6 (Display page) Using the od program

Using the od program

The od program interprets its input according to the command line arguments, and prints its interpretation. Recall the file wc.data.txt from yesterday's lab. Here are the results of running od on that file.
od commandod outputnotes
od -b < wc.data.txt
0000000 141 040 142 143 040 144 145 146 012 040 040 147 150 151 040 152
0000020 040 012
0000022
prints each byte in octal (base 8).

od -c < wc.data.txt
0000000   a       b   c       d   e   f  \n           g   h   i       j
0000020      \n
0000022
prints each byte as an ASCII character.
od -d < wc.data.txt

0000000 24864 25187 08292 25958 02592 08295 26729 08298
0000020 08202
0000022
prints each pair of bytes in decimal.
od -x < wc.data.txt
0000000 6120 6263 2064 6566 0a20 2067 6869 206a
0000020 200a
0000022
prints each pair of bytes in hexadecimal (base 16).
The seven-digit hexadecimal value at the start of each line gives the position of the byte in the file. man od will give you more information about how the program works. The program ~cs61c/code/mysteryout apparently produces a blank line as output. Find out what it really prints using od. Use the UNIX "pipe" facility to send the output of mysteryout to od:
$ cp ~cs61c/code/mysteryout .
$ chmod 755 mysteryout   # Run "man chmod" if you're curious what this is
$ mysteryout | od options
What characters were actually printed by the mysteryout program?
student answers2.2.8 (Display page) A powerful idea for CS 61CL

A powerful idea for CS 61CL

In Java, data of different types are strictly separated. In general, one can use only integer operations on integers, only string operations on strings, and so forth. As we descend through layers of abstraction, however, we lose the notion of types. Data in a low-level computing environment are just collections of bits. The meaning of those bits can't be determined just by looking at them, for the same bits can represent a variety of objects: numbers, characters, even instructions! The identical treatment of instructions and data—the stored program concept—was a big breakthrough in the early history of computing. The first computers literally had "hard-wired" instructions; to change the program that the computer was running, one had to rewire the computer! The treatment of instructions as data, and the implementation of a general interpreter for those instructions, enabled programmers to climb to higher levels of programming abstraction.
2.3 Continuing with wc implementations(3 steps)
2.3.1 (Display page) Character and line counting

Character and line counting

In the previous lab, you worked with wc, one of a bunch of handy utilities that consume characters from stdin, do some useful processing, and produce characters for stdout. You can run these from the command shell and type at them, or you can redirect files to them, or you can chain them together in a processing pipeline. Try out the utility wc—remember man wc—by seeing how many lines of code there are in the wc1.c program you debugged in the last lab. wc is most useful on a file, e.g., wc < wc1.c to count the lines, words and characters in the file wc1.c. In the last lab, you debugged code that implemented the word-counting part of wc. Now we'll work on the rest of it. Here's a first step, in ~cs61c/code/wc2.c.
#include <stdio.h>

int main ( ) {
  int lines = 0;	/* Initialize line count */
  char c = getchar( );   /* Declare and init to first char */
  while (c != EOF) {
    if (c == '\n') lines++;
    c = getchar( );
  }
  printf(" %d\n",lines);
}
Modify this to print the character count in addition to the line count. If you're tired of a.out, you can specify the name of the executable file with
    gcc wc2.c -o wc
If you have two versions of an executable file with the same name, the search path will determine which takes priority. Type
    echo $PATH
to display the search path, and
    which wc
to display which of the two versions of wc will be run if you just type
    wc wc1.c

To make a long story short, you should type
    ./wc < wc1.c
to run the version in your own directory if your search path would otherwise select the system version to run.
2.3.2 (Display page) And now a word ...

And now a word ...

Hopefully that was easy. Words are harder. As you scan the input you will need to keep some state to tell if you are at the start of a word, still in the middle of one, at the end, or in the middle of some non-word junk. And what is a word anyway? Here's a starting point, which also gives you an answer to the previous exercise. The code is in ~cs61c/code/wc3.c.
#include <stdio.h>
#include <stdbool.h>

bool startWord(int inword, char c) {
  return true;  // this is a lie
}
bool endWord(int inword, char c) {
  return false; // so it this
}

int main () {
  int lines = 0;                /* Init the counters */
  int bytes = 0;
  int words = 0;
  bool isword = false;
  char c = getchar();           /* Prime the loop */
  while (c != EOF) {
    bytes++;
    if (startWord(isword,c)) { /* Start of a word? */
      words++;
      isword = true;
    } else if (endWord(isword,c)) {
      isword = false;           /* End of a word? */
    }
    if (c == '\n') lines++;
    c = getchar();
  }
  printf(" %d %d %d\n",lines, words, bytes);
}

Note that we included the C99 header file stdbool.h. This header file gives us a bool data type and the true and false keywords. On most systems, bool is just defined to be a char or an int, and true and false are just 1 and 0 respectively, but using these helps improve readability. Your job is to study this code and write the predicates startWord and endWord. Remember from last time that you can do comparisons on characters. In fact this is so common that C has some nice libraries to do it, for example, the ctype library you used last lab, described in Appendix B2 of K&R.
What's EOF, and how is it distinguished from a character value? Describe how you determined your answer.
student answers
2.4 Character arrays and strings(10 steps)
2.4.1 (Display page) C arrays

C arrays

An array in C is a contiguous sequence of values of a common type. The simplest of these is an array of characters. Since an ASCII character can be represented using a single 8-bit byte, an array of chars is just a sequence of bytes. A C array is rather more primitive than an array object, as it does not explicitly record the length, for example, nor does it check that the bounds are respected. It simply holds a sequence of values. (Safe programming takes discipline.) As with arrays in other languages, you access a particular element by specifying an index in square brackets, e.g., dwarf[k]. C arrays are indexed starting at 0, so the first dwarf would be dwarf[0] and the last element of an array of seven dwarves would be dwarf[6].
2.4.2 (Display page) Out-of-bounds array access

Out-of-bounds array access

Experiment with the code below, available in ~cs61c/code/outOfBounds.c. In particular, supply an assignment to an element of a1 that changes an element of a2.
#include <stdio.h>

int main ( ) {
        int a1[4];
        int k;
        int a2[4];
        for (k=0; k<4; k++) {
                a1[k] = k+1;
                a2[k] = 2*(k+1);
        }
        a1[5] = 999;
        a2[5] = 762;
        printf ("k = %d\n", k);
        printf ("elements of a1 = %d %d %d %d\n", a1[0], a1[1], a1[2], a1[3]);
        printf ("elements of a2 = %d %d %d %d\n", a2[0], a2[1], a2[2], a2[3]);
        return 0;
}

Explain why the out-of-bounds array assignments did what they did. Also explain how you found the assignment to a1 that changed an element of a2.
student answers2.4.4 (Display page) Representing character strings

Representing character strings

Sequences of characters appear in many places. Text files are sequences of characters. We read sequences of characters from the terminal. In both cases, we often pay special attention to the line breaks. When you type at the terminal, the enter key typically causes a command to execute. It is conveyed to the machine as a newline ('\n') (or sometimes the two-character combination "\r\n", which stands for carriage return, line feed). In some documents the end-of-line is very important, in others it is immaterial. The C language does not take any position on these issues. You can put any sequence of characters you like into a char array. In C, a string is a particular kind of char array that contains a null character ('\0') that signals the end of the string. The null character need not be the last element of the array. Wherever it appears—and that had better be within the bounds of the array—it terminates the string. (Contrast the null character with the newline ('\n'), which has no special meaning. If you happen to print it, it will advance to a new line, but it isn't regarded as "terminating" a string that contains it.) The use of a terminating character, rather than an explicit length, is a design choice in the language. In C, it is extremely simple to pass around references to strings (called pointers, but we'll talk about that later). You can figure out the length by running down the array until you find the terminator. It also means that if you forget to put in the terminator, functions that operate on the string may run off the end and go wandering around all through memory—not a good thing. (Did we mention that safe programming takes discipline?) Oh yes, it is precisely this ability to access memory that makes C so useful for systems code. There is a wonderful set of library routines for strings (described in Appendix B3 in K&R). You will find that you use these all the time. Check the man page for strlcpy and compare it to strncpy. strlcpy (and strlcat) are safer than their strn* counterparts because they ensure that the resulting strings are always null terminated! (Sometimes useful functions can help when discipline fails to produce safe programming)
2.4.5 (Display page) checkPower2 with strings

checkPower2 with strings

The following code provides a template for a more useful version of your power-of-two checker.
#include <stdio.h>
#include <stdlib.h>
#define MAXLINE 80

int test (unsigned int n) {
/* Replace this with an accurate test. This one is too negative.  */
  return 0;
}

char prompt ( ) {
  printf ("> ");
  return getchar ( );
}

int main ( ) {
  char in[MAXLINE];
  unsigned int n;
  char c;
  int i;
  c = prompt();
  while (c != EOF) {
    i = 0;
    while (c != '\n') {
      in[i++] = c;
      c = getchar();
    }
    in[i] = '\0';
    n = atoi (in);
    if (test (n)) printf ("%u is a power of 2\n", n);
    else printf ("%u is not a power of 2\n", n);
    c = prompt();
  }
}
We have declared in to be a char array of length MAXLINE. (In C, you will want to use #define to provide meaningful symbolic names for constants.) This allocates storage for the array and declares the type of its elements. This is essentially a placeholder for a string. It doesn't have anything in it yet. The function prompt illustrates a string literal. This is a character array of three chars, '>', ' ', and '\0'. It is an array of length three containing a string of length 2. The terminator doesn't count. Note that there is no newline in this string, since the input should appear on the same line as the prompt. This program reads sequences of characters from stdin (the keyboard), creates a string for each line, converts the string to an integer value and tests if the value is a power of 2. It reads the input character by character until it reaches a newline. It uses a very useful function, atoi, which takes a string as input and returns an integer value that the string represents. It will skip over initial whitespace to find a digit. This code is online in the file ~cs61c/code/newCheckPower2.c. Using what you learned in the last lab, fill in the test function so that it correctly reports a result to main. Also, experiment with the program. Are leading and trailing blanks handled correctly? What about nondigits? (The next step asks about the latter case.)
What does atoi return when given as an argument a string containing a nondigit? Suggest a reason for its behavior.
student answers2.4.7 (Display page) Writing strings as lines

Writing strings as lines

The following example shows some useful aspects of strings.
#include <stdio.h>
#include <string.h>

void putline (char str [ ]) {
  int i = 0;
  while (str[i]) putchar (str[i++]);
  putchar ('\n');
}

int main ( ) {
  char msg [ ] = "life is short";
  putline (msg);
  putline ("Eat dessert first");
}
Notice in main that the length of the msg array wasn't provided in the declaration, because the C compiler can infer the length from the length of the initializing string constant. msg is an array of characters holding a string. What size is it? Big enough to hold the string and its terminator. The program also uses a string literal in the last call to putline that is not assigned to any variable. We can pass either of these to putline as a char array. Interestingly, putline doesn't know the length of its argument. It assumes that it is a properly terminated string, so it prints characters until it finds the terminating '\0'. More items of interest:
  • putline uses a common C idiom that takes advantage of the string terminator, the boolean false, and the integer zero all being the same thing. We could have used while (str[i] != '\0') just as well.
  • The index is post-incremented. That is, the value returned by the expression i++ is the value i has before it is incremented. Some programmers view this usage as asking for trouble with off-by-one errors; you should decide for yourself whether to use the less tricky
        while (str[i]) {
            putchar (str[i]);
            i++;
        }
    
    The result is that we don't output the terminator. (It happens to be a non-printing character.) However, we do add a newline.
The code above is in putline.c. Run it (with some modification) to determine the following:
  • Is the length of the msg string what we claimed above? Use the strlen function declared in string.h to find that out.
  • Are there any nonprinting characters in the output? Use od or your wc program to decide this.
2.4.8 (Display page) Reading lines as strings; cat

Reading lines as strings; cat

Let's write a version of the UNIX utility cat, which copies lines from stdin to stdout. We already have the output side. Here's a skeleton of the rest.
#include <stdio.h>

#define MAXLINE 1024      // that should be big enough

int getline(char str[], int len) {
  /* ... you get to write this part */
}

void putline (char str[]) {
  int i = 0;
  while (str[i]) putchar (str[i++]);
  putchar('\n');
}

int main ( ) {
  char in[MAXLINE];
  while (getline(in,MAXLINE) > 0) putline(in);
}
getline is a lot more cautious than putline. It might be reasonable to insist that all strings that are passed to putline are properly terminated. (They aren't strings otherwise!) But input from the outside world has to be considered suspect until proven otherwise. The len argument is the length of the array that is provided to hold the string. The return value is the length of the string that was placed in this array. If a line happens to be longer than the array's capacity (maybe the whole file is a single line!) it should be broken up or truncated. The code is in ~cs61c/code/cat.c. Complete the getline function and validate that it works for keyboard input and for files redirected into it. Your test file should include
  • an empty line
  • a line with fewer than MAXLINE characters
  • a line with exactly MAXLINE characters (modify the initialization of MAXLINE to do this test)
  • a line with more than MAXLINE characters
Compare your version of getline to the one that appears on page 69 of K&R, and defend your preference. In determining which of the two you prefer, consider the following questions: Which one is more elegant (and easier to read)? For which is it easier to convince yourself that it works? What do you think about the use of assignment within the loop predicate?
student answers
Here's another version of getline. How does it compare with the other two for elegance and clarity? Briefly justify your answer.
int getline(char str[], int len) {
  int i;
  char c;
  for (i = 0; i < len; i++) str[i] = '\0'; /* Initialize array to string terminator */
  for (i = 0; i < len; i++) {
    c = getchar();
    if (c == '\n') return i+1;	/* EOL, newline replaced with string terminator */
    if (c == EOF) return i;
    str[i] = c;			/* Add a char to the line */
  }
  str[len-1] = '\0';		/* Line too long; terminate it */
  return len;
}

student answers

Entire day brainstorm answers
""