2 Numbers and Strings in C | (4 activities) |
Representation of integersWe will now examine standard representations of nonnegative integers, to make clear the concept of a value "fitting in" a memory location. First, we distinguish between a value and the representation of that value, i.e. how it is communicated from one person to another. The integer value 27 may be represented using decimal digits, or Roman numerals (XXVII), or words ("twenty-seven"), or an arithmetic expression (6*3+9). We are accustomed to using the decimal system for representing numbers. In this system, we represent a value with a sequence of decimal digits, each being one of 0, ..., 9. The position of a digit in this sequence is important: 758 represents a different value from 857. In fact, each position is associated with a weight. The weights and the digits are combined arithmetically to form the represented value. Finally, it's a based number system, in that each weighting factor is a power of some base value (also referred to as the radix). In decimal, the base is 10. A decimal representation of an integer M is then a sequence of decimal digitsdn-1 dn-2 ... d1 d0such that M = dn-1 × 10n-1 + dn-2 × 10n-2 + ... + d1 × 101 + d0 × 100For example, 758 represents 7 × 102 + 5 × 101 + 8 × 100. Inside a computer, it is more convenient to use a base that's a power of 2, for example,
45 base 10 = 1 × 25 + 0 × 24 + 1 × 23 + 1 × 22 + 0 × 21 + 1 × 20 = 5 × 81 + 5 × 80 = 2 × 161 + 13 × 160 |
You have a value that is represented by A5 in hexadecimal. How is it represented in decimal?
What is the representation of the number from question 1 in ternary (base 3)? |
A joke that's part of the CS cultureClick on this link for a number representation joke. |
Conversion between bases that are powers of 2Any representation using a base that's a power of 2 can easily be converted to binary, by writing out the individual digits in binary. For example:2 base 16 = 0010 base 2 B base 16 = 1011 base 2 3 base 16 = 0011 base 2 2B3 base 16 = 0010 1011 0011 base 2To convert from binary to a base that's 2a, just write the digits in groups of a and translate each group to its corresponding digit base 2a. For example, we can translate from binary to octal, base 23, as follows: 001010110011 base 2 = 001 010 110 011 base 2 = 1263 base 8Notice that: 001 base 2 = 1 base 8 010 base 2 = 2 base 8 110 base 2 = 6 base 8 011 base 2 = 3 base 8 |
Why powers of 2 are interestingFor reasons that you would learn about in an electronics course (e.g. EE 40 or EE 42), computer memories are organized into collections of circuits that are either "on" or "off". For storage of numeric information, this naturally suggests the use of base 2, with the binary digit (bit) 1 representing "on" and 0 representing "off". Typical units of computer memory are:
|
All your life you have been counting in unary (base 1) on your fingers, so even with two hands you can only count to ten. How high can you count on one hand in binary? Can you count from zero to that number quickly? (Watch out for the number 4.) How high can you count with two hands? |
Experimenting with conversion between basesGiven below is a program that, given a binary representation typed by the user, prints the decimal equivalent. It's in the file ~cs61c/code/bin2dec.c. Copy the program to your lab2 directory. The program is missing one statement. Supply it, then test the program. It may help to work through how you would produce its output by hand, if given the digits of a binary representation left to right.#include <stdio.h> int main ( ) { int n; char c; while (1) { /* Translate a new line to decimal. Assume that the only characters on the line are zeroes and ones. */ n = 0; c = getchar ( ); if (c == EOF) { break; } while (c == '1' || c == '0') { /* The missing statement goes here. */ c = getchar ( ); } printf ("in decimal is %d\n", n); } return 0; } |
More debuggingThe program ~cs61c/code/checkPower2.c declares and initializes a variable n, then checks whether its value is a power of 2. Copy the program to your lab2 directory and run it to verify that 32768 is a power of 2 (it's 215). Then test the program on several other values:
|
What was the bug, and how did you find it? |
Printing an int in base 2The program below, also in ~cs61c/code/buggy.base2.print.c, is intended to print the base 2 representation of the unsigned value stored in the variable numToPrintInBase2. It doesn't work. Fix the problem.#include <stdio.h> int main ( ) { unsigned int numToPrintInBase2 = 2863311530; /* alternating 1's and 0's */ unsigned int exp = 1; int k; /* Compute the highest storable power of 2 (2 to the 31th). */ for (k=0; k<31; k++) { exp = exp * 2; } /* For each power of 2 from the highest to the lowest, print 1 if it occurs in the number, 0 otherwise. */ for (k=31; !(k=0); k--) { if (numToPrintInBase2 >= exp) { putchar ('1'); numToPrintInBase2 = numToPrintInBase2 - exp; } else { putchar ('0'); } exp = exp / 2; } putchar ('\n'); return 0; } |
What were the bugs in buggy.base2.print.c, and how did you find them? |
Printing with gdbOf course, you don't need to write your own programs to print the contents of memory. For example, gdb includes a general printing facility. The commandprint /representation variableprints the value of the given variable, using the given representation. Among the choices for representation are the following:
print /x nwould print the contents of n in hexadecimal. Experiment with this gdb feature, using the simple program below. Set a breakpoint at the return statement, then change values via the print feature you used in the last lab, for example: print n = 25022Simple program: int main ( ) { int n = 70; return 0; } |
The char data typeC's char data type is used to store a single byte of information. The name "char" comes from the word "character" since a common use of a char is to store a single ASCII character. This table shows the ASCII encoding scheme. A char is treated by C as just a small integer. One may do arithmetic on char values. When an int is used where a char is expected, the excess leftmost bits are discarded. Use the online ASCII table to answer the questions in the next step. |
What gets printed by the following statement?
putchar (70); What numerical argument (i.e. an argument without quotes) should be passed to putchar so that the program segment below prints a line containing a 0? putchar ( ___ ); putchar ('\n'); |
Using the od programThe od program interprets its input according to the command line arguments, and prints its interpretation. Recall the file wc.data.txt from yesterday's lab. Here are the results of running od on that file.
$ cp ~cs61c/code/mysteryout . $ chmod 755 mysteryout # Run "man chmod" if you're curious what this is $ mysteryout | od options |
What characters were actually printed by the mysteryout program? |
A powerful idea for CS 61CLIn Java, data of different types are strictly separated. In general, one can use only integer operations on integers, only string operations on strings, and so forth. As we descend through layers of abstraction, however, we lose the notion of types. Data in a low-level computing environment are just collections of bits. The meaning of those bits can't be determined just by looking at them, for the same bits can represent a variety of objects: numbers, characters, even instructions! The identical treatment of instructions and data—the stored program concept—was a big breakthrough in the early history of computing. The first computers literally had "hard-wired" instructions; to change the program that the computer was running, one had to rewire the computer! The treatment of instructions as data, and the implementation of a general interpreter for those instructions, enabled programmers to climb to higher levels of programming abstraction. |
Character and line countingIn the previous lab, you worked with wc, one of a bunch of handy utilities that consume characters from stdin, do some useful processing, and produce characters for stdout. You can run these from the command shell and type at them, or you can redirect files to them, or you can chain them together in a processing pipeline. Try out the utility wc—remember man wc—by seeing how many lines of code there are in the wc1.c program you debugged in the last lab. wc is most useful on a file, e.g., wc < wc1.c to count the lines, words and characters in the file wc1.c. In the last lab, you debugged code that implemented the word-counting part of wc. Now we'll work on the rest of it. Here's a first step, in ~cs61c/code/wc2.c.#include <stdio.h> int main ( ) { int lines = 0; /* Initialize line count */ char c = getchar( ); /* Declare and init to first char */ while (c != EOF) { if (c == '\n') lines++; c = getchar( ); } printf(" %d\n",lines); }Modify this to print the character count in addition to the line count. If you're tired of a.out, you can specify the name of the executable file with gcc wc2.c -o wcIf you have two versions of an executable file with the same name, the search path will determine which takes priority. Type echo $PATHto display the search path, and which wcto display which of the two versions of wc will be run if you just type wc wc1.cTo make a long story short, you should type ./wc < wc1.cto run the version in your own directory if your search path would otherwise select the system version to run. |
And now a word ...Hopefully that was easy. Words are harder. As you scan the input you will need to keep some state to tell if you are at the start of a word, still in the middle of one, at the end, or in the middle of some non-word junk. And what is a word anyway? Here's a starting point, which also gives you an answer to the previous exercise. The code is in ~cs61c/code/wc3.c.#include <stdio.h> #include <stdbool.h> bool startWord(int inword, char c) { return true; // this is a lie } bool endWord(int inword, char c) { return false; // so it this } int main () { int lines = 0; /* Init the counters */ int bytes = 0; int words = 0; bool isword = false; char c = getchar(); /* Prime the loop */ while (c != EOF) { bytes++; if (startWord(isword,c)) { /* Start of a word? */ words++; isword = true; } else if (endWord(isword,c)) { isword = false; /* End of a word? */ } if (c == '\n') lines++; c = getchar(); } printf(" %d %d %d\n",lines, words, bytes); }Note that we included the C99 header file stdbool.h. This header file gives us a bool data type and the true and false keywords. On most systems, bool is just defined to be a char or an int, and true and false are just 1 and 0 respectively, but using these helps improve readability. Your job is to study this code and write the predicates startWord and endWord. Remember from last time that you can do comparisons on characters. In fact this is so common that C has some nice libraries to do it, for example, the ctype library you used last lab, described in Appendix B2 of K&R. |
What's EOF, and how is it distinguished from a character value? Describe how you determined your answer. |
C arraysAn array in C is a contiguous sequence of values of a common type. The simplest of these is an array of characters. Since an ASCII character can be represented using a single 8-bit byte, an array of chars is just a sequence of bytes. A C array is rather more primitive than an array object, as it does not explicitly record the length, for example, nor does it check that the bounds are respected. It simply holds a sequence of values. (Safe programming takes discipline.) As with arrays in other languages, you access a particular element by specifying an index in square brackets, e.g., dwarf[k]. C arrays are indexed starting at 0, so the first dwarf would be dwarf[0] and the last element of an array of seven dwarves would be dwarf[6]. |
Out-of-bounds array accessExperiment with the code below, available in ~cs61c/code/outOfBounds.c. In particular, supply an assignment to an element of a1 that changes an element of a2.#include <stdio.h> int main ( ) { int a1[4]; int k; int a2[4]; for (k=0; k<4; k++) { a1[k] = k+1; a2[k] = 2*(k+1); } a1[5] = 999; a2[5] = 762; printf ("k = %d\n", k); printf ("elements of a1 = %d %d %d %d\n", a1[0], a1[1], a1[2], a1[3]); printf ("elements of a2 = %d %d %d %d\n", a2[0], a2[1], a2[2], a2[3]); return 0; } |
Explain why the out-of-bounds array assignments did what they did. Also explain how you found the assignment to a1 that changed an element of a2. |
Representing character stringsSequences of characters appear in many places. Text files are sequences of characters. We read sequences of characters from the terminal. In both cases, we often pay special attention to the line breaks. When you type at the terminal, the enter key typically causes a command to execute. It is conveyed to the machine as a newline ('\n') (or sometimes the two-character combination "\r\n", which stands for carriage return, line feed). In some documents the end-of-line is very important, in others it is immaterial. The C language does not take any position on these issues. You can put any sequence of characters you like into a char array. In C, a string is a particular kind of char array that contains a null character ('\0') that signals the end of the string. The null character need not be the last element of the array. Wherever it appears—and that had better be within the bounds of the array—it terminates the string. (Contrast the null character with the newline ('\n'), which has no special meaning. If you happen to print it, it will advance to a new line, but it isn't regarded as "terminating" a string that contains it.) The use of a terminating character, rather than an explicit length, is a design choice in the language. In C, it is extremely simple to pass around references to strings (called pointers, but we'll talk about that later). You can figure out the length by running down the array until you find the terminator. It also means that if you forget to put in the terminator, functions that operate on the string may run off the end and go wandering around all through memory—not a good thing. (Did we mention that safe programming takes discipline?) Oh yes, it is precisely this ability to access memory that makes C so useful for systems code. There is a wonderful set of library routines for strings (described in Appendix B3 in K&R). You will find that you use these all the time. Check the man page for strlcpy and compare it to strncpy. strlcpy (and strlcat) are safer than their strn* counterparts because they ensure that the resulting strings are always null terminated! (Sometimes useful functions can help when discipline fails to produce safe programming) |
checkPower2 with stringsThe following code provides a template for a more useful version of your power-of-two checker.#include <stdio.h> #include <stdlib.h> #define MAXLINE 80 int test (unsigned int n) { /* Replace this with an accurate test. This one is too negative. */ return 0; } char prompt ( ) { printf ("> "); return getchar ( ); } int main ( ) { char in[MAXLINE]; unsigned int n; char c; int i; c = prompt(); while (c != EOF) { i = 0; while (c != '\n') { in[i++] = c; c = getchar(); } in[i] = '\0'; n = atoi (in); if (test (n)) printf ("%u is a power of 2\n", n); else printf ("%u is not a power of 2\n", n); c = prompt(); } }We have declared in to be a char array of length MAXLINE. (In C, you will want to use #define to provide meaningful symbolic names for constants.) This allocates storage for the array and declares the type of its elements. This is essentially a placeholder for a string. It doesn't have anything in it yet. The function prompt illustrates a string literal. This is a character array of three chars, '>', ' ', and '\0'. It is an array of length three containing a string of length 2. The terminator doesn't count. Note that there is no newline in this string, since the input should appear on the same line as the prompt. This program reads sequences of characters from stdin (the keyboard), creates a string for each line, converts the string to an integer value and tests if the value is a power of 2. It reads the input character by character until it reaches a newline. It uses a very useful function, atoi, which takes a string as input and returns an integer value that the string represents. It will skip over initial whitespace to find a digit. This code is online in the file ~cs61c/code/newCheckPower2.c. Using what you learned in the last lab, fill in the test function so that it correctly reports a result to main. Also, experiment with the program. Are leading and trailing blanks handled correctly? What about nondigits? (The next step asks about the latter case.) |
What does atoi return when given as an argument a string containing a nondigit? Suggest a reason for its behavior. |
Writing strings as linesThe following example shows some useful aspects of strings.#include <stdio.h> #include <string.h> void putline (char str [ ]) { int i = 0; while (str[i]) putchar (str[i++]); putchar ('\n'); } int main ( ) { char msg [ ] = "life is short"; putline (msg); putline ("Eat dessert first"); }Notice in main that the length of the msg array wasn't provided in the declaration, because the C compiler can infer the length from the length of the initializing string constant. msg is an array of characters holding a string. What size is it? Big enough to hold the string and its terminator. The program also uses a string literal in the last call to putline that is not assigned to any variable. We can pass either of these to putline as a char array. Interestingly, putline doesn't know the length of its argument. It assumes that it is a properly terminated string, so it prints characters until it finds the terminating '\0'. More items of interest:
|
Reading lines as strings; catLet's write a version of the UNIX utility cat, which copies lines from stdin to stdout. We already have the output side. Here's a skeleton of the rest.#include <stdio.h> #define MAXLINE 1024 // that should be big enough int getline(char str[], int len) { /* ... you get to write this part */ } void putline (char str[]) { int i = 0; while (str[i]) putchar (str[i++]); putchar('\n'); } int main ( ) { char in[MAXLINE]; while (getline(in,MAXLINE) > 0) putline(in); }getline is a lot more cautious than putline. It might be reasonable to insist that all strings that are passed to putline are properly terminated. (They aren't strings otherwise!) But input from the outside world has to be considered suspect until proven otherwise. The len argument is the length of the array that is provided to hold the string. The return value is the length of the string that was placed in this array. If a line happens to be longer than the array's capacity (maybe the whole file is a single line!) it should be broken up or truncated. The code is in ~cs61c/code/cat.c. Complete the getline function and validate that it works for keyboard input and for files redirected into it. Your test file should include
|
Compare your version of getline to the one that appears on page 69 of K&R, and defend your preference. In determining which of the two you prefer, consider the following questions: Which one is more elegant (and easier to read)? For which is it easier to convince yourself that it works? What do you think about the use of assignment within the loop predicate? |
Here's another version of getline.
How does it compare with the other two for elegance and clarity? Briefly justify your answer.
int getline(char str[], int len) { int i; char c; for (i = 0; i < len; i++) str[i] = '\0'; /* Initialize array to string terminator */ for (i = 0; i < len; i++) { c = getchar(); if (c == '\n') return i+1; /* EOL, newline replaced with string terminator */ if (c == EOF) return i; str[i] = c; /* Add a char to the line */ } str[len-1] = '\0'; /* Line too long; terminate it */ return len; } |