## Homework Assignment 11

Two exercises.  Due 2:45pm before lecture 11/28/2007.

### Goals

In this assignment, you will experimentally determine more parameters of a computer's cache and TLB, and will analyze an alternative to a one-level page table.

### Submission instructions

Submit your solution by creating a directory named hw11 that contains a file copied from template.txt. From within that directory, type submit hw11 Note that we will be autograding this assignment (except for explanations of course) in the stye of the quizzes, and you should format your template.txt accordingly.

This is not a partnership assignment. Hand in your own work.

### Background

The program cache.c produces timing data for accesses of elements in an array. Here is its main loop:

```	for (index = 0; index < limit; index = index + stride) {
x[index] = x[index] + 1;
}
```

The surrounding code coordinates the setting of the variables limit and stride and the timing of the loop.

Each loop/stride test is repeated many times and the results are averaged. The same loops are run without the memory access to measure the time taken with loop overhead. The program subtracts the overhead time from the loop time so the results should show the time spent on memory access only. Unfortunately, some of our compilers don t compile the real loop and the dummy loop in the same way, so your times may be exaggerated. You don't have to fix this.

The program prints three values for each test, labeled as size, stride, and read+write.

• The size value refers to the size, in bytes, of the array you are testing with, not the size of the cache. (You'll have to figure out the cache size for yourself.)
• The stride value says how many bytes are being skipped at each access. For example, if the stride is 4, bytes 0, 4, 8, 12, ... in the array are being accessed with bytes 1, 2, 3, 5, 6, 7, 9, 10, 11, ... being skipped.
• The read+write value is the amount of time needed to make the access to memory, in nanoseconds. You'll have to figure out for yourself whether these times denote hits or misses. As explained above, the calculated times may be exaggerated if the compiler doesn't make equivalent machine code for the two loops that process the array.

### Exercise 1 (6 points)

The file nova.results lists the results of a run of a slightly modified cache.c on a computer named nova run by the EECS instructional computing staff in 2002. You are to determine answers to the following questions. Make sure to briefly justify all your claims. You may or may not wish to attempt to answer this list in order.

1. Approximately how fast is an L1 cache hit in nanoseconds?
2. Approximately how fast is an L1 cache miss (which hits in L2) in nanoseconds?
3. What is the size in bytes of the L1 cache?
4. What is the block size in bytes of the L1 cache?
5. What is the set associativity of the L1 cache? (use 1 for direct mapped, and the number of blocks for  fully associative)
6. How big is the L2 cache in bytes?
7. What is the page size in bytes? (Time values greater than 100ns indicate TLB misses.)
8. How many entries are there in the TLB?
9. What is the replacement policy of the TLB?

### Exercise 2 (P&H exercise 7.43; 2 points)

Page tables require fairly large amounts of memory (as described in the Elaboration on page 519 of P&H), even if most of the entries are invalid.

One solution is to use a hierarchy of page tables. The virtual page number, as described in Figure 7.20 on page 513, can be broken up into two pieces, a "page table number" and a "page table offset." The page table number can be used to index a first-level page table that provides a physical address for a second-level page table, assuming it resides in memory (if not, a first-level page fault will occur and the page table itself will need to be brought in from disk). The page table offset is used to index into the second-level page table to retrieve the physical page number.

One obvious way to arrange such a scheme is to have the second-level page tables occupy exactly one page of memory. Assuming a 32-bit virtual address space with 4 KB pages and 4 bytes per page table entry, how many bytes will each program need to use to store the first-level page table (which must always be in memory)? Provide figures similar to Figures 7.19, 7.20, and 7.21 (pages 512-517) that demonstrate your understanding of this idea.