CS61C Fall 2013 HW5

TA: Jeffrey Dong & Ajay Tripathi

Due Sunday, Oct 27, 2013 @ 23:59:59

Goals

This assignment is designed to give you some practice with caches and parallelism, especially SIMD.

Obtaining the Assignment

To copy the hw5 files into your homework directory, go to your hw directory (type git init if it's not already in a git repository) and then type:

$ git pull ~cs61c/hw/05 master

Place your answers in hw5.txt and rotate_vectorized.c. If you are having issues with this, please consult Lab 1: Git and Additional Git Notes.

Problem 1

The following C program is run with no optimizations on a processor with a 512B direct-mapped cache with block size 64B (assume 4B words):

int i,j,c,stride,array[256];
...
for(i=0; i<10000; i++)
for (j=0; j<256; j+=stride)
c += i%(array[j]);

What are possible miss rates (there may be multiple) if:

  1. stride = 128?
  2. stride = 127?

For each miss rate, explain in 1-2 sentences how that would occur. Hint: array[0] may not be aligned on a block boundary.

Problem 2

You might remember your rotate function from hw2. Well, now we're going to vectorize (SIMD-ize) it! Please change the rotate_vectorized function in rotate_vectorized.c so that it uses SSE Intrinsics (and runs much faster). If you compile and run it, you'll see tests for your rotate_vectorized function. If you need help with SSE Intinsics, see lab 8.

Problem 3

The latter 60% of a program supports operations on multiple cores. You noted the time the entire program took to complete with one core.

Part 1) In the ideal case, what is the overall speedup you'd expect, running the latter portion with:

  1. four cores rather than one?
  2. twelve cores rather than one?
  3. as many as you want?

Part 2) Why might the actual speedups be less than the ideal ones you calculated?