__Virtual Machines__

* The underlying architecture has to trap before doing anything that would interfere with the virtual machine.
  - The x86 architecture is not properly virtualizable.
* An x86 process can tell whether it is running in supervisor or user state.
  - If an OS is running on the VM, it can read its state from a register. It expects supervisor, but since it is running virtualized, it will read user.
* There are supervisor-state instructions that can execute in user mode without trapping, producing incorrect results.
  - Example: popf changes different flags in user state and supervisor state, and traps in neither case.
* But people still build virtual machines on x86. How?
  - VMware uses "runtime binary translation":
    * Go through the code, identify instructions that are not properly virtualizable, and rewrite the code to fix the problem.
    * This is done as the code is encountered at runtime, but only once for each segment of code.
* Question: why not interpret all of the instructions?
  - Performance: that would run around 100x slower.
  - We want to run as much natively as we can, and trap only on anything that would interfere with the VM.

__Performance Evaluation__

* We are a little less concerned with performance these days because fast computing is cheaper than it used to be.
* Techniques:
  - Analytic modeling
  - Simulation modeling (HW1)
  - System tuning
  - Design improvement
* To improve performance, you need to know what the current performance is.
* One of the most difficult issues is measurement.
  - Collecting the data is usually the most time-consuming part.
* Measurement
  - Hardware monitoring
    * Connect probes to parts of the system.
      - But if you can't get to a part of the chip, then you can't monitor it.
      - You need the hardware manufacturer to bring the signals out to the periphery of the chip or to registers.
      - Unfortunately, performance connections are usually the first part of a design to be tossed out due to deadlines.
    * Hardware monitors usually have some logic built in for data reduction.
    * Sampling randomly is important.
  - Software monitoring
    * Can sample on interrupts.
      - The drawback is that you can never sample in the middle of non-interruptible code.
    * The compiler can build profiling instructions into compiled code.
    * Microcoded machine: you could instrument the microcode if the computer allows it.
    * Overhead is an issue. The monitoring must not dominate, or it will distort the performance analysis.
    * Example: IBM's GTF, invoked on traps and interrupts to dump data to a file or tape. Designed for debugging.
  - Hardware counters
    * Built-in counters that can be used to count certain events, and can be read at different intervals.
      - Can count, for example: cache misses, main-memory accesses, types of instructions, numbers of branches.
* Workload Characterization
  - You need to know what sort of workload will be run on a system in order to measure its performance.
  - Executable workloads: collect a sample of actual workloads to test with. You must assume the sample is representative.
  - Synthetic executable workloads: make a reasonable guess about what a representative workload would look like.
    * Synthetic workloads are required when a new system is being tested for which no real workloads yet exist.
  - How to characterize a workload:
    * Know your parameters.
* Statistical methods
  - Not necessarily as important as understanding how the system works.
  - Can usually get by with some simple statistical tools: means, variances, distributions, factor analysis, some modeling.
* Analytic Models
  - Markov chains
  - Other stochastic processes
  - Queueing theory models
    * Queueing theory goes back ~100 years; it was driven by telecommunications.
    * Problems in queueing theory (as in many other fields) fit a "hockey stick curve" of difficulty to solve vs. problem difficulty:
      - Easy to intermediate problems are solvable without much work.
      - As soon as the problems get somewhat difficult, the difficulty of solving them goes up very quickly.
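The easy end of that hockey stick can be made concrete. The notes don't name a specific model, but the simplest solvable case is the M/M/1 queue (Poisson arrivals, exponential service, one server and one queue, as in HW1); the sketch below uses the standard closed-form results, with made-up example rates.

```python
# Minimal sketch of the "easy" end of queueing theory: closed-form
# M/M/1 results for a single-server queue like the one in HW1.
# lam (arrival rate) and mu (service rate) are illustrative numbers,
# not figures from the notes.

def mm1_metrics(lam, mu):
    """Return utilization, mean number in system, and mean response time."""
    assert lam < mu, "queue is unstable unless arrival rate < service rate"
    rho = lam / mu            # server utilization
    L = rho / (1 - rho)       # mean number of customers in the system
    W = 1 / (mu - lam)        # mean time in system (Little's law: L = lam * W)
    return rho, L, W

rho, L, W = mm1_metrics(lam=8.0, mu=10.0)
print(f"utilization={rho:.2f}, mean in system={L:.2f}, response time={W:.2f}")
```

Note how quickly L blows up as rho approaches 1: at 80% utilization the mean queue already holds 4 customers, and at 95% it holds 19.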
* Queueing networks
  - HW1 was a very simple queueing network: one server, one queue.
  - BCMP-type queueing networks can be solved with some boilerplate (but the cost is exponential in the number of states):
    * Different customer types
    * Transition probabilities between servers and types
    * Different server types allowed:
      - FCFS exponential (service time is exponential); models I/O
      - Processor sharing (each of N customers gets 1/N of the processing power); models the CPU
      - Infinite server (each customer gets the full rate of the server); models customer think time
* Discrete Event Simulation
  - Events can come from a trace of a real system, with time and type information.
  - Or, generate events with a random-number generator, as in HW1.
* Continuous Simulation
  - Not usually used for computer-system modeling.
  - Example: find the area of some amoeba-shaped, curve-bounded region using a stochastic method.
    * Bound it with a box of known area and sample random points in the box. The proportion of points that fall inside the region, multiplied by the box's area, is the region's area.
    * This is a Monte Carlo sampling method.
  - Continuous simulation is used to solve non-CS problems (e.g., aerodynamics problems).
    * Differential equations can be solved numerically rather than symbolically.
* Regenerative Simulation
  - Find a regeneration point in the model.
  - The simulation can be restarted from this point with different events.
  - This yields IID (independent, identically distributed) observations.
* Back-of-the-Envelope Calculations
  - Sufficient in many cases. Use common-sense information to approximate the answer to some question.
  - Example: how much water flows down the Mississippi every day?
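Returning to the continuous-simulation example above, the Monte Carlo area method can be sketched in a few lines. A unit circle stands in for the amoeba-shaped region here (so the answer can be checked against pi); `inside` is the only piece you would swap out for a real curve-bounded region.

```python
import random

# Monte Carlo area estimation, as described under Continuous Simulation:
# bound the region with a box of known area, sample random points, and
# multiply the fraction that land inside the region by the box's area.

def estimate_area(inside, box_area, xlo, xhi, ylo, yhi, n=100_000, seed=1):
    rng = random.Random(seed)   # seeded so the run is reproducible
    hits = sum(
        inside(rng.uniform(xlo, xhi), rng.uniform(ylo, yhi))
        for _ in range(n)
    )
    return box_area * hits / n

# Region: unit circle at the origin, bounded by the 2x2 box [-1, 1]^2.
area = estimate_area(lambda x, y: x * x + y * y <= 1.0,
                     box_area=4.0, xlo=-1, xhi=1, ylo=-1, yhi=1)
print(f"estimated area = {area:.3f}  (true value: pi ~ 3.142)")
```

With 100,000 samples the standard error is about 0.005, so the estimate lands within a few hundredths of pi; accuracy improves only as the square root of the sample count.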
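The Mississippi question above is just a rate-times-time estimate. The discharge figure below is a rough assumption of mine (a value on the order of 600,000 cubic feet per second is commonly cited for the river's mouth, but it is not from the notes); the point is the method, not the precision of the inputs.

```python
# Back-of-the-envelope sketch: multiply a plausible flow rate by the
# number of seconds in a day. The discharge number is an assumption
# for illustration, not a measured value from the notes.

CUBIC_FEET_PER_SECOND = 600_000      # assumed average discharge
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
GALLONS_PER_CUBIC_FOOT = 7.48        # standard unit conversion

cubic_feet_per_day = CUBIC_FEET_PER_SECOND * SECONDS_PER_DAY
gallons_per_day = cubic_feet_per_day * GALLONS_PER_CUBIC_FOOT

print(f"~{cubic_feet_per_day:.2e} cubic feet/day")   # ~5.18e+10
print(f"~{gallons_per_day:.2e} gallons/day")
```

An answer good to within a factor of two or so is usually all a back-of-the-envelope calculation needs to deliver.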