CS 162 Lecture Notes
Prof. Alan Jay Smith

Topic: Sharing Main Memory -- Segmentation and Paging

+ How do we allocate memory to processes?
+ 1. Simple uniprogramming with a single segment per process:
  + One program in memory at a time. (Can actually multiprogram by swapping programs.)
  + Highest memory holds OS.
  + Process is allocated memory starting at 0 (or J), up to (or from) the OS area.
  + Process always loaded at 0.
  + Examples: early batch monitors where only one job ran at a time, and all it could do was wreck the OS, which would be rebooted by an operator. Many of today's personal computers also operate in a similar fashion.
  + Advantages
    + Low overhead
    + Simple
    + No need to do relocation. Always loaded at zero.
  + Disadvantages
    + No protection - process can overwrite OS
      + which means it can get complete control of the system
    + Multiprogramming requires swapping the entire process in and out
      + Overhead for swapping
      + Idle time while swapping
    + Process limited to size of memory
    + CTSS ("Compatible" Time Sharing System), and how the system swapped users completely.
    + No good way to share - only one process at a time (can't even overlap CPU and I/O, since only one process is in memory).
+ 2. Relocation - load program anywhere in memory.
  + Idea is to use the loader or linker to load the program at an arbitrary memory address.
  + Note that the program can't be moved (relocated) once loaded. (WHY??)
  + This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.
+ 3. Simple multiprogramming with static software relocation, no protection, one segment per process:
  + Highest or lowest memory holds OS.
  + Processes allocated memory starting at 0 (or N), up to the OS area.
  + When a process is initially loaded, link it so that it can run in its allocated memory area.
  + Can have several programs in memory at once, each loaded at a different (non-overlapping) address.
  + Advantages:
    + Allows multiprogramming without swapping processes in and out.
    + Makes better use of memory.
    + Higher CPU utilization due to more efficient multiprogramming.
  + Disadvantages
    + No protection - jobs can read or write others.
    + External fragmentation
    + Overhead for variable size memory allocation.
    + Still limited to size of physical memory.
    + Hard to increase the amount of memory allocated.
    + Programs are statically loaded - they are tied to fixed locations in memory. They can't be moved or expanded. If swapped out, a program must be swapped back to the same location.
+ 4. Dynamic memory relocation: instead of changing the addresses of a program before it's loaded, change the address dynamically during every reference.
  + Figure of a processor and a memory box, with a memory relocation box in between.
  + There are many types of relocation - to be discussed.
  + Under dynamic relocation, each program-generated address (called a logical or virtual address) is translated in hardware to a physical, or real, address. This happens as part of each memory reference.
    + Virtual (logical) address is what the program generates.
      + Virtual address space is the set of (legal) virtual addresses the program can generate.
    + Physical (real) addresses - set of addresses in physical memory.
      + Physical address space of a program - set of physical addresses it can get to.
      + Physical address space of the machine - set of addresses in physical memory.
  + Dynamic relocation leads to two views of memory, called address spaces. We have the virtual address space and the real address space. Each process has its own virtual address space.
With static relocation we force the views to coincide. In some systems, there are several levels of mapping.
+ Several types of dynamic relocation.
+ Base & bounds relocation:
  + Two hardware registers: a base register for the process, and a bounds register that indicates the last valid address the process may generate.
  + If 0 <= virtual address <= bounds, then real address = base + virtual address. (A sketch of this check appears at the end of this section.)
  + In parallel with the bounds comparison, the real address is generated by adding the virtual address to the base register.
  + This is a form of translation.
  + Discuss why the comparison is done in parallel.
  + On each memory reference, the virtual address is compared against the bounds register.
+ Advantages:
  + Each process appears to have a completely private memory of size equal to the bounds register plus 1.
  + Processes are protected from each other.
  + No address relocation is necessary when a process is loaded.
  + Task switching is very cheap when done between processes in memory - just reload the processor registers.
    + Higher overhead to load a process from disk.
  + Compaction is possible.
+ Disadvantages:
  + Still limited to size of main memory.
  + External fragmentation (between processes)
  + Overhead for allocating variable size spaces in memory.
  + Sharing difficult - only possible if bases & bounds overlap.
  + Only one "segment" - i.e. one region of memory.
  + New, special hardware needed for relocation.
  + Time to do relocation (it isn't free).
+ OS must be able to change the value of the relocation registers (why?).
  + OS loads a new process and sets its base and bounds registers.
  + OS schedules a process, and sets the base and bounds registers.
  + When tasks are switched, must be able to swap the base, bounds and PC registers simultaneously.
  + These imply that the OS must run with base and bounds relocation turned off - otherwise, it would affect itself when running. (Or it would need its own set of base and bounds registers.)
  + Use of base and bounds is controlled by a status bit, usually in the PSW or SSW, or a similar control register.
+ Users must not be able to change the values of the base and bounds registers.
  + Otherwise, there is no protection between users. They can trash others or the OS.
+ Problem: how does the OS regain control once it has given it up?
  + OS is entered on a trap (including SVC) or interrupt.
  + When the OS is entered, use of base and bounds must be disabled. (I.e. the bit in the PSW is reset.)
  + Typically, the trap handler loads new control register values.
+ Base & bounds is cheap -- only 2 registers -- and fast -- the add and the compare can be done in parallel.
+ Examples: CRAY-1. IBM 7040/7090.
+ Can consider three types of systems using base and bounds registers:
  + Uniprogramming - single user region. Bring a user in, and run him.
  + Multiprogramming with Fixed Partitions (OS/MFT) - partition memory into fixed regions (may be different sizes). A user goes into a region of the given size.
    + Not very flexible.
    + IBM OS circa 1965-68
  + Multiprogramming with Variable Partitions (OS/MVT) - partitions are dynamically variable.
    + IBM OS circa 1967-72.
  + Note that we can do any of the three above schemes without base and bounds registers - just load programs into a region at the appropriate base address.
+ Task Switching
  + We can now switch between processes very cheaply - we don't have to reload memory, just change the contents of the process control block (which now holds the values of the base and bounds registers).
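To make the base-and-bounds check concrete, here is a minimal sketch in C. All names are illustrative; the bounds register is assumed to hold the last valid virtual address (so the usable size is bounds + 1), and real hardware does the add and the compare in parallel rather than sequentially.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative base-and-bounds relocation state for one process. */
typedef struct {
    uint32_t base;    /* physical address where the process begins */
    uint32_t bounds;  /* last valid virtual address */
} bb_regs;

/* Returns true and fills *pa on success; false means trap to the OS.
 * Since va is unsigned, the "VA >= 0" half of the check is implicit. */
bool bb_translate(const bb_regs *r, uint32_t va, uint32_t *pa)
{
    if (va > r->bounds)
        return false;        /* protection violation: trap */
    *pa = r->base + va;      /* relocation is just an add */
    return true;
}
```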
+ We can also run processes which are not in memory - how?
  + Find an empty area of memory in which to place the process - how?
  + Remove one or more processes from memory, if necessary, in order to find space. (I.e. copy the removed processes to space on disk.)
  + Copy the new process (from disk) into memory.
  + If only one process fits in memory, have to wait for the swap to take place.
  + If several processes fit in memory, can swap one while executing another.
+ 5. Multiple segments - Segmentation.
  + Divide the virtual address space into several "segments".
    + This is not the same as the "segments" of linkers and loaders.
  + Use a separate base and bound for each segment, and also add protection bits (read, write, execute) and a valid bit. (Also will want a dirty bit.)
  + Each address now consists of a segment and an offset within the segment.
  + Each memory reference indicates the segment and offset in one or more of three ways:
    + Top bits of the address select the segment, low bits the offset. This is the most common, and the best.
    + Or, the segment is selected implicitly by the instruction (e.g. code vs. data, stack vs. data, which base register is used, or 8086 prefixes).
    + Or, the instruction specifies directly or indirectly a base register for the segment.
  + Subprograms (procedures, functions, etc.) can be separate segments.
  + Segments typically are associated with logical partitions of your process address space - e.g. code, data, stack. Or, each module or procedure can be a separate segment.
  + Need either a segment table or segment registers to hold the base and bounds for each segment.
    + Draw picture of segment table, with segment table entries.
  + Memory mapping procedure consists of table lookup + add + compare.
  + Example: PDP-10 with high and low segments selected by the high-order address bit.
+ Address translation for segmentation
  + Have a segment table - maps segment number to [segment base address, segment length (limit), protection bits, valid bit, reference bit, dirty bit].
    + This info is in the Segment Table Entry (STE).
    + Diagram of segment table.
    + Segment descriptor
  + Need some hardware to automatically map virtual (segment number, word number) to real address. (A sketch appears at the end of this section.)
    + Real address = segment_table(segment #) + word number.
    + Invalid if word_number >= limit, where limit is the segment length. (Note that we do the test without adding the bound to both sides.)
    + Also the valid bit must be on, and the permission bits must permit the access.
    + Need more hardware to make it go fast (discuss later).
  + Have a Segment Table Base Register (STBR) point to the base of the segment table (for the hardware to use).
  + Alternate approach - if there are a small number of segments, can have segment registers - one register per segment.
    + Can also multiplex a small number of segment registers among a large number of segments (as with the x86 architecture).
+ Advantages
  + Each process has its own virtual address space.
  + Protection between address spaces.
  + Separate protection between segments (R/W/E).
  + Virtual space can be larger than physical memory.
  + Unused segments don't need to be loaded. Can load segments as needed.
    + An attempt to reference a missing segment is called a segment fault.
    + Discuss segment faults later.
  + Can share one or more segments.
    + Sharing is tricky - we'll talk about this later.
  + Segments can be placed anywhere in memory that they fit.
  + Memory compaction is easy.
  + Segment sizes can be changed independently.
+ Disadvantages
  + Each segment must be allocated contiguously.
  + Segment size < memory size.
  + External fragmentation.
  + Overhead of allocating memory.
  + Need hardware for address translation.
  + Overhead (time/hardware) of doing address translation.
  + More complicated.
  + Space for the segment table.
+ Note that segment tables are usually 1-1 with processes. A segment table defines a process's address space.
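Here is a sketch of the segment-table translation just described, assuming the common top-bits-select-segment scheme. The 4-bit segment field and all names are illustrative, and the limit field is taken to hold the segment length.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_BITS  4                  /* top bits of the VA pick the segment */
#define OFF_BITS  28                 /* remaining bits are the offset */
#define NSEGS     (1u << SEG_BITS)

typedef struct {            /* segment table entry (STE) */
    uint32_t base;          /* physical base of the segment */
    uint32_t limit;         /* segment length; valid offsets are 0..limit-1 */
    bool     valid;         /* segment present in memory */
    bool     r, w, x;       /* protection bits */
} ste;

/* Hardware does lookup + compare + add; returns false => trap to OS. */
bool seg_translate(const ste table[NSEGS], uint32_t va,
                   bool is_write, uint32_t *pa)
{
    uint32_t seg = va >> OFF_BITS;
    uint32_t off = va & ((1u << OFF_BITS) - 1);
    const ste *e = &table[seg];

    if (!e->valid)          return false;   /* segment fault */
    if (off >= e->limit)    return false;   /* bounds violation: compare
                                               offset to limit, no add needed */
    if (is_write && !e->w)  return false;   /* protection fault */
    *pa = e->base + off;
    return true;
}
```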
+ What would happen if all processes shared the same segment table?
  + Protection is a problem.
  + Have the same problem as before - now we have to allocate shared virtual instead of shared physical memory.
+ When we switch processes, we reload the STBR (segment table base register), which changes the address space.
+ Processes vs. Threads
  + A process is a single flow of control associated one to one with an address space.
  + A thread is a single flow of control. There may be several threads within an address space.
  + Threads are considered lightweight, because the overhead of creating a thread is usually much less than that of creating a process. The cost to communicate between threads in the same address space is very low. The cost to communicate between different address spaces is high (e.g. pipe, file).
  + Threads in one address space share code and data. Threads do not usually share stack - usually each has its own. They can synchronize using P and V without involving the operating system.
    + Processes do not normally share, so P&V must use the OS as an intermediary.
  + To use threads, usually have constructs like fork, join, signal, wait, broadcast.
+ Managing segments:
  + Keep a copy of the segment table in the process control block (or, if the block is too small, associated with it).
  + When creating a process, define its segments in the segment table/PCB.
  + When the process is assigned memory, figure out where each segment goes, and put the base and bounds into the segment table.
  + Need a memory map, which maps memory to segments. (The segment table maps segments to memory.) Also called a core map.
  + When switching contexts, save the segment table or a pointer to it in the old process's PCB; reload it from the new process's PCB.
  + When a process dies, return its segments to the free pool.
  + When there's no space to allocate a new segment:
    + Compact memory (move all segments, update bases) to get all free space together.
    + Or, swap one or more segments to disk to make space (must then check during context switching and bring segments back in before letting the process run).
  + To enlarge a segment:
    + See if the space above the segment is free. If so, just update the bound and use that space.
    + Or, move the segment above this one to disk, in order to make the memory free.
    + Or, move this segment to disk and bring it back into a larger hole (or, maybe just copy it to a larger hole).
    + Or, move it down, if there is space below.
+ Can load segments only when needed.
  + Segment Fault - an attempt to reference a segment which is not present.
    + Trap to OS.
    + Find space for the segment - replace another one, if necessary.
    + Load the segment (remove other segments to make space, if necessary).
    + Set the valid bit to 1, and update the other entries in the STE.
    + Make the process ready.
+ Paging: the goal is to make allocation and swapping easier, and to reduce memory fragmentation.
  + Make all chunks of virtual memory the same size; call them pages. Typical sizes range from 512 bytes to 16K bytes.
  + Divide real memory into page frames, which are the same size as pages.
    + I will frequently be sloppy and say "page" when I mean "page frame".
  + The virtual address typically now consists of N bits, partitioned as K (page number) and N-K (byte within page).
  + For each process, a page table defines the base address of each of that process' pages. Each page table entry contains bits for the real address of the page, protection, valid, reference, and dirty bits.
    + Diagram of page table - see figure.
  + A page table base register points to the base of the page table.
  + Translation process: the page number always comes directly from the (virtual) address.
Since the page size is a power of two, no comparison or addition is necessary - just do a table lookup and bit substitution.
  + Diagram of translation process.
  + No limit field is needed or used (just overflow to the next page).
+ We will need a table (page map or core map) or memory map telling us who owns which page frame in memory. It points back to any page table that points to this page.
+ Not all of a process' memory has to be loaded into real memory. If one attempts to reference a location not in memory, it is prevented by a page fault - this condition is detected by the valid bit.
  + Same as before with segment fault.
  + Page fault - a trap condition. Detected by the hardware when the valid bit is off.
    + Trap to OS (trap, not interrupt).
    + OS finds a page frame (somehow - discussed later),
    + gets the page (reads it from disk),
    + updates the page table,
    + makes the process ready.
+ Pages and paging are used to produce a physical partitioning of the process address space and memory. There usually isn't any relation between page boundaries and what is in a page.
+ Advantages
  + Easy to allocate: keep a free list of available page frames and grab the first one.
  + No external fragmentation.
  + When combined with segmentation (discussed later): non-contiguous allocation of segments.
  + Permits a process to have a virtual space much larger than its physical space.
  + Permits pages to be loaded as/when needed.
+ Disadvantages
  + Internal fragmentation: page size doesn't match up with information size. The larger the page, the worse this is.
  + Hardware for address translation.
  + Time for address translation.
  + Page faults may cause considerable overhead.
    + What happens when we have a page fault (missing page)? - to be discussed later.
    + Need for a page replacement algorithm.
    + We need algorithms to decide when to move pages into and out of memory (discussed later).
  + Table space: if pages are small, the table space could be substantial. In fact, this is a problem even for normal page sizes: consider a 32-bit address space with 1K pages - that is 2^22 (about four million) page table entries per process. What if the whole table has to be present at once?
    + 1. Partial solution: keep base and bounds for the page table, so only large processes have to have large tables.
    + 2. Usual solution: make the page table two level (see figure 6). Map the high order bits through the first table, and the lower order page number bits through the second table. (A sketch appears at the end of this section.)
      + The first level table can be called the page directory, or segment table (confusing usage). The second level table is usually called the page table.
    + 3. Put user page tables in OS virtual memory - then unneeded pages are not allocated.
      + Note that this yields a 2 level page table - an address is mapped through the OS page table and then the user page table.
    + 4. Make the page table a hash table (done by IBM and HP).
      + Called an inverted page table.
+ Efficiency of access: even small page tables are generally too large to load into fast memory in the relocation box. Instead, page tables are kept in main memory and the relocation box only has the page table's base address. It thus takes one overhead reference for every real memory reference. If the page table is two level, it requires two extra references.
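To make the two-level scheme concrete, here is a minimal sketch. The 10/10/12 bit split (4K pages) and all names are illustrative, not any particular machine's layout; note that translation is pure lookup plus bit substitution, with no add or compare.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12                      /* 4K pages */
#define LVL_BITS   10                      /* 10 bits per level */
#define LVL_MASK   ((1u << LVL_BITS) - 1)

typedef struct {
    uint32_t frame;                        /* physical page frame number */
    unsigned valid : 1, dirty : 1, referenced : 1, writable : 1;
} pte;

typedef struct {
    pte *tables[1u << LVL_BITS];           /* NULL => no table for this range */
} page_dir;

/* Walk the two-level table; returns 0 on a page fault (caller traps to OS). */
int pt_walk(const page_dir *dir, uint32_t va, uint32_t *pa)
{
    uint32_t d = (va >> (PAGE_SHIFT + LVL_BITS)) & LVL_MASK;  /* directory index */
    uint32_t t = (va >> PAGE_SHIFT) & LVL_MASK;               /* table index */

    pte *table = dir->tables[d];
    if (table == NULL)   return 0;         /* unallocated region */
    if (!table[t].valid) return 0;         /* page fault */
    *pa = (table[t].frame << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
    return 1;
}
```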
+ Where are the page tables?
  + Page tables are referred to with either real addresses or OS virtual addresses.
  + They cannot be put where users can get to them.
    + Otherwise, users could change the values, which would bypass protection.
  + Page table entries are usually real addresses (including the addresses of the first level page table, and the PTBR).
    + Could have OS virtual addresses in the entries, which means that another level of translation is needed.
+ Is the OS paged?
  + Yes - the advantages for users also apply to the OS.
+ Can page tables be paged out?
  + Sure - why not?
  + But if page tables are in the OS's virtual memory, and page tables have OS address space virtual addresses in them, then translation of a user virtual address also requires OS virtual address translation.
    + This might require a recursive page fault.
    + Means that OS page tables must be in real memory and use real addresses.
  + The alternative is to put page tables in "real memory" and use real addresses. (I.e. have V=R.)
+ What can't be paged out?
  + This is called ``wired down''.
  + The code that brings in pages.
  + Pages for critical parts of the operating system. (Handling a page fault takes time.)
  + Some interrupt and trap handlers, including the code that starts up a process.
  + OS page tables.
  + Sensitive real time routines.
  + Pages currently undergoing I/O (i.e. I/O buffers).
+ Note how effective paging is for protection - you can only reference parts of memory which appear in your page table(s). The only parts that appear are those that you have access to.
+ Paging and segmentation combined
  + Diagram of segment table / page table mapping. In the segment table entry, put protection bits (read, write, execute) and a valid bit.
  + Each segment is broken into one or more pages.
  + Segments correspond to logical units: code, data, stack. Segments vary in size and are often large. Protection can be associated with segments.
  + Pages are for the use of the OS; they are fixed-size to make it easy to manage memory.
  + Going from paging to P+S is like going from a single segment to multiple segments, except at a higher level. Instead of having a single page table, have many page tables with a base and bound for each. Call the stuff associated with each page table a segment. (A sketch of the combined translation appears at the end of this section.)
  + Advantages:
    + Provides 2 level mapping (as did the page directory and page table). Makes page table size manageable.
    + Provides both a physical unit of management (page) and a logical unit of management (segment).
    + Effectively produces two dimensional addressing: [segment, address within segment].
    + Can grow and shrink segments individually, without interfering with other segments. Just add pages (which can be anywhere in memory).
      + Segmentation with no compaction or fragmentation problem.
    + Bounds checks on segments are handled by having the page not be valid (quantized to page size).
    + No page table for a segment which doesn't exist.
    + Can share a segment and/or a page.
    + Protection at the level of page and/or segment.
  + Disadvantages
    + More complicated than either segmentation or paging.
    + Overhead of 2 level mapping (time and hardware).
    + Overhead of both schemes.
    + Usual internal fragmentation problem, but if the page size is small compared to most segments, then internal fragmentation is not too bad.
+ Paging vs. Segmentation
  + A page is fixed size, a physical unit of information, used only for memory management; it is not visible to the programmer.
  + A segment is a logical unit (usually), visible to the user, of arbitrary size.
  + Note that the user may see (be aware of) segmentation. The user should not be aware of paging.
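Here is a sketch of the combined translation: the STE holds the protection bits and a pointer to that segment's page table, and the segment bound is enforced, quantized to page size, by how many pages are allocated. The 10/10/12 field split and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4K pages */
#define SEG_SHIFT  22                 /* top 10 bits select the segment */
#define PAGE_MASK  ((1u << (SEG_SHIFT - PAGE_SHIFT)) - 1)

typedef struct { uint32_t frame; bool valid; } pte;

typedef struct {                      /* segment table entry */
    pte     *ptable;                  /* this segment's page table */
    uint32_t npages;                  /* pages allocated: the quantized bound */
    bool     valid, r, w, x;          /* protection lives at the segment level */
} ste;

/* Two-level map: segment table entry -> page table -> frame. */
bool ps_translate(const ste *segtab, uint32_t va, bool is_write, uint32_t *pa)
{
    uint32_t seg  = va >> SEG_SHIFT;
    uint32_t page = (va >> PAGE_SHIFT) & PAGE_MASK;
    uint32_t off  = va & ((1u << PAGE_SHIFT) - 1);
    const ste *s  = &segtab[seg];

    if (!s->valid || (is_write && !s->w))
        return false;                              /* segment-level check */
    if (page >= s->npages || !s->ptable[page].valid)
        return false;                              /* bound or page fault */
    *pa = (s->ptable[page].frame << PAGE_SHIFT) | off;
    return true;
}
```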
+ Can share at two levels: a single page, or a single segment (whole page table). Diagram of shared pages or shared segments (shared page table).
+ Does the shared region have to be at the same address in each process?
  + No - as long as it can be found.
+ Can the shared region contain any absolute addresses (i.e. virtual addresses)?
  + Usually not - very dangerous - the addresses may not be the same in each process.
  + But it can contain relative addresses - e.g. offsets to certain registers or to the segment base. Such registers can be loaded by each process differently.
  + If an entire segment is shared, and addresses are relative to the start of the segment, we are okay.
+ Copy on write.
  + Share pages, but with 2 separate page tables. Both page tables point to the same pages.
  + The pages are made read only. On an attempt to write, a copy is made.
+ Problem: how does the operating system get information from user memory? E.g. I/O buffers, parameter blocks. Note that the user passes the OS a virtual address.
  + 1. Use real addresses - in some cases the OS just runs unmapped. Then all it has to do is read the tables and translate user addresses in software.
    + Note: addresses that are contiguous in the virtual address space may not be contiguous physically. Thus I/O operations may have to be split up into multiple blocks. Draw an example.
  + 2. Can specify (somehow) that the data addresses are to use the user page tables (would need special hardware).
    + Note that we therefore need two active PTBRs - a user PTBR and a system PTBR.
  + 3. Have OS page tables point to user pages.
  + 4. A few machines, most notably the VAX, make both system information and user information visible at once (but you can't touch system stuff unless running with a special kernel protection bit set). This makes life easy for the kernel, although it doesn't solve the I/O problem.
    + I.e. the OS is in everyone's address space.
+ VAX Addressing
  + Another example: VAX.
  + The address is 32 bits; the top two bits select the segment. Four base-bound pairs define the page tables (system, P0, P1, unused).
  + Pages are 512 bytes long.
  + Read-write protection information is contained in the page table entries, not in the segment table.
  + One segment contains operating system stuff; two contain stuff of the current user process.
  + Potential problem: page tables can get big. We don't want to have to allocate them contiguously, especially for large user processes. The solution is to use the system page table to map the user page tables, so the user page tables can be scattered:
    + System base-bounds pairs are physical addresses; the system tables must be contiguous.
    + User base-bounds pairs are virtual addresses in the system space. This allows the user page tables to be scattered in non-contiguous pages of physical memory.
    + The result is a two-level scheme.
    + This is an alternative to the normal two level scheme. If the normal two level scheme were used, and if page tables were paged, it would actually be a four level scheme.
+ Inverted Page Table
  + The idea is that the page table is organized as a hash table. Hash from the virtual address into a table with a number of entries larger than the physical memory size. (The page table is shared by all processes.)
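A sketch of an inverted page table lookup follows: one entry per physical frame, probed through a hash of (process id, virtual page number), with collisions chained through the frames. The hash-anchor structure and the hash function are illustrative, not the detail of any IBM or HP machine.

```c
#include <stdint.h>

#define NFRAMES (1u << 16)            /* one entry per physical frame */

typedef struct {
    uint32_t pid, vpn;                /* who owns this frame, which page */
    int32_t  next;                    /* next frame in hash chain, or -1 */
} ipte;

static ipte    frames[NFRAMES];
static int32_t hash_anchor[NFRAMES];  /* hash bucket -> first frame in chain */

static uint32_t ipt_hash(uint32_t pid, uint32_t vpn)
{
    return (pid * 2654435761u ^ vpn) % NFRAMES;   /* any mixing hash works */
}

void ipt_init(void)
{
    for (uint32_t b = 0; b < NFRAMES; b++)
        hash_anchor[b] = -1;          /* all chains start empty */
}

/* Returns the physical frame holding (pid, vpn), or -1 => page fault. */
int32_t ipt_lookup(uint32_t pid, uint32_t vpn)
{
    for (int32_t f = hash_anchor[ipt_hash(pid, vpn)]; f != -1;
         f = frames[f].next)
        if (frames[f].pid == pid && frames[f].vpn == vpn)
            return f;
    return -1;
}
```

Note the size tradeoff: the table grows with physical memory, not with the virtual address space, which is why the hash organization is attractive for large address spaces.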
+ Problem with segmentation and paging: the extra memory references to access the translation tables can slow programs down by a factor of two or three. There are obviously too many translations required to keep them all in special processor registers.
  + But for small machines (e.g. PDP-11), can have one register for every page in memory, since they can only address 64 Kbytes.
+ Solution: Translation Lookaside Buffer (TLB), also called
  + Translation Buffer (TB) (DEC), or
  + Directory Lookaside Table (DLAT) (IBM), or
  + Address Translation Cache (ATC) (Motorola).
+ A TLB is used to store a few of the translation table entries. It's very fast, but only remembers a small number of entries. On each memory reference: (draw picture, explain name)
  + First ask the TLB if it knows about the page. If so, the reference proceeds fast.
  + If the TLB has no info for the page, the translator must go through the page and segment tables to get the info. The reference takes a long time, but the info for this page is given to the TLB so it will know it for the next reference (the TLB must forget one of its current entries in order to record the new one).
+ TLB organization: picture of a black box. Virtual page number goes in, physical page location comes out. Similar to a cache.
+ So what the TLB does is:
  + Accept a virtual address.
  + See if the virtual address matches an entry in the TLB.
  + If so, return the real address.
  + If not, ask the translator to provide the real address.
    + The translator loads the new translation into the TLB, replacing an old one (usually one not used recently).
    + (Must replace an entry in the same set.)
+ Will the TLB work well if it holds only a few entries, and the program is very big?
  + Yes - due to the Principle of Locality (Peter Denning).
+ Principle of Locality
  + 1. Temporal Locality - Information that has been used recently is likely to continue to be used.
    + Alternate formulation - the information in use now consists mostly of the same information as was used recently.
  + 2. Spatial Locality - Information near the current locus of reference is also likely to be used in the near future.
  + Example - the top of your desk is a cache for your file cabinet. If the desk is messy, the stuff on top is likely to be what you need.
  + Explanation - code is either sequential or loops. Data used together is often clustered together (array elements, stack, etc.).
+ In practice, TLBs work quite well. Typically they find 96% to 99.9% of the translations in the TLB.
+ The TLB is just a memory with some comparators. Typical size of the memory: 16-512 entries. Each entry holds a virtual page number and the corresponding physical page number. How can the memory be organized to find an entry quickly?
  + One possibility: search the whole table associatively on every reference. Hard to do for more than 32 or 64 entries.
  + A better possibility: restrict the info for any given virtual page to fall into a subset of entries in the TLB. Then we only need to search that set. Called set associative. E.g. use the low-order bits of the virtual page number as the index to select the set. Real TLBs are either fully associative or set associative. If the size of the set is one, it is called direct mapped. (A sketch appears at the end of this section.)
  + Diagram of set associative TLB.
  + Replacement must be in the same set.
+ The translator is a piece of hardware that knows how to translate virtual to real addresses. It uses the PTBR to find the page table(s), and reads the page table to find the page.
+ TLBs are a lot like hash tables except simpler (they must be, to be implemented in hardware). Some hash functions are better than others.
  + Is it better to use low page number bits than high ones to select the set?
    + Low ones are best: if a large contiguous chunk of memory is being used, all pages will fall in different sets.
+ Must be careful to flush the TLB during each context switch. Why?
  + Otherwise, when we switch processes, we'll still be using the old translations from virtual to real, and will be addressing the wrong part of memory.
  + Alternative - can make a process identifier (PID) part of the virtual address. Have a Process Identifier Register (PIDR) which supplies that part of the address.
+ When we modify the page table, we must either flush the TLB or flush the entry that was modified.
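Closing out the TLB discussion, here is a sketch of a set-associative lookup. The set is chosen by the low-order virtual page number bits, as recommended above; sizes, names, and the way-0 eviction in the refill are illustrative (real hardware compares all ways in parallel and evicts a not-recently-used way).

```c
#include <stdbool.h>
#include <stdint.h>

#define NSETS 64
#define NWAYS 4

typedef struct {
    uint32_t vpn, pfn;                 /* virtual -> physical page number */
    bool     valid;
} tlb_entry;

static tlb_entry tlb[NSETS][NWAYS];

bool tlb_lookup(uint32_t vpn, uint32_t *pfn)
{
    tlb_entry *set = tlb[vpn % NSETS];        /* low VPN bits pick the set */
    for (int w = 0; w < NWAYS; w++)
        if (set[w].valid && set[w].vpn == vpn) {
            *pfn = set[w].pfn;
            return true;                      /* TLB hit: fast path */
        }
    return false;   /* miss: translator walks the tables, then refills */
}

/* Refill after a miss; replacement must stay within the same set. */
void tlb_insert(uint32_t vpn, uint32_t pfn)
{
    tlb_entry *set = tlb[vpn % NSETS];
    set[0] = (tlb_entry){ .vpn = vpn, .pfn = pfn, .valid = true };
}
```

A context-switch flush in this model is just clearing every valid bit (or, with a PIDR, the pid would become part of the match).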
Topic: Demand Paging, Thrashing, Working Sets

+ So far we have disentangled the programmer's view of memory from the system's view using a mapping mechanism. Each sees a different organization. This makes it easier for the OS to shuffle users around and simplifies memory sharing between users.
+ However, until now a user process had to be completely loaded into memory before it could run. (Sort of - we mentioned page faults and segment faults, but...) This is wasteful, since a process may only need a small amount of its total memory at any one time (locality). Virtual memory permits a process to run with only some of its virtual address space loaded into physical memory.
  + Virtual address space, translated to either (a) physical memory (small, fast) or (b) disk (backing store), which is large but slow.
  + Backing storage is typically disk.
+ The idea is to produce the illusion that the entire virtual address space is in main memory, when in fact it isn't.
+ More generally, we have a multi-level (2 level in this case) memory hierarchy. We want to have the cost of the slower and larger level, and the performance of the smaller and faster level.
  + Diagram of a memory hierarchy, showing access times.
+ The reason that this works is that most programs spend most of their time in only a small piece of the code.
+ Principle of Locality - there are two parts.
  + Temporal Locality - the same information is likely to be reused.
  + Spatial Locality - nearby information is also likely to be used in the near future.
  + (Idea invented (?) by Peter Denning.)
+ If not all of a process is loaded when it is running, what happens when it references a byte that is only in the backing store? Hardware and software cooperate to make things work anyway.
  + First, extend the page tables with an extra bit: ``present'', or ``valid''. If present isn't set, then a reference to the page results in a trap. This trap is given a special name, page fault.
  + Page fault - an attempt to reference a page which is not in memory.
  + Diagram of a Page Table Entry (show real address, protection bits, valid/present bit, dirty bit, reference bit).
  + Any page not in main memory right now has the ``present/valid'' bit cleared in its page table entry.
+ When a page fault occurs:
  + Trap to OS (why?).
  + Verify that the reference is to a valid page; if not, abend.
  + Find a page frame in which to put the page:
    + Find a page to replace, if there is no empty frame.
    + If dirty, find a place to put the replaced page on secondary storage (can reuse the previous location).
    + Remove the page (either copy it back or overwrite it).
    + Update the page table.
    + Update the map of secondary storage if necessary (to show where we put the page).
    + Update the memory (core) map.
    + Flush the TLB entry for the page that has been removed.
  + Operating system brings the page into memory:
    + Find the page on secondary storage.
    + Transfer it.
    + Update the page table (set the valid bit and the real address).
    + Update the map of the file system/disk to show that the page is now in memory (e.g. update the cache of inodes).
    + Update the core map (memory map).
  + The process resumes execution. (I.e. it goes on the ready list; it actually resumes when next scheduled.)
  + Note that all of these steps take time. We may switch to another process while the I/O is taking place.
+ Multiprogramming is supposed to overlap the fetch of a page (or I/O) for one process with the execution of another.
  + If no process is available to run (all doing I/O or page faults), this is called multiprogramming idle or page fetch idle.
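The fault-handling sequence above can be summarized in code. This is a compilable sketch, not real kernel code: every helper named here (find_free_frame, choose_victim, and so on) is a hypothetical stand-in for OS routines whose details vary by system.

```c
#include <stdbool.h>
#include <stdint.h>

/* All of these are hypothetical stand-ins for real kernel code. */
struct process;
extern bool is_legal_page(struct process *p, uint32_t vpn);
extern void abend(struct process *p);
extern int  find_free_frame(void);              /* -1 if none free */
extern int  choose_victim(void);                /* replacement algorithm */
extern bool frame_is_dirty(int frame);
extern void write_to_backing_store(int frame);  /* may reuse old disk slot */
extern void invalidate_old_mapping(int frame);  /* old PTE off + TLB flush */
extern void read_from_backing_store(struct process *p, uint32_t vpn, int frame);
extern void set_pte(struct process *p, uint32_t vpn, int frame);
extern void update_core_map(int frame, struct process *p, uint32_t vpn);
extern void make_ready(struct process *p);

void handle_page_fault(struct process *p, uint32_t vpn)
{
    if (!is_legal_page(p, vpn))
        abend(p);                          /* reference to an invalid page */

    int frame = find_free_frame();
    if (frame < 0) {                       /* no empty frame: replace one */
        frame = choose_victim();
        if (frame_is_dirty(frame))
            write_to_backing_store(frame); /* clean pages are just dropped */
        invalidate_old_mapping(frame);
    }
    read_from_backing_store(p, vpn, frame); /* blocks; others may run */
    set_pte(p, vpn, frame);                /* valid bit on, real address in */
    update_core_map(frame, p, vpn);        /* core map points back at the PTE */
    make_ready(p);                         /* resumes when next scheduled */
}
```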
+ Page out - to remove a page.
  + Page out a process - remove it from memory.
  + Page in a process - load its pages into memory.
+ Continuing (resuming) the process is very tricky, since the page fault may have occurred in the middle of an instruction. We don't want the user process to be aware that the page fault even happened.
  + Can the instruction just be skipped?
  + Suppose the instruction is restarted from the beginning?
    + How is the ``beginning'' located?
    + Even if the beginning is found, what about instructions with side effects, like MOV (SP)+, 10?
  + Without additional information from the hardware, it may be impossible to restart a process after a page fault. Machines that permit restarting must have hardware support to keep track of all the side effects so that they can be undone before restarting.
    + Early Apollo approach for the 68000: two processors, one just for handling page faults.
    + IBM 370 solution: execute long instructions twice.
  + If you think about this when designing the instruction set, it isn't too hard to make a machine support virtual memory. It's much harder to do after the fact.
+ How many page faults can occur in one instruction?
  + E.g. the instruction spans page boundaries, and each of two operands spans two pages. Could have a 2 level page table, with one page of the page table needed to point to each instruction & data page.
+ Once the hardware has provided basic capabilities for virtual memory, the OS must implement 3 algorithms:
  + Page fetch algorithm: when to bring pages into memory.
  + Page replacement algorithm: which page(s) should be thrown out, and when.
  + Page placement algorithm: where to put the page in memory.
+ Note that the page placement algorithm for main memory is irrelevant - memory is uniform. (But the CRAY has non-uniform memory access time. Also not irrelevant for other parts of the memory hierarchy.)
+ Page Fetch Algorithms:
  + Demand paging: start the process up with no pages loaded, and load a page when a page fault for it occurs, i.e. wait until it absolutely MUST be in memory. Almost all paging systems are like this.
  + Request paging: let the user say which pages are needed. What's wrong with this?
    + Users don't always know best, and aren't always impartial. They will overestimate their needs. (Maybe mention overlays here, although overlays are even more draconian than request paging.)
    + Still need demand paging, in case the user doesn't remember to bring in the right page.
  + Prefetching, or prepaging: bring a page into memory before it is referenced (e.g. when one page is referenced, bring in the next one, just in case).
    + The reasons for prepaging are (a) to bring in several pages at once, cutting the per page overhead, and (b) to eliminate the real time delay in waiting for the page - overlap computation and fetch.
    + The idea is to guess at which page will be needed. Hard to do effectively without a prophet; may spend a lot of time doing wasted work. If used at all, typically one block lookahead - i.e. the next page.
    + Seldom works.
  + Can also do "swapping" ("working set restoration"), whereby when you start a process, you swap in most or all of its pages, or at least all of the pages it was using the last time it was running. When it stops, you swap out its pages in a bunch, on contiguous tracks on disk.
    + Also called working set restoration.
  + Overlays - a technique by which the user divides his program into segments. The user issues commands to load and unload the segments from memory; these commands specify the location in memory where the segments are placed.
Overlays are used when there is no virtual memory, and the user is given a partition of real memory to work with.
+ Page Replacement Algorithms:
  + Random (RAND): pick any page at random.
  + FIFO: throw out the page that has been in memory the longest. The ideas are: (a) it's simple, and (b) the first page that was fetched is believed to be no longer needed.
  + LRU (least recently used): use the past to predict the future. Throw out the page that hasn't been used in the longest time. If there is locality, then this is presumably the best you can do.
  + MIN (or OPT): as always, the best algorithm arises if we can predict the future.
    + Throw out the page that won't be used for the longest time into the future. This requires a prophet, so it isn't practical, but it is good for comparison.
+ Real and Virtual Time
  + Virtual time is time as measured by a running process - it doesn't include time that the process is blocked (e.g. for a page fault or another reason). Often in units of memory references.
  + Real time - time as measured by the wall clock. Includes time that the process is blocked (including page faults).
+ How to evaluate paging algorithms:
  + What are the costs of page faults?
    + CPU overhead for the page fault - handler, dispatcher, I/O routines (e.g. 3000 instructions).
    + Possible CPU (multiprogramming) idle while the page arrives.
    + I/O busy while the page is transferred.
    + Main memory (or cache) interference while the page is transferred.
    + Real time delay to handle the page fault.
  + Two approaches (metrics) for evaluating paging algorithms:
    + Curve of page faults vs. amount of space used - preferable. (Called the "parachor curve".)
    + Space time product vs. amount of space. Want to minimize the STP. (Show curve.)
      + Space time product (STP) - the integral of the amount of space used by the program over the time it runs. Includes time for page faults. This is the real space time product.
      + The exact formula is $STP = \int_0^E m(t)\,dt$, where $E$ is the ending time of the program and $m(t)$ is the memory used by the program at (real) time $t$.
      + In discrete time, $STP = \sum_{i=0}^{R} m(i)\,(1 + f(i) \cdot PFT)$, where $R$ is the ending time of the program in discrete time (i.e. the number of memory references), $i$ is the $i$'th memory reference, $m(i)$ is the number of pages in memory at the $i$'th reference, $f(i)$ is an indicator function (0 if no page fault, 1 if a page fault), and $PFT$ is the page fault time.
      + The first term is the virtual space-time product; the second term adds in the time for page faults.
      + The space time product can be computed approximately from the page fault vs. space curve: $STP \approx (F + pft \cdot (\text{number of page faults})) \cdot \bar{n}$, where $F$ is the virtual running time of the program, $pft$ is the time for a page fault to be handled, and $\bar{n}$ is the mean space occupied by the program.
      + The space time product depends on the PFT, so it is technology dependent. It also doesn't take into account the fact that the machine may not be idle when the page is being fetched.
+ Example: try the reference string 4, 3, 2, 1, 4, 3, 5, 4, 3, 2, 1, 5. Assume there are three or four page frames of physical memory. Show the memory allocation state after each memory reference.
  + Do this for MIN, LRU, FIFO - see figures.
  + Note the anomaly for FIFO (Belady's anomaly) - we would like the miss ratio to decline with increasing memory size, but with FIFO it can increase. (A small simulation appears at the end of this section.)
+ Stack Algorithm - an algorithm which obeys the inclusion property: the set of pages in a memory of size N at time t is always a subset of the set of pages in a memory of size N+1 at time t. Such an algorithm obviously cannot have the miss ratio increase with memory size.
  + The stack is the list of pages in order of the size of memory which includes them.
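The FIFO anomaly on the reference string above can be reproduced with a few lines of C. This complete program simulates FIFO replacement with 3 and then 4 frames; it reports 9 faults with 3 frames and 10 with 4, i.e. adding memory makes FIFO worse on this string.

```c
#include <stdio.h>

/* Count FIFO page faults for a reference string with nframes frames. */
static int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[16], head = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) hit = 1;
        if (hit) continue;
        faults++;
        if (used < nframes)
            frames[used++] = refs[i];    /* free frame still available */
        else {
            frames[head] = refs[i];      /* evict the oldest resident page */
            head = (head + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    const int refs[] = {4, 3, 2, 1, 4, 3, 5, 4, 3, 2, 1, 5};
    int n = (int)(sizeof refs / sizeof refs[0]);
    for (int f = 3; f <= 4; f++)
        printf("FIFO, %d frames: %d faults\n", f, fifo_faults(refs, n, f));
    return 0;
}
```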
+ Implementing LRU: need some form of hardware support in order to keep track of which pages have been used recently.
  + Perfect LRU? Keep a register for each page, and store the system clock into that register on each memory reference. To replace a page, scan through all of them to find the one with the oldest clock. This is expensive if there are a lot of memory pages.
  + Or, could use a linked list to maintain an "LRU stack". Note that we can see (by inspection) that with LRU, the miss ratio will never increase with an increasing number of pages in memory.
  + In practice, almost nobody implements perfect LRU. (The CDC-Star did.) Instead, we settle for an approximation that is efficient. Just find an old page, not necessarily the oldest.
    + LRU is just an approximation anyway, so why not approximate a little more?
+ Use bit (reference bit) - a bit in the page table entry (usually cached in the TLB) that is set when the page is referenced. It is turned off under OS control.
+ Clock algorithm: keep a ``use'' bit for each page frame; the hardware sets the bit for the referenced page on every memory reference. Have a pointer pointing at the k'th page frame. When a fault occurs, look at the use bit of the page being pointed to. If it is on, turn it off, increment the pointer, and repeat. If it is off, replace the page in that page frame, and set use(k)=1. (Clock diagram; a sketch appears at the end of this section.)
  + Also called FINUFO - first in, not used, first out.
+ In effect, the use bit, when used with the clock algorithm, breaks the pages into two groups: those "in use" and those "not in use". We want to replace one of the latter.
  + What does it mean if the clock hand is sweeping very slowly?
  + What does it mean if the clock hand is sweeping very fast?
+ Some systems also use a ``dirty'' bit to give some extra preference to dirty pages. This is because it is more expensive to throw out dirty pages: clean ones need not be written to disk.
  + What are the tradeoffs here?
    + The cost of a page fault declines - lower probability of writing out a dirty block.
    + The probability of a fault increases - i.e. if clock was a good algorithm, and we mess with it, it should get worse.
+ How would Least Frequently Used replacement work?
  + It would be a disaster, since locality changes.
+ A per process replacement algorithm, or local page replacement algorithm, or per job replacement algorithm, allocates page frames to individual processes: a page fault in one process can only replace one of that process' frames. This relieves interference from other processes.
  + If all pages from all processes are lumped together by the replacement algorithm, then it is said to be a global replacement algorithm. Under this scheme, each process competes with all of the other processes for page frames.
  + If you are using a local replacement algorithm, you have partitioned memory among the jobs or processes.
  + Local algorithm:
    + Protects jobs from others which are badly behaved.
    + Hard to decide how much space to allocate to each process.
    + Allocation may be unreasonable.
  + Global algorithm:
    + Permits the memory allocation for a process to shift over time.
    + Permits the memory allocation to adapt to process needs.
    + Permits a badly behaved process to grab too much memory.
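Here is the clock (FINUFO) algorithm from above as a sketch. The frame count and names are illustrative; use_bit[] stands for the hardware-set reference bits, and the new page's bit is set on replacement, matching the description.

```c
#include <stdbool.h>

#define NFRAMES 1024

static bool use_bit[NFRAMES];   /* set by hardware on each reference */
static int  hand;               /* the clock pointer (the k'th frame) */

/* Sweep until a frame with use bit off is found; used frames get a
 * second chance.  Returns the frame number to replace. */
int clock_choose_victim(void)
{
    for (;;) {
        if (use_bit[hand]) {
            use_bit[hand] = false;        /* recently used: spare it */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;            /* not used since the last sweep */
            hand = (hand + 1) % NFRAMES;
            use_bit[victim] = true;       /* the incoming page counts as used */
            return victim;
        }
    }
}
```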
+ Thrashing: a situation in which the page fault rate is so high that the system spends most of its time either processing a page fault or waiting for a page to arrive.
  + Thrashing means that there is too much page fetch idle - time when the processor is idle waiting for a page to arrive.
  + Suppose there are many users, and that between them their processes are making frequent references to 50 pages, but memory has 40 pages.
    + Each time one page is brought in, another page, whose contents will soon be referenced, is thrown out.
    + Compute the average memory access time.
    + The system will spend all of its time reading and writing pages. It will be working very hard but not getting anything done.
    + The progress of the programs will make it look like the access time of memory is as slow as disk, rather than the disk being as fast as memory.
  + Plot of CPU utilization vs. level of multiprogramming.
  + Thrashing was a severe problem in early demand paging systems.
+ Thrashing occurs because the system doesn't know when it has taken on more work than it can handle. LRU mechanisms order pages in terms of last access, but don't give absolute numbers indicating pages that mustn't be thrown out.
  + What do humans do when thrashing? If flunking all courses at midterm time, drop one.
+ Solutions to Thrashing:
  + If a single process is too large for memory, there is nothing the OS can do. That process will simply thrash. (Buy more memory.)
  + If the problem arises because of the sum of several processes:
    + Figure out how much memory each process needs. Change scheduling priorities to run processes in groups whose memory needs can be satisfied.
    + Shed load.
    + Change the paging algorithm.
+ Working Sets are a solution proposed by Peter Denning. An informal definition is:
  + Working set = ``the set of pages that a process is working with, and which must thus be resident if the process is to avoid thrashing.''
  + The idea is to use the recent needs of a process to predict its future needs.
  + Formally, ``exactly that set of pages used in the preceding T virtual time units'' (T is usually given in units of memory references).
  + Choose T, the working set parameter. At any given time, all pages referenced by a process in its last T virtual time units of execution are considered to comprise its working set.
  + The Working Set Paging Algorithm keeps in memory exactly those pages used in the preceding T time units.
  + Minimum values for T are about 10,000 to 100,000 memory references.
+ A process will never be executed unless its working set is resident in main memory. Pages outside the working set may be discarded at any time.
  + Note that this requires a reservoir of unassigned page frames.
+ Working set paging requires that the sum of the sizes of the working sets of the jobs eligible to run (which we will call the balance set) be less than or equal to the amount of space available. We previously referred to the balance set as the jobs in the in-memory queue.
  + Some algorithm must be provided for moving processes into and out of the balance set. What happens if the balance set changes too frequently?
    + Still get thrashing.
  + As working sets change, corresponding changes will have to be made in the balance set.
+ Working set also has the advantage over LRU that it adjusts the amount of space in use according to what the process needs. LRU works with a fixed amount of space, even though a process' needs change.
+ How do we implement working set? Can it be done exactly?
  + One of the initial plans was to store some sort of a capacitor with each memory page. The capacitor would be charged on each reference, then would discharge slowly if the page wasn't referenced. T would be determined by the size of the capacitor. This wasn't actually implemented.
    + One problem is that we want separate working sets for each process, so the capacitor should only be allowed to discharge when that particular process executes.
    + What if a page is shared?
  + Actual solution: take advantage of the use bits.
    + The OS maintains an idle time value for each page: the amount of CPU time received by the process since the last access to the page.
    + Every once in a while, scan all pages of a process. For each use bit that is on, clear the page's idle time. For each use bit that is off, add the process' CPU time (since the last scan) to the idle time. Turn all use bits off during the scan.
    + Scans happen on the order of every few seconds (in Unix, the scan interval is on the order of a minute or more).
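The use-bit scan just described can be sketched as follows. All names are illustrative; the page count, the caller's bookkeeping of CPU time since the last scan, and the in_ws flag are assumptions of this sketch, not part of any particular system.

```c
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 1024

struct wspage {
    bool     use;     /* set by hardware when the page is referenced */
    uint32_t idle;    /* process CPU time since the page's last reference */
    bool     in_ws;   /* page currently counted in the working set */
};

/* Run every few seconds of the process' CPU time; T is the working set
 * parameter.  Pages idle longer than T fall out of the working set and
 * become candidates for reclamation. */
void ws_scan(struct wspage pages[NPAGES],
             uint32_t cpu_since_last_scan, uint32_t T)
{
    for (int i = 0; i < NPAGES; i++) {
        if (pages[i].use)
            pages[i].idle = 0;                 /* referenced this interval */
        else
            pages[i].idle += cpu_since_last_scan;
        pages[i].use = false;                  /* reset for next interval */
        pages[i].in_ws = (pages[i].idle <= T);
    }
}
```

Note that idle time is charged in the process' own CPU time, which is what gives each process a separate working set, answering the per-process problem raised for the capacitor scheme.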
+ What is the overhead of sampling the reference bits regularly?
  + Assume a sample every 10,000 memory references, a 40 Mbyte memory with 4K pages (about 10,000 pages), and 5 instructions, with 10 memory references, to sample one bit. Then about 100,000 memory references are required just to record the use bits - ten times the sampling interval itself, which is why scans are done only every few seconds.
+ Other questions about working sets and memory management in general:
  + What should T be?
    + What if it's too large?
    + What if it's too small?
    + Plot STP vs. T, and page fault rate vs. T.
  + What algorithms should be used to determine which processes are in the balance set?
  + How much memory is needed in order to keep the CPU busy? Note that under working set methods the CPU may occasionally sit idle even though there are runnable processes.
  + (How do we compute working sets if pages are shared?)
+ Working Set Restoration
  + The idea is that when we remove a process from the in-memory queue, we know what its working set is.
  + When we run the process again (i.e. promote it to the in-memory queue), we can restore the working set to memory all at once.
  + Advantages:
    + Minimize CPU overhead.
    + Don't have to wait for each page fault -> all transfers at once.
    + Can optimize the layout when writing out, and can fetch from consecutive locations.
      + Or can just sort the fetches, so that the average latency is much smaller.
+ A problem with working set is that even the approximate implementation above has a lot of overhead. Instead, Opderbeck and Chu created an algorithm called
  + Page Fault Frequency - let X be the virtual time since the last page fault for this process.
    + At the time of a page fault, [if X > T, remove all pages (of the process) with the use bit off]. Then get a page frame for the new page, and turn off all reference bits for the process.
    + The idea was to make this a quick and easy way to implement working set. The idea is that as long as the process is faulting too often (