Hyuntae Kim, Gi Hong Kang
3/5/2007
---------------------------------------------------------------------------------------------------
Announcements:
- exams will be handed back on Wednesday
---------------------------------------------------------------------------------------------------
Topics:
1. implementing LRU with use bit or dirty bit (clock algorithm)
2. thrashing
3. working set
4. page study
5. analysis of paging algorithms
---------------------------------------------------------------------------------------------------
1. Implementing LRU with use bit or dirty bit (clock algorithm)

+ stack algorithm
  - the set of pages in a memory of size N at time t is always a subset of the set of pages
    in a memory of size N+1 at time t
  - the stack is the list of pages ordered by the smallest memory size that includes them
+ LRU (Least Recently Used)
  - replace the page that hasn't been used for the longest time
  - programs have locality, so if something has not been used for a while, it is unlikely
    to be used in the near future
  - so LRU seems like it should be a good approximation to MIN
+ implementing LRU
  - since we need to keep track of which pages have been used recently, hardware support is needed
  - how to implement LRU?
    * keep a register for each page. On each use, set the register from the system clock.
      When we need to replace a page, scan the registers of all pages and find the oldest
      clock value. This can be expensive if there are many memory pages.
    * use a linked list: on each use, remove the page from the list and place it at the head.
      The LRU page is at the tail.
  - implementing perfect LRU is almost impossible, so we settle for an approximation that is
    efficient. It is not necessary to find the oldest page; just find an old page.
+ use bit (reference bit)
  - the use bit is set by hardware when the page is referenced; it is turned off under OS control
+ Clock algorithm
  - replace an old page, not necessarily the oldest page
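Before turning to the clock approximation, the exact-LRU linked-list scheme just described can be sketched in a few lines. This is an illustrative sketch only (the class name, frame count, and toy trace are made up); Python's `OrderedDict` stands in for the linked list of pages ordered by recency:

```python
from collections import OrderedDict

class LRU:
    """Exact LRU over a fixed number of page frames (illustrative sketch).

    The ordered dict plays the role of the linked list: a referenced page
    moves to the end (head of the recency order); the least recently used
    page sits at the front (tail) and is the one evicted on a fault.
    """
    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()   # page -> None, ordered by recency
        self.faults = 0

    def reference(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: move to head of list
        else:
            self.faults += 1                    # miss: page fault
            if len(self.pages) >= self.frames:
                self.pages.popitem(last=False)  # evict the LRU page (tail)
            self.pages[page] = None

lru = LRU(frames=3)
for p in [1, 2, 3, 1, 4, 2]:
    lru.reference(p)
print(lru.faults)  # 5 faults on this toy trace
```

Note that every reference touches the list, which is exactly the cost the notes point out: real hardware cannot afford this per-reference bookkeeping, hence the use-bit approximation below.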
  - hardware "use" bit per physical page
    * hardware sets the use bit on each reference
    * if the use bit isn't set, the page has not been referenced in a long time
  - keep a pointer (the clock hand) pointing at the k'th page frame
  - to replace a page, look at the use bit of the page being pointed to
    * if it is on, turn it off, advance the pointer, and repeat
    * if it is off, replace the page in that page frame and set use(k) = 1
  - it is also called FINUFO (first in, not used, first out)

                 ()   ()   ()
              ()             ()
              ()             ()      * () : page frame
    (pointer) -------->      ()
              ()             ()
                 ()   ()   ()

Q) what does it mean if the clock hand is sweeping very slowly?
- few page faults
Q) what does it mean if the clock hand is sweeping very fast?
- a lot of page faults

+ the "dirty" bit can also be used
  - because it is more expensive to throw out dirty pages (they must be written to disk first)
  - tradeoffs?
    * cost of a page fault declines: lower probability of writing out a dirty block
    * probability of a fault increases: if clock was a good algorithm and we mess with it,
      we should expect it to get worse

Q) how would Least Frequently Used replacement work?
- not well, since locality changes: pages that were heavily used in an old phase keep high
  counts and stay resident
  * locality: a program is likely to access a relatively small portion of the address space
    at any instant of time
    Temporal locality: locality in time
    Spatial locality: locality in space

+ A per-process replacement algorithm (also called a local or per-job replacement algorithm)
  allocates page frames to individual processes.
  - this means that a page fault in one process can only replace one of that process' frames
  - an important consequence: processes cannot interfere with one another
+ If all pages from all processes are lumped together by the replacement algorithm, it is
  said to be a global replacement algorithm.
  - under this scheme, each process competes with all of the other processes for page frames
+ Local algorithms:
  - protect jobs from others which are badly behaved
  - hard to decide how much space to allocate to each process
  - allocation may be unreasonable
+ Global algorithms:
  - permit a process' memory allocation to shift over time
  - permit memory allocation to adapt to process needs
  - permit a badly behaved process to grab too much memory
------------------------------------------------------------------------------------------------------
2. Thrashing

+ thrashing: a situation where the page fault rate is so high that the system spends most of
  its time either processing a page fault or waiting for a page to arrive; in other words,
  the system is busy swapping pages in and out
- if a process does not have "enough" pages, the page fault rate is very high. This leads to:
  * low CPU utilization
  * the operating system spending most of its time swapping to disk
- suppose there are many users, and between them their processes make frequent references to
  50 pages, but memory has only 40 page frames
- each time one page is brought in, another page, whose contents will soon be referenced,
  is thrown out
- compute the average memory access time: the system will spend all of its time reading and
  writing pages. It will be working very hard but not getting anything done.
- the progress of the programs makes it look like the access time of memory is as slow as
  disk, rather than disk being as fast as memory
- plot of CPU utilization vs. level of multiprogramming:

    CPU
    utilization
      |                      thrashing
      |                         |
      |    max CPU utilization  v
      |            ___
      |           /   \
      |          /     \
      |         /       \
      |        /         \
      |       /           \
      |      /             \
      |     /               \
      |    /                 \
      |   /                   \
      |  /                     \
      | /                       \
      |/                         \
      +-----------------------------------------
                 degree of multiprogramming

+ Thrashing occurs because the system doesn't know when it has taken on more work than it
  can handle. LRU mechanisms order pages in terms of last access, but don't give absolute
  numbers indicating which pages must not be thrown out.
Q) What do humans do when thrashing? If flunking all courses at midterm time, drop one.
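The "compute the average memory access time" remark above can be made concrete with a small effective-access-time calculation. The numbers here are assumptions for illustration (100 ns per memory access, 8 ms to service a page fault from disk), not figures from the notes:

```python
# Effective memory access time under demand paging.
# t_mem: time for a normal memory access, t_fault: time to service a
# page fault from disk, p: page fault rate per memory reference.
def effective_access_time(t_mem_ns, t_fault_ns, p):
    return (1 - p) * t_mem_ns + p * t_fault_ns

t_mem = 100           # 100 ns per memory access (assumed)
t_fault = 8_000_000   # 8 ms per page fault, disk service time (assumed)

for p in (0.0, 1e-6, 1e-3):
    print(f"p = {p}: {effective_access_time(t_mem, t_fault, p):.1f} ns")
```

Even a fault rate of one in a thousand references pushes the effective access time to about 8,100 ns here, roughly 80x slower than memory, which is exactly the thrashing symptom described: memory appears as slow as disk.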
+ solutions to thrashing
  - for a single process which is too large for memory: no solution, buy more memory
  - for multiple processes:
    * figure out how much memory each process needs; change scheduling priorities to run
      processes in groups whose combined memory needs can be satisfied
    * shed load
    * change the paging algorithm
--------------------------------------------------------------------------------------------------------
3. Working set

+ Working sets (IBM) are a solution proposed by Peter Denning. An informal definition:
  - working set = "the set of pages that a process is working with, and which must thus be
    resident if the process is to avoid thrashing"
  - formally, "exactly that set of pages used in the preceding T virtual time units"
    (T is usually given in units of memory references)
  - choose T, the working set parameter. At any given time, all pages referenced by a process
    in its last T units of execution are considered to comprise its working set.
  - the working set paging algorithm keeps in memory exactly those pages used in the
    preceding T time units
  - minimum useful values for T are about 10,000 to 100,000 memory references
  - the working set is dynamic, because it varies all the time. Most processes don't need a
    static number of page frames (e.g., a compiler has different phases).
+ A process is never executed unless its working set is resident in main memory; pages
  outside the working set may be discarded at any time.
  - note that this requires a reservoir of unassigned page frames
+ working set paging requires that the sum of the sizes of the working sets of the jobs
  eligible to run (which we will call the balance set) be less than or equal to the amount
  of space available. We previously referred to the balance set as the jobs in the
  in-memory queue.
  - some algorithm must be provided for moving processes into and out of the balance set.
    What happens if the balance set changes too frequently?
    * we still get thrashing
+ As working sets change, corresponding changes will have to be made in the balance set.
+ Advantage over LRU:
  - working set works with a variable amount of space, adjusting the amount of space to the
    process' needs, whereas LRU works with a fixed amount of memory
+ Implementing working set
  - initial plan
    * use a capacitor with each memory page
    * charge the capacitor on each reference; it then starts discharging
    * the size of the capacitor determines the time unit T (tau)
    * if we want a separate working set for each process, the capacitor should only be
      discharged while that particular process executes
  - actual solution: use bits
    * the OS keeps track of the idle time for each page
    * all pages of a process are scanned every once in a while:
      > use bit on  -> clear the page's idle time (and the use bit)
      > use bit off -> add the process' elapsed virtual time to the page's idle time
    * scans happen on the order of every few seconds
+ Overhead of scanning
  - sampling reference bits regularly costs about 5 instructions per bit sampled
  - e.g., sampling every 10,000 memory references with 4K pages takes on the order of
    100,000 instructions
  - choosing a proper size for T:
    * T too large -> the page fault rate is low but the space-time product (STP) is high
    * T too small -> both the page fault rate and the STP are high

    Page fault
    rate  |
          |\
          | \   rapid decrease
          |  \
          |   \
          |    \ _ _ _ _ _ _ _   flat
          +----------------------------- T

    STP   |
          |\                    /
          | \                  /
          |  \                /
          |   \ _ _ _ _ _ _ _/
          +----------------------------- T

+ Working set restoration
  - idea: when we remove a process from the queue, we know what its working set is
  - restoration: when the process gets run again, its working set is restored all at once
  - advantages:
    * minimizes CPU overhead
    * no waiting on each page fault -> all transfers happen at once
    * possible optimization of layout when writing out
    * possible to fetch from consecutive locations
    * possible to sort the fetches, which reduces average latency
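The formal definition above (pages used in the preceding T virtual time units, with virtual time measured in memory references) can be sketched directly. This is a toy illustration with a made-up reference trace, not the use-bit implementation the OS actually runs:

```python
# Working set W(t, T): the set of pages referenced in the last T memory
# references, i.e., trace positions (t - T, t]. Minimal sketch.
def working_set(trace, t, T):
    start = max(0, t - T)
    return set(trace[start:t])

# Hypothetical page-reference trace; note how the working set shifts
# as the program moves between localities (phases).
trace = [1, 2, 1, 3, 4, 4, 4, 2]
print(working_set(trace, t=4, T=3))  # -> {1, 2, 3}
print(working_set(trace, t=8, T=3))  # -> {2, 4}
```

The sliding window makes the variable-space property visible: the working set shrinks when the program settles into a tight loop and grows when it changes phase, which is exactly what a fixed-allocation LRU cannot express.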
+ Problem: still has a lot of overhead
  - Page Fault Frequency algorithm - created by Opderbeck and Chu
    * idea: as long as a process is faulting too often, it gets more pages and ceases
      faulting so frequently
    * algorithm:
      -> let X be the virtual time since the last page fault for this process
      -> if there is a page fault, get a page frame for the new page
         (if X > T, remove all pages with the use bit off)
      -> turn off all reference bits for the process
  - problem:
    * a process can fault frequently but still not need more pages
      -> doesn't work as well as WS
-----------------------------------------------------------------------------------------------
4. Page study

+ Page size
  - determined by the hardware
    * possible to simulate a larger page size on top of a smaller hardware page size
  - large pages:
    * less overhead - same overhead per fault, but fewer faults
    * more internal fragmentation
    * fewer TLB misses (each TLB entry covers more memory)
    * smaller page tables
    * more total memory needed to cover the same active localities
    * greater delay per page fault
    * less overhead for running the replacement algorithm
  - typical size: 4K bytes
    * sometimes 512 bytes is also used for compatibility
  - page size vs. page fault rate:
    * the page fault rate decreases as the page size gets bigger. However, when pages are
      too big, the page fault rate increases again, since too few big pages fit in memory.

    Page fault
    rate  |
          |\
          | \
          |  \
          |   \             /
          |    \ _ _ _ _ _ /
          +----------------------------- page size

  - page size vs. performance:
    * performance increases as the page size gets bigger
    * however, when pages are too big, performance starts decreasing, since big pages
      produce long transfer times

    Performance
          |         _ _ _
          |        /     \
          |       /       \
          |      /
          |     /
          |    /
          +----------------------------- page size

+ Almost anything in memory can be paged out, including the operating system, but some
  pages in memory can't be.
  - you cannot page out: the pages that implement page fetch and replacement;
    these pages are said to be "wired down"
  - also anything where immediate response is required, e.g.:
    * interrupt and trap service routines
    * real-time control code
+ I/O and virtual memory
  - the I/O system deals only with physical addresses
  - the OS must translate the virtual buffer address to a physical buffer address
  - we need to make sure the buffer doesn't get paged out
    * lock bit - locks the page into memory
  - page boundaries
    * problem: a transfer may be broken across several pages

            VAS                  PAS
        |        |           |--------|
        |        |           | ////// |
        |--------|           |--------|
        | ////// |           |        |
        | ////// |           |--------|
        | ////// |           | ////// |
        |--------|           |--------|
        |        |           |        |
        |        |           |--------|

        The transfer is non-contiguous in physical memory.

    * solution:
      -> make sure I/O buffers are page aligned
      -> put them in a contiguous area of real memory
----------------------------------------------------------------------------------------------------
5. Analysis of paging algorithms

+ Algorithm study
  - math model - not acceptable
  - random-number-driven simulation - not acceptable
  - experiments on a real system
    * difficult and time consuming
    * may not be reproducible
  - trace-driven simulation
    * we use a program address trace to simulate various algorithms
    * program address trace:
      -> the sequence of virtual addresses generated by a program
      -> the virtual address trace is independent of the page replacement algorithm
    * getting a program address trace:
      -> machine interpreter - generates the trace as it executes
      -> hardware monitor
      -> trace trap facility - traps on every instruction
      -> microcode modification
      -> instrument the object code or assembly code to write the trace - records either
         every instruction, or every load, store, and branch
      -> page faults - take a page fault on every reference and generate a trace record
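The trace-driven approach above can be sketched as a tiny simulator: replay one page-reference trace against several replacement policies and count faults. The trace here is a hypothetical toy (a real study would use addresses recorded from a program by one of the methods listed); FIFO and the clock (FINUFO) algorithm from section 1 serve as the policies:

```python
from collections import deque

def fifo_faults(trace, frames):
    """Replay trace under FIFO replacement; return the fault count."""
    mem, order, faults = set(), deque(), 0
    for page in trace:
        if page not in mem:
            faults += 1
            if len(mem) >= frames:
                mem.discard(order.popleft())  # evict the oldest arrival
            mem.add(page)
            order.append(page)
    return faults

def clock_faults(trace, frames):
    """Replay trace under the clock (FINUFO) algorithm; return fault count."""
    slots = [None] * frames    # page held by each frame
    use = [False] * frames     # hardware "use" bit per frame
    resident = {}              # page -> frame index
    hand, faults = 0, 0
    for page in trace:
        if page in resident:
            use[resident[page]] = True       # hardware sets use bit on reference
            continue
        faults += 1
        while slots[hand] is not None and use[hand]:
            use[hand] = False                # use bit on: turn it off, advance
            hand = (hand + 1) % frames
        if slots[hand] is not None:
            del resident[slots[hand]]        # use bit off: replace this page
        slots[hand] = page
        resident[page] = hand
        use[hand] = True
        hand = (hand + 1) % frames
    return faults

trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(trace, 3), clock_faults(trace, 3))
```

Because the address trace is independent of the replacement policy, the same trace can be fed to every algorithm under study, which is exactly what makes trace-driven comparison reproducible where experiments on a live system are not.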