author     rsc <rsc>    2008-09-03 04:50:04 +0000
committer  rsc <rsc>    2008-09-03 04:50:04 +0000
commit     f53494c28e362fb7752bbc83417b9ba47cff0bf5 (patch)
tree       7a7474710c9553b0188796ba24ae3af992320153 /web/l-schedule.html
parent     ee3f75f229742a376bedafe34d0ba04995a942be (diff)
DO NOT MAIL: xv6 web pages
Diffstat (limited to 'web/l-schedule.html')
-rw-r--r--   web/l-schedule.html   340
1 file changed, 340 insertions, 0 deletions
diff --git a/web/l-schedule.html b/web/l-schedule.html
new file mode 100644
index 0000000..d87d7da
--- /dev/null
+++ b/web/l-schedule.html
@@ -0,0 +1,340 @@
<html>
<head>
<title>Scheduling</title>
</head>
<body>

<h1>Scheduling</h1>

<p>Required reading: Eliminating receive livelock

<p>Notes based on Prof. Morris's lecture on scheduling (6.824, fall 2002).

<h2>Overview</h2>

<ul>

<li>What is scheduling? The OS policies and mechanisms that allocate
resources to entities. A good scheduling policy ensures that the most
important entity gets the resources it needs. This topic was popular
in the days of time sharing, when there was a shortage of resources.
It seemed irrelevant in the era of PCs and workstations, when resources
were plentiful. Now the topic is back from the dead to handle massive
Internet servers with paying customers. The Internet exposes web sites
to international abuse and overload, which can lead to resource
shortages. Furthermore, some customers are more important than others
(e.g., the ones that buy a lot).

<li>Key problems:
<ul>
<li>Gap between desired policy and available mechanism. The desired
policies often include elements that are not implementable with the
mechanisms available to the operating system. Furthermore, there are
often many conflicting goals (low latency, high throughput, and
fairness), and the scheduler must make trade-offs between them.

<li>Interaction between different schedulers. One has to take a
systems view. Just optimizing the CPU scheduler may do little for
the overall desired policy.
</ul>

<li>Resources you might want to schedule: CPU time, physical memory,
disk and network I/O, and I/O bus bandwidth.

<li>Entities that you might want to give resources to: users,
processes, threads, web requests, or MIT accounts.

<li>Many policies for resource-to-entity allocation are possible:
strict priority, divide equally, shortest job first, minimum guarantee
combined with admission control.

<li>General plan for scheduling mechanisms:
<ol>
<li> Understand where scheduling is occurring.
<li> Expose scheduling decisions, allow control.
<li> Account for resource consumption, to allow intelligent control.
</ol>

<li>Simple example from the 6.828 kernel. The policy for scheduling
environments is to give each one equal CPU time. The mechanism used to
implement this policy is a clock interrupt every 10 msec, after which
the kernel selects the next environment in round-robin fashion.

<p>But this only works if processes are compute-bound. What if a
process gives up some of its 10 ms to wait for input? Do we have to
keep track of that and give it back?

<p>How long should the quantum be? Is 10 msec the right answer? A
shorter quantum gives better interactive performance, but lowers
overall system throughput because we reschedule more often, which has
overhead.

<p>What if an environment computes for 1 msec and then sends an IPC to
the file server environment? Shouldn't the file server get more CPU
time, since it operates on behalf of all the other environments?

<p>Potential improvements for the 6.828 kernel: track "recent" CPU use
(e.g., over the last second) and always run the environment with the
least recent CPU use. (Still, if you sleep long enough you lose.)
Another solution: directed yield; specify on the yield to which
environment you are donating the remainder of the quantum (e.g., to the
file server so that it can compute on the environment's behalf). A
sketch of the first idea follows.
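<p>The following is a minimal sketch of the "least recent CPU use"
idea. It is not the actual 6.828/JOS scheduler; the struct fields,
NENV, and env_run() are stand-ins made up for illustration:

<pre>
#define NENV 1024

struct env {
  int runnable;           /* is this environment ready to run? */
  int recent_cpu;         /* clock ticks charged over a recent window */
};

struct env envs[NENV];
struct env *curenv;       /* environment running when the clock ticked */

void env_run(struct env *e);   /* context switch to e; does not return */

/* called from the 10 msec clock interrupt */
void
sched_tick(void)
{
  struct env *e, *best = 0;
  int i;

  if(curenv)
    curenv->recent_cpu++;      /* charge this tick to whoever was running */

  for(i = 0; i < NENV; i++){
    e = &envs[i];
    if(!e->runnable)
      continue;
    if(best == 0 || e->recent_cpu < best->recent_cpu)
      best = e;                /* least recent CPU use wins */
  }
  if(best)
    env_run(best);
}
</pre>

<p>Periodically decaying each recent_cpu counter (e.g., halving it once
a second) is what keeps the accounting "recent".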
<li>Pitfall: Priority Inversion
<pre>
  Assume the policy is strict priority.
  Thread T1: low priority.
  Thread T2: medium priority.
  Thread T3: high priority.
  T1: acquire(l)
  context switch to T3
  T3: acquire(l)... must wait for T1 to release(l)...
  context switch to T2
  T2 computes for a while
  T3 is indefinitely delayed despite its high priority.
  Can solve it if T3 lends its priority to the holder of the lock it is
  waiting for, so T1 runs, not T2.
  [This is really a multiple-scheduler problem,
   since locks schedule access to the locked resource.]
</pre>

<li>Pitfall: Efficiency. Efficiency often conflicts with fairness (or
any other policy): a long time quantum for efficiency in CPU scheduling
versus low delay; shortest-seek-first versus FIFO disk scheduling;
contiguous read-ahead versus the data needed now. For example, the
scheduler swaps out my idle emacs to let gcc run faster with more
physical memory. What happens when I type a key? These cases don't fit
well into a "who gets to go next" scheduler framework. Inefficient
scheduling may make <i>everybody</i> slower, including high-priority
users.

<li>Pitfall: Multiple Interacting Schedulers. Suppose you want your
emacs to have priority over everything else. Give it high CPU priority.
Does that mean nothing else will run if emacs wants to run? The disk
scheduler might not know to favor emacs's disk I/Os; the typical UNIX
disk scheduler favors disk efficiency, not process priority. Suppose
emacs needs more memory. Other processes have dirty pages; emacs must
wait. Does the disk scheduler know that these other processes' writes
are high priority?

<li>Pitfall: Server Processes. Suppose emacs uses X Windows to display.
The X server must serve requests from many clients. Does it know that
emacs's requests should be given priority? Does the OS know to raise
X's priority when it is serving emacs? Similarly for DNS and NFS. Does
the network know to give emacs's NFS requests priority?

</ul>

<p>In short, scheduling is a system problem. There are many schedulers,
and they interact. The CPU scheduler is usually the easy part. The
hardest part is system structure. For example, the <i>existence</i> of
interrupts is bad for scheduling. Conflicting goals may limit
effectiveness.

<h2>Case study: modern UNIX</h2>

<p>Goals:
<ul>
<li>Simplicity (e.g., avoid complex locking regimes).
<li>Quick response to device interrupts.
<li>Favor interactive response.
</ul>

<p>UNIX has a number of execution environments. We care about
scheduling transitions among them. Some transitions aren't possible,
and some can't be controlled. The execution environments are:

<ul>
<li>Process, user half
<li>Process, kernel half
<li>Soft interrupts: timer, network
<li>Device interrupts
</ul>

<p>The rules are:
<ul>
<li>The user half is pre-emptible.
<li>The kernel half and soft interrupts are not pre-emptible.
<li>Device handlers may not make blocking calls (e.g., sleep).
<li>Effective priorities: intr > soft intr > kernel half > user.
</ul>

<p>The rules are implemented as follows:

<ul>

<li>UNIX: Process User Half. Runs in the process address space, on a
per-process stack. Interruptible. Pre-emptible: an interrupt may cause
a context switch; we don't trust user processes to yield the CPU.
Voluntarily enters the kernel half via system calls and faults. (A
sketch of how a timer interrupt might pre-empt only the user half is
shown below.)
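<p>Here is a minimal sketch of the pre-emption rule. It is not the
actual UNIX or xv6 trap code; the trapframe layout, T_TIMER, USER_MODE,
and the helper functions are stand-ins made up for illustration. The
idea is that the trap handler pre-empts on a clock tick only when the
trap came from user mode, so the kernel half is never pre-empted:

<pre>
#define T_TIMER   32       /* hypothetical clock-interrupt vector number */
#define USER_MODE 3        /* privilege level in the saved code segment */

struct trapframe {
  unsigned int trapno;     /* which interrupt or fault occurred */
  unsigned int cs;         /* code segment at the time of the trap */
  /* ... other saved registers ... */
};

void yield(void);          /* give up the CPU and reschedule */
void handle_device(unsigned int trapno);

void
trap(struct trapframe *tf)
{
  if(tf->trapno == T_TIMER){
    if((tf->cs & 3) == USER_MODE)
      yield();             /* user half: pre-empt and reschedule */
    /* kernel half: return and let it yield voluntarily later */
    return;
  }
  handle_device(tf->trapno);   /* other devices: brief service, then return */
}
</pre>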
<li>UNIX: Process Kernel Half. Runs in the kernel address space, on a
per-process kernel stack. Executes system calls and faults for its
process. Interruptible (but it can defer interrupts in critical
sections). Not pre-emptible: it only yields voluntarily, when waiting
for an event such as disk I/O completion. This simplifies concurrency
control; locks are often not required. No user process runs if any
kernel half wants to run. Many processes' kernel halves may be sleeping
in the kernel.

<li>UNIX: Device Interrupts. The hardware interrupts the CPU to ask for
attention: a disk read or write completed, or a network packet arrived.
Interrupts run in kernel space, on a special interrupt stack. An
interrupt routine cannot block; it must return. Interrupts are
interruptible: they nest on the one interrupt stack. Interrupts are not
pre-emptible and cannot really yield. The real-time clock is a device
and interrupts every 10 ms (or whatever). Process scheduling decisions
can be made when an interrupt returns (e.g., wake up the process waiting
for this event). You want interrupt processing to be fast, since it has
priority; don't do any more work than you have to, because you're
blocking processes and other interrupts. Typically an interrupt does
the minimal work necessary to keep the device happy and then calls
wakeup on a thread.

<li>UNIX: Soft Interrupts. (These didn't exist in xv6.) Used when
device handling is expensive but there is no obvious process context in
which to run it; examples include IP forwarding and TCP input
processing. Soft interrupts run in kernel space, on the interrupt
stack. Interruptible. Not pre-emptible; they can't really yield.
Triggered by a hardware interrupt and run when the outermost hardware
interrupt returns. Periodic scheduling decisions are made in the timer
soft interrupt, which is driven by the hardware timer interrupt (i.e.,
if the current process has run long enough, switch).
</ul>

<p>Is this good software structure? Let's talk about receive livelock.

<h2>Paper discussion</h2>

<ul>

<li>What is the application that the paper is addressing? IP
forwarding. What functionality does a network interface offer to the
driver?
<ul>
<li> Read packets.
<li> Poke the hardware to send packets.
<li> Interrupt when a packet is received or a transmit completes.
<li> Buffer many input packets.
</ul>

<li>Which devices in the 6.828 kernel are interrupt driven? Which ones
use polling? Is this ideal?

<li>Explain Figure 6-1. Why does it go up? What determines how high
the peak is? Why does it go down? What determines how fast it goes
down? Answer:
<pre>
                                    (work invested in discarded packets)
  (fraction of packets discarded) = ------------------------------------
                                      (total work CPU is capable of)
</pre>

<li>Suppose I wanted to test an NFS server for livelock.
<pre>
  Run a client with this loop:
    while(1){
      send NFS READ RPC;
      wait for response;
    }
</pre>
What would I see? Is the NFS server probably subject to livelock?
(No -- the offered load is subject to feedback.)

<li>What other problems are we trying to address?
<ul>
<li>Increased latency for packet delivery and forwarding (e.g., we would
like to start the disk head moving when the first NFS read request
arrives).
<li>Transmit starvation.
<li>User-level CPU starvation.
</ul>

<li>Why not tell the O/S scheduler to give interrupts lower priority?
Interrupts are non-preemptible. Could you fix this by making interrupts
faster? (Maybe, if coupled with some limit on the input rate.)

<li>Why not completely process each packet in the interrupt handler,
i.e., forward it there? Other parts of the kernel don't expect to run at
high interrupt level (e.g., some packet-processing code might invoke a
function that sleeps). We would still want an output queue.

<li>What about using polling instead of interrupts? It solves the
overload problem, but is a killer for latency. (A sketch of the
traditional interrupt-plus-queue input path these questions refer to is
shown below.)
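<p>For reference, here is a highly simplified sketch of the traditional
BSD-style input path that the paper starts from. This is not the actual
BSD code; the queue layout and the helpers device_read(),
softint_schedule(), and ip_input() are stand-ins made up for
illustration. Under overload, the receive interrupt can consume the
whole CPU filling a queue whose packets are later dropped, which is the
livelock scenario:

<pre>
struct pkt { char data[1514]; int len; };

#define IPQ_MAX 64
struct pkt ipintrq[IPQ_MAX];     /* protocol input queue */
int ipq_head, ipq_tail, ipq_len;

int  device_read(struct pkt *);  /* pull the next packet from the NIC */
void softint_schedule(void);     /* request the IP soft interrupt */
void ip_input(struct pkt *);     /* the expensive protocol work */

/* runs at device-interrupt priority: highest, not pre-emptible */
void
rx_interrupt(void)
{
  struct pkt p;
  while(device_read(&p)){        /* drain the NIC's buffers */
    if(ipq_len == IPQ_MAX)
      continue;                  /* queue full: the work just done is wasted */
    ipintrq[ipq_tail] = p;
    ipq_tail = (ipq_tail + 1) % IPQ_MAX;
    ipq_len++;
  }
  softint_schedule();
}

/* runs at soft-interrupt priority: below device interrupts, so under
   overload it may never get the CPU; real code brackets the queue
   updates with spl calls to block the receive interrupt */
void
ip_softint(void)
{
  while(ipq_len > 0){
    ip_input(&ipintrq[ipq_head]);
    ipq_head = (ipq_head + 1) % IPQ_MAX;
    ipq_len--;
  }
}
</pre>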
<li>What's the paper's solution?
<ul>
<li>No IP input queue.
<li>Input processing and device input polling happen in a kernel thread.
<li>The device receive interrupt just wakes up the thread, and leaves
interrupts *disabled* for that device.
<li>The thread does all input processing, then re-enables interrupts.
</ul>
<p>Why does this work? What happens when packets arrive too fast?
What happens when packets arrive slowly?

<li>Explain Figure 6-3.
<ul>
<li>Why does "Polling (no quota)" work badly? (Input still starves
transmit-complete processing.)
<li>Why does it immediately fall to zero, rather than gradually
decreasing? (Transmit-complete processing must be very cheap compared
to input processing.)
</ul>

<li>Explain Figure 6-4.
<ul>

<li>Why does "Polling, no feedback" behave badly? There's a queue in
front of screend. We can still give 100% of the CPU to the input thread
and 0% to screend.

<li>Why does "Polling w/ feedback" behave well? The input thread yields
when the queue to screend fills.

<li>What if screend hangs? What about other consumers of packets (e.g.,
can you ssh to the machine to fix screend)? Fortunately screend is
typically the only application. Also, input is re-enabled after a
timeout.

</ul>

<li>Why are the two solutions different?
<ol>
<li> Polling thread <i>with quotas</i>.
<li> Feedback from the full queue.
</ol>
(I believe they should have used #2 for both.)

<li>If we apply the proposed fixes, does the phenomenon totally go away
(e.g., for a web server that also waits for the disk, &c.)?
<ul>
<li>Can the network device throw away packets without slowing down the
host?
<li>Problem: we want to drop packets for applications with big queues,
but that requires work to determine which application a packet belongs
to. Solution: NI-LRP (have the network interface sort packets).
</ul>

<li>What about the latency question? (Look at figure 14, p. 243.)
<ul>
<li>The 1st packet looks like an improvement over non-polling, but the
2nd packet is transmitted later with polling. Why? (No new packets are
added to the transmit buffer until the transmit-complete interrupt.)
<li>Why? In traditional BSD, to amortize the cost of poking the device.
Maybe it would be better to poke a second time anyway.
</ul>

<li>What if the processing has a more complex structure?
<ul>
<li>A chain of processing stages with queues? Does feedback work?
What happens when a late stage is slow?
<li>A split at some point, with multiple parallel paths? Not so great;
one slow path blocks all paths.
</ul>

<li>Can we formulate any general principles from the paper?
<ul>
<li>Don't spend time on new work before completing existing work.
<li>Or give new work lower priority than partially-completed work.
</ul>

</ul>
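<p>To tie the principles back to the mechanism, here is a minimal sketch
of the interrupt-wakes-a-polling-thread structure with a quota and queue
feedback. This is not the code from the paper or from any real driver;
the helper functions, QUOTA, and the sleep/wakeup channel are stand-ins
made up for illustration:

<pre>
struct pkt { char data[1514]; int len; };   /* as in the earlier sketch */

#define QUOTA 10                 /* max packets processed per round */

void dev_intr_disable(void);     /* mask receive interrupts on the NIC */
void dev_intr_enable(void);
int  device_read(struct pkt *);  /* returns 0 when the NIC is drained */
void ip_input(struct pkt *);
int  output_queue_full(void);    /* feedback from the next stage (e.g., screend) */
void wakeup(void *chan);
void sleep(void *chan);
void yield(void);

static int net_chan;             /* sleep/wakeup channel for the thread */

/* receive interrupt: do almost nothing, then stay silent */
void
rx_interrupt(void)
{
  dev_intr_disable();            /* no more interrupts until we catch up */
  wakeup(&net_chan);
}

/* kernel thread: all input processing happens here, at thread priority */
void
net_thread(void)
{
  struct pkt p;
  int n;

  for(;;){
    n = 0;
    while(n < QUOTA && !output_queue_full() && device_read(&p)){
      ip_input(&p);              /* complete processing; no IP input queue */
      n++;
    }
    if(n == QUOTA || output_queue_full()){
      yield();                   /* let transmit, user code, screend run */
      continue;                  /* interrupts stay masked; poll again later */
    }
    /* device drained: listen for interrupts again and wait for a packet.
       Real code holds a lock across this check-and-sleep so a wakeup
       cannot be lost between the two calls. */
    dev_intr_enable();
    sleep(&net_chan);
  }
}
</pre>

<p>The quota keeps input from starving transmit-complete processing, and
the output_queue_full() check is the feedback that keeps the thread from
spending 100% of the CPU on packets that a slow consumer will only drop.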