<html>
<head>
<title>Scheduling</title>
</head>
<body>

<h1>Scheduling</h1>

<p>Required reading: <i>Eliminating Receive Livelock</i> (Mogul and
Ramakrishnan).

<p>Notes based on Prof. Morris's lecture on scheduling (6.824, Fall
2002).

<h2>Overview</h2>

<ul>

<li>What is scheduling? The OS policies and mechanisms that allocate
resources to entities. A good scheduling policy ensures that the most
important entity gets the resources it needs. This topic was popular
in the days of time sharing, when there was a shortage of resources.
It seemed irrelevant in the era of PCs and workstations, when
resources were plentiful. Now the topic is back from the dead to
handle massive Internet servers with paying customers. The Internet
exposes web sites to worldwide abuse and overload, which can lead to
resource shortages. Furthermore, some customers are more important
than others (e.g., the ones that buy a lot).

<li>Key problems:
<ul>
<li>The gap between desired policy and available mechanism. The
desired policies often include elements that are not implementable
with the mechanisms available to the operating system. Furthermore,
there are often many conflicting goals (low latency, high throughput,
and fairness), and the scheduler must make trade-offs between them.

<li>Interaction between different schedulers. One has to take a
systems view. Just optimizing the CPU scheduler may do little for the
overall desired policy.
</ul>

<li>Resources you might want to schedule: CPU time, physical memory,
disk and network I/O, and I/O bus bandwidth.

<li>Entities that you might want to give resources to: users,
processes, threads, web requests, or MIT accounts.

<li>Many policies for resource-to-entity allocation are possible:
strict priority, divide equally, shortest job first, minimum
guarantee combined with admission control.

<li>General plan for scheduling mechanisms:
<ol>
<li>Understand where scheduling is occurring.
<li>Expose scheduling decisions, allow control.
<li>Account for resource consumption, to allow intelligent control.
</ol>

<li>Simple example from the 6.828 kernel. The policy for scheduling
environments is to give each one equal CPU time. The mechanism used
to implement this policy is a clock interrupt every 10 msec, followed
by selection of the next environment in round-robin fashion.

<p>But this only works if environments are compute-bound. What if an
environment gives up some of its 10 msec to wait for input? Do we
have to keep track of that and give it back?

<p>How long should the quantum be? Is 10 msec the right answer? A
shorter quantum gives better interactive performance, but lowers
overall system throughput because we reschedule more often, which has
overhead.

<p>What if an environment computes for 1 msec and then sends an IPC
to the file server environment? Shouldn't the file server get more
CPU time, since it operates on behalf of all the other environments?

<p>Potential improvements for the 6.828 kernel: track "recent" CPU
use (e.g., over the last second) and always run the environment with
the least recent CPU use. (Still, if you sleep long enough you
lose.) Another solution: directed yield; specify on the yield which
environment you are donating the remainder of the quantum to (e.g.,
the file server, so that it can compute on the environment's behalf).
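<p>As a concrete illustration of the first idea, here is a minimal
sketch in C of "always run the environment with the least recent CPU
use". The env array, NENV, and the field names are hypothetical
stand-ins, not the actual 6.828 structures.

<pre>
/* Hypothetical sketch: charge each clock tick to the running
 * environment, decay everyone's history, and pick the runnable
 * environment with the least recent use. */

#define NENV 64

struct env {
    int runnable;       /* ready to run? */
    int recent_ticks;   /* decayed count of recently used ticks */
};

struct env envs[NENV];

/* Called from the 10-msec clock interrupt. */
struct env *
sched_pick(struct env *curenv)
{
    struct env *best = 0;
    int i;

    curenv->recent_ticks++;          /* charge the tick */

    for (i = 0; i < NENV; i++) {
        /* Decay history so old CPU use is eventually forgotten. */
        envs[i].recent_ticks -= envs[i].recent_ticks / 8;
        if (envs[i].runnable &&
            (best == 0 || envs[i].recent_ticks < best->recent_ticks))
            best = &envs[i];
    }
    return best;                     /* 0 if nothing is runnable */
}
</pre>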
<li>Pitfall: Priority Inversion
<pre>
  Assume the policy is strict priority.
  Thread T1: low priority.
  Thread T2: medium priority.
  Thread T3: high priority.
  T1: acquire(l)
  context switch to T3
  T3: acquire(l)... must wait for T1 to release(l)...
  context switch to T2
  T2 computes for a while
  T3 is indefinitely delayed despite its high priority.
  This can be solved by having T3 lend its priority to the holder of
  the lock it is waiting for, so that T1 runs instead of T2.
  [This is really a multiple-scheduler problem,
   since locks schedule access to the locked resource.]
</pre>
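<p>Here is a minimal sketch of that priority-lending idea (often
called priority inheritance). The thread and lock structures are
hypothetical, not from any particular kernel.

<pre>
/* When a high-priority thread must wait for a lock, it donates its
 * effective priority to the holder, so the holder runs before any
 * medium-priority thread. */

struct thread {
    int base_prio;   /* priority the thread was assigned */
    int eff_prio;    /* priority it currently runs at */
};

struct lock {
    struct thread *holder;   /* 0 if the lock is free */
};

/* Called when thread t is about to sleep waiting for lock l. */
void
donate_priority(struct thread *t, struct lock *l)
{
    if (l->holder && l->holder->eff_prio < t->eff_prio)
        l->holder->eff_prio = t->eff_prio;
}

/* Called when the holder releases l.  (A full implementation must
 * recompute the donation from all locks the thread still holds.) */
void
release_donation(struct lock *l)
{
    if (l->holder)
        l->holder->eff_prio = l->holder->base_prio;
}
</pre>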
<li>Pitfall: Efficiency. Efficiency often conflicts with fairness (or
any other policy): a long time quantum is efficient for the CPU but
bad for interactive delay; shortest-seek-first conflicts with FIFO
disk scheduling; contiguous read-ahead conflicts with fetching the
data needed right now. For example, suppose the scheduler swaps out
my idle emacs to let gcc run faster with more physical memory. What
happens when I type a key? These trade-offs don't fit well into a
"who gets to go next" scheduler framework. Inefficient scheduling may
make <i>everybody</i> slower, including high-priority users.

<li>Pitfall: Multiple Interacting Schedulers. Suppose you want your
emacs to have priority over everything else, so you give it high CPU
priority. Does that mean nothing else will run if emacs wants to run?
The disk scheduler might not know to favor emacs's disk I/Os; a
typical UNIX disk scheduler favors disk efficiency, not process
priority. Suppose emacs needs more memory while other processes have
dirty pages; emacs must wait. Does the disk scheduler know that these
other processes' writes are high priority?

<li>Pitfall: Server Processes. Suppose emacs uses X Windows to
display. The X server must serve requests from many clients. Does it
know that emacs's requests should be given priority? Does the OS know
to raise X's priority when it is serving emacs? Similarly for DNS and
NFS: does the network know to give emacs's NFS requests priority?

</ul>

<p>In short, scheduling is a system problem. There are many
schedulers, and they interact. The CPU scheduler is usually the easy
part. The hardest part is system structure. For example, the
<i>existence</i> of interrupts is bad for scheduling. Conflicting
goals may limit the scheduler's effectiveness.

<h2>Case study: modern UNIX</h2>

<p>Goals:
<ul>
<li>Simplicity (e.g., avoid complex locking regimes).
<li>Quick response to device interrupts.
<li>Favor interactive response.
</ul>

<p>UNIX has a number of execution environments. We care about
scheduling transitions among them. Some transitions aren't possible,
and some can't be controlled. The execution environments are:

<ul>
<li>Process, user half
<li>Process, kernel half
<li>Soft interrupts: timer, network
<li>Device interrupts
</ul>

<p>The rules are:
<ul>
<li>The user half is pre-emptible.
<li>The kernel half and software interrupts are not pre-emptible.
<li>Device handlers may not make blocking calls (e.g., sleep).
<li>Effective priorities: intr > soft intr > kernel half > user.
</ul>

<p>The rules are implemented as follows:

<ul>

<li>UNIX: Process User Half. Runs in the process address space, on a
per-process stack. Interruptible. Pre-emptible: an interrupt may
cause a context switch; we don't trust user processes to yield the
CPU. Voluntarily enters the kernel half via system calls and faults.

<li>UNIX: Process Kernel Half. Runs in the kernel address space, on a
per-process kernel stack. Executes system calls and faults for its
process. Interruptible (but can defer interrupts in critical
sections). Not pre-emptible: it only yields voluntarily, when waiting
for an event (e.g., disk I/O done). This simplifies concurrency
control; locks are often not required. No user process runs if any
kernel half wants to run. Many processes' kernel halves may be
sleeping in the kernel.

<li>UNIX: Device Interrupts. Hardware raises an interrupt to ask the
CPU for attention: a disk read/write completed, or a network packet
was received. Interrupts run in kernel space, on a special interrupt
stack. An interrupt routine cannot block; it must return. Interrupts
are interruptible; they nest on the one interrupt stack. Interrupts
are not pre-emptible and cannot really yield. The real-time clock is
a device and interrupts every 10 msec (or whatever). Process
scheduling decisions can be made when an interrupt returns (e.g.,
wake up the process waiting for this event). You want interrupt
processing to be fast, since it has priority; you're blocking
processes and other interrupts, so don't do any more work than you
have to. Typically an interrupt does the minimal work necessary to
keep the device happy, and then calls wakeup on a thread.

<li>UNIX: Soft Interrupts. (These didn't exist in xv6.) Used when
device handling is expensive but there is no obvious process context
in which to run, e.g., IP forwarding and TCP input processing. Soft
interrupts run in kernel space, on the interrupt stack.
Interruptible. Not pre-emptible, and can't really yield. Triggered
by a hardware interrupt, and called when the outermost hardware
interrupt returns. Periodic scheduling decisions (i.e., if the
current process has run long enough, switch) are made in the timer
soft interrupt, which is scheduled by the hardware timer interrupt.
</ul>

<p>Is this good software structure? Let's talk about receive
livelock.

<h2>Paper discussion</h2>

<ul>

<li>What application is the paper addressing? IP forwarding. What
functionality does a network interface offer to the driver?
<ul>
<li>Read packets.
<li>Poke the hardware to send packets.
<li>Interrupt when a packet is received or a transmit completes.
<li>Buffer many input packets.
</ul>

<li>Which devices in the 6.828 kernel are interrupt-driven? Which are
polled? Is this ideal?

<li>Explain Figure 6-1. Why does the curve go up? What determines how
high the peak is? Why does it go down? What determines how fast it
goes down? Answer: the rate of decline is governed by
<pre>
 (fraction of packets discarded) x (work invested in each discarded packet)
 --------------------------------------------------------------------------
                    (total work the CPU is capable of)
</pre>

<li>Suppose I wanted to test an NFS server for livelock.
<pre>
  Run a client with this loop:
    while(1){
      send NFS READ RPC;
      wait for response;
    }
</pre>
What would I see? Is the NFS server likely subject to livelock?
(No: the offered load is subject to feedback.)

<li>What other problems are we trying to address?
<ul>
<li>Increased latency for packet delivery and forwarding (e.g., we
want to start the disk head moving as soon as the first NFS read
request arrives).
<li>Transmit starvation.
<li>User-level CPU starvation.
</ul>

<li>Why not tell the O/S scheduler to give interrupts lower priority?
Because interrupts are non-preemptible. Could you fix this by making
interrupts faster? (Maybe, if coupled with some limit on the input
rate.)

<li>Why not completely process each packet in the interrupt handler
(i.e., forward it)? Other parts of the kernel don't expect to run at
high interrupt level (e.g., some packet-processing code might invoke
a function that sleeps). And you would still want an output queue.

<li>What about using polling instead of interrupts? That solves the
overload problem, but is a killer for latency.

<li>What's the paper's solution?
<ul>
<li>No IP input queue.
<li>Input processing and device input polling happen in a kernel
thread.
<li>The device receive interrupt just wakes up that thread, and
leaves interrupts <i>disabled</i> for that device.
<li>The thread does all input processing, then re-enables interrupts.
</ul>
<p>Why does this work? What happens when packets arrive too fast?
What happens when packets arrive slowly?
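<p>Here is a minimal sketch of that arrangement. The device
operations and the sleep/wakeup primitives are hypothetical
stand-ins for a real driver interface, not the paper's actual code.

<pre>
/* Assumed primitives (stand-ins for real kernel/driver facilities). */
struct packet;
void device_mask_rx_intr(void);      /* leave rx interrupts disabled */
void device_unmask_rx_intr(void);
int  device_rx_ring_empty(void);
struct packet *device_poll_rx(void); /* next received packet, or 0 */
void ip_input(struct packet *);      /* full IP input processing */
void sleep(void *chan);
void wakeup(void *chan);

int rx_work_pending;

/* Receive interrupt: do almost nothing.  Keep receive interrupts
 * masked so the device cannot interrupt again until the thread has
 * caught up. */
void
rx_intr(void)
{
    device_mask_rx_intr();
    rx_work_pending = 1;
    wakeup(&rx_work_pending);
}

/* Kernel thread: all input processing happens here, at thread (not
 * interrupt) priority, and only up to a quota per round, so input
 * cannot monopolize the CPU. */
void
rx_thread(void)
{
    struct packet *p;

    for (;;) {
        while (!rx_work_pending)
            sleep(&rx_work_pending);
        rx_work_pending = 0;

        int quota = 10;
        while (quota-- > 0 && (p = device_poll_rx()) != 0)
            ip_input(p);              /* no IP input queue */

        if (device_rx_ring_empty())
            device_unmask_rx_intr();  /* caught up; interrupts again */
        else
            rx_work_pending = 1;      /* more input; keep polling */
    }
}
</pre>

<p>When packets arrive too fast, the interrupt stays masked and the
device discards the excess in hardware, at no CPU cost; when packets
arrive slowly, each one is handled promptly via an interrupt.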
<li>Explain Figure 6-3.
<ul>
<li>Why does "Polling (no quota)" work badly? (Input still starves
transmit-complete processing.)
<li>Why does it immediately fall to zero, rather than gradually
decreasing? (Transmit-complete processing must be very cheap compared
to input processing.)
</ul>

<li>Explain Figure 6-4.
<ul>

<li>Why does "Polling, no feedback" behave badly? There's a queue in
front of screend. We can still give 100% of the CPU to the input
thread and 0% to screend.

<li>Why does "Polling w/ feedback" behave well? The input thread
yields when the queue to screend fills. (A sketch of this feedback
appears at the end of these notes.)

<li>What if screend hangs? What about other consumers of packets
(e.g., can you ssh to the machine to fix screend)? Fortunately
screend is typically the only application. Also, input is re-enabled
after a timeout.

</ul>

<li>Why are the two solutions different?
<ol>
<li>Polling thread <i>with quotas</i>.
<li>Feedback from a full queue.
</ol>
(I believe they should have used #2 for both.)

<li>If we apply the proposed fixes, does the phenomenon totally go
away (e.g., for a web server that also waits for the disk, &c.)?
<ul>
<li>Can the network device throw away packets without slowing down
the host?
<li>Problem: we want to drop packets for the applications with big
queues, but it requires work to determine which application a packet
belongs to. Solution: NI-LRP (have the network interface sort
packets).
</ul>

<li>What about the latency question? (Look at Figure 14, p. 243.)
<ul>
<li>The 1st packet looks like an improvement over non-polling. But
the 2nd packet is transmitted later with polling. Why? (No new
packets are added to the transmit buffer until the transmit-complete
interrupt.)
<li>Why do it that way? In traditional BSD, to amortize the cost of
poking the device. Maybe it would be better to poke the device a
second time anyway.
</ul>

<li>What if processing has a more complex structure?
<ul>
<li>A chain of processing stages with queues? Does feedback work?
What happens when a late stage is slow?
<li>A split with multiple parallel paths? Not so great: one slow path
blocks all paths.
</ul>

<li>Can we formulate any general principles from the paper?
<ul>
<li>Don't spend time on new work before completing existing work.
<li>Or: give new work lower priority than partially-completed work.
</ul>

</ul>
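<p>Finally, here is a minimal sketch of the queue feedback used with
screend in Figure 6-4. The queue structure and function names are
hypothetical, not the paper's code.

<pre>
/* Assumed primitives, as in the previous sketch. */
struct packet;
struct packet *device_poll_rx(void);
void yield(void);

#define QMAX 32

struct queue {
    int n;   /* packets currently queued for screend */
} screend_q;

void enqueue(struct queue *, struct packet *);

void
input_thread(void)
{
    struct packet *p;

    for (;;) {
        if (screend_q.n >= QMAX) {
            /* Feedback: the consumer is behind.  Yield the CPU so
             * screend can drain the queue, instead of investing
             * work in packets that would only be discarded. */
            yield();
            continue;
        }
        if ((p = device_poll_rx()) != 0)
            enqueue(&screend_q, p);
        else
            yield();   /* no input; let other threads run */
    }
}
</pre>

</body>
</html>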