| author | rsc <rsc> | 2008-09-03 04:50:04 +0000 |
|---|---|---|
| committer | rsc <rsc> | 2008-09-03 04:50:04 +0000 |
| commit | f53494c28e362fb7752bbc83417b9ba47cff0bf5 (patch) | |
| tree | 7a7474710c9553b0188796ba24ae3af992320153 | /web/l-lock.html |
| parent | ee3f75f229742a376bedafe34d0ba04995a942be (diff) | |
DO NOT MAIL: xv6 web pages
Diffstat (limited to 'web/l-lock.html')
| -rw-r--r-- | web/l-lock.html | 322 | 
1 file changed, 322 insertions, 0 deletions
diff --git a/web/l-lock.html b/web/l-lock.html
new file mode 100644
index 0000000..eea8217
--- /dev/null
+++ b/web/l-lock.html
@@ -0,0 +1,322 @@
<title>L7</title>
<html>
<head>
</head>
<body>

<h1>Locking</h1>

<p>Required reading: spinlock.c

<h2>Why coordinate?</h2>

<p>Mutual-exclusion coordination is an important topic in operating
systems, because many operating systems run on multiprocessors.
Coordination techniques protect variables that are shared among
multiple threads and updated concurrently.  These techniques allow
programmers to implement atomic sections so that one thread can
safely update the shared variables without having to worry that
another thread will intervene.  For example, processes in xv6 may run
concurrently on different processors and, in kernel mode, share
kernel data structures.  We must ensure that these updates happen
correctly.

<p>List and insert example:
<pre>
struct List {
  int data;
  struct List *next;
};

List *list = 0;

insert(int data) {
  List *l = new List;
  l->data = data;
  l->next = list;  // A
  list = l;        // B
}
</pre>

<p>What needs to be atomic? The two statements labeled A and B should
always be executed together, as an indivisible fragment of code.  If
two processors execute A and B interleaved, then we end up with an
incorrect list.  To see that this is the case, draw out the list after
the sequence A1 (statement A executed by processor 1), A2 (statement A
executed by processor 2), B2, and B1: both new nodes point at the old
head of the list, so processor 2's node is lost when B1 overwrites
<i>list</i>.

<p>How could this erroneous sequence happen? The variable <i>list</i>
lives in physical memory shared among multiple processors, connected
by a bus.  The accesses to the shared memory will be ordered in some
total order by the bus/memory system.  If the programmer doesn't
coordinate the execution of the statements A and B, any order can
happen, including the erroneous one.

<p>The erroneous case is called a race condition.  The problem with
races is that they are difficult to reproduce.  For example, if you
put print statements in to debug the incorrect behavior, you might
change the timing and the race might not happen anymore.

<h2>Atomic instructions</h2>

<p>The programmer must be able to express that A and B should be
executed as a single atomic section.  We generally use a concept like
locks to mark an atomic region, acquiring the lock at the beginning
of the section and releasing it at the end:

<pre>
void acquire(int *lock) {
  while (TSL(lock) != 0) ;
}

void release(int *lock) {
  *lock = 0;
}
</pre>

<p>Acquire and release, of course, need to be atomic too, which can,
for example, be done with a hardware atomic TSL (test-and-set-lock)
instruction.

<p>The semantics of TSL are:
<pre>
   R <- [mem]   // load content of mem into register R
   [mem] <- 1   // store 1 in mem
</pre>

<p>In a hardware implementation, the bus arbiter guarantees that both
the load and the store are executed without any other loads/stores
coming in between.

<p>We can use locks to implement an atomic insert, or we can use TSL
directly:
<pre>
int insert_lock = 0;

insert(int data) {

  /* acquire the lock: */
  while(TSL(&insert_lock) != 0)
    ;

  /* critical section: */
  List *l = new List;
  l->data = data;
  l->next = list;
  list = l;

  /* release the lock: */
  insert_lock = 0;
}
</pre>

<p>It is the programmer's job to make sure that locks are respected.
If a programmer writes another function that manipulates the list,
the programmer must make sure that the new function acquires and
releases the appropriate locks.  If the programmer doesn't, race
conditions occur.

<p>This code assumes that stores commit to memory in program order
and that all stores by other processors started before insert got the
lock are observable by this processor.  That is, after the other
processor released the lock, all of its previous stores are committed
to memory.  If a processor executes instructions out of order, this
assumption won't hold and we must, for example, insert a barrier
instruction that makes the assumption true.
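<p>On such a processor, release would have to flush the critical
section's stores before clearing the lock word.  A minimal sketch,
assuming GCC's __sync_synchronize built-in as a full memory barrier
(this is an illustration, not what xv6's spinlock.c does):

<pre>
void release(int *lock) {
  /* make every store from the critical section visible to other
     processors before the lock appears free */
  __sync_synchronize();
  *lock = 0;
}
</pre>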
<h2>Example: Locking on x86</h2>

<p>Here is one way we can implement acquire and release using the x86
xchgl instruction:

<pre>
struct Lock {
  unsigned int locked;
};

acquire(Lock *lck) {
  while(TSL(&(lck->locked)) != 0)
    ;
}

release(Lock *lck) {
  lck->locked = 0;
}

int
TSL(int *addr)
{
  register int content = 1;
  // xchgl content, *addr
  // xchgl exchanges the values of its two operands, while
  // locking the memory bus to exclude other operations.
  asm volatile ("xchgl %0,%1" :
                "=r" (content),
                "=m" (*addr) :
                "0" (content),
                "m" (*addr));
  return(content);
}
</pre>

<p>The instruction "XCHG %eax, (content)" works as follows:
<ol>
<li> freeze other CPUs' memory activity
<li> temp := content
<li> content := %eax
<li> %eax := temp
<li> un-freeze other CPUs
</ol>

<p>Steps 1 and 5 are what make XCHG special: it is "locked", using
special signal lines on the inter-CPU bus for bus arbitration.

<p>This implementation doesn't scale to a large number of processors;
in a later lecture we will see how we could do better.

<h2>Lock granularity</h2>

<p>Acquire/release is ideal for short atomic sections: incrementing a
counter, searching the i-node cache, allocating a free buffer.

<p>What are spin locks not so great for?  Long atomic sections may
waste waiters' CPU time, and it is a bad idea to sleep while holding
locks.  In xv6 we try to avoid long atomic sections by careful coding
(can you find an example?).  xv6 doesn't release the processor when
holding a lock, but has an additional set of coordination primitives
(sleep and wakeup), which we will study later.

<p>A single list_lock protecting all lists means that inserts to
different lists block each other.  A lock per list would waste less
time spinning, so you might want "fine-grained" locks, one for every
object.  BUT acquire/release are expensive (500 cycles on my 3 GHz
machine) because they need to talk off-chip.

<p>Also, "correctness" is not that simple with fine-grained locks if
we need to maintain global invariants; e.g., "every buffer must be on
exactly one of the free list and the device list".  Per-list locks
are irrelevant for this invariant.  So you might want "large-grained"
locks, which reduce overhead but also reduce concurrency.

<p>This tension is hard to get right.  One often starts out with
"large-grained" locks and measures the performance of the system on
some workloads.  When more concurrency is desired (to get better
performance), an implementor may switch to a more fine-grained
scheme.  Operating system designers fiddle with this all the time.
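<p>To make the fine-grained option concrete, here is a sketch of a
per-list lock, reusing the TSL primitive from above (hypothetical
code, not from xv6): two processors can now insert into different
lists concurrently, but every insert still pays for acquire/release.

<pre>
struct List {
  int data;
  struct List *next;
};

struct LockedList {
  int lock;             /* fine-grained: one lock per list */
  struct List *head;
};

void insert(struct LockedList *lst, struct List *l) {
  while(TSL(&lst->lock) != 0)   /* contend only with this list's users */
    ;
  l->next = lst->head;
  lst->head = l;
  lst->lock = 0;
}
</pre>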
<h2>Recursive locks and modularity</h2>

<p>When designing a system we desire clean abstractions and good
modularity.  We would like a caller not to have to know how a callee
implements a particular function.  Locks make achieving modularity
more complicated.  For example, what should we do when the caller
holds a lock and then calls a function which also needs the lock to
perform its job?

<p>There are no transparent solutions that allow the caller and
callee to be unaware of which locks they use.  One transparent, but
unsatisfactory, option is recursive locks: if a callee asks for a
lock that its caller already holds, then we allow the callee to
proceed.  Unfortunately, this solution is not ideal either.

<p>Consider the following.  If lock x protects the internals of some
struct foo, then when the caller acquires lock x, it knows that the
internals of foo are in a sane state and it can fiddle with them.
The caller must then restore them to a sane state before releasing
lock x, but until then anything goes.

<p>This assumption doesn't hold with recursive locking.  After
acquiring lock x, the acquirer knows that either it is the first to
get this lock, in which case the internals are in a sane state, or
its caller already holds the lock, has perhaps messed up the
internals, and didn't realize when calling the callee that the callee
was going to look at them too.  So the fact that a function acquired
the lock x doesn't guarantee anything at all.  In short, locks
protect against callers and callees just as much as they protect
against other threads.

<p>Since transparent solutions aren't ideal, it is better to consider
locks part of the function's specification.  The programmer must
arrange that a caller doesn't invoke another function while holding a
lock that the callee also needs.
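<p>To see what the recursive-lock option would even look like, here
is a rough sketch (hypothetical code; it assumes a cpu() helper that
returns the current processor's number, and again reuses TSL):

<pre>
struct RecursiveLock {
  int locked;
  int owner;   /* number of the CPU holding the lock, or -1 */
  int depth;   /* how many nested acquires the owner has done */
};

void racquire(struct RecursiveLock *l) {
  if(l->locked && l->owner == cpu()) {
    l->depth++;          /* we already hold it: just count nesting */
    return;
  }
  while(TSL(&l->locked) != 0)
    ;
  l->owner = cpu();
  l->depth = 1;
}

void rrelease(struct RecursiveLock *l) {
  if(--l->depth == 0) {
    l->owner = -1;
    l->locked = 0;       /* only the outermost release unlocks */
  }
}
</pre>

<p>Note that racquire returning tells the callee nothing about
whether foo's internals are currently sane, which is exactly the
problem described above.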
<h2>Locking in xv6</h2>

<p>xv6 runs on a multiprocessor and is programmed to allow multiple
threads of computation to run concurrently.  In xv6 an interrupt
might run on one processor while a process in kernel mode runs on
another processor, sharing a kernel data structure with the interrupt
routine.  xv6 uses locks, implemented using an atomic instruction, to
coordinate concurrent activities.

<p>Let's check out why xv6 needs locks by following what happens when
we start a second processor:
<ul>
<li>1516: mp_init (called from main0)
<li>1606: mp_startthem (called from main0)
<li>1302: mpmain
<li>2208: scheduler.
  <br>Now we have several processors invoking the scheduler
  function.  xv6 had better ensure that multiple processors don't run
  the same process!  Does it?
  <br>Yes.  If multiple schedulers run concurrently, only one will
  acquire proc_table_lock and proceed to look for a runnable process.
  If it finds a process, it will mark it running, longjmp to it, and
  the process will release proc_table_lock.  The next instance of
  scheduler will skip this entry, because it is marked running, and
  look for another runnable process.
</ul>

<p>Why hold proc_table_lock during a context switch?  It protects
p->state; the process has to hold some lock to avoid a race with
wakeup() and yield(), as we will see in the next lectures.

<p>Why not a lock per proc entry?  It might be expensive in
whole-table scans (in wait, wakeup, and scheduler).  proc_table_lock
also protects some larger invariants; for example, it might be hard
to get proc_wait() right with just per-entry locks.  Right now the
check to see if there are any exited children and the sleep are
atomic -- but that would be hard with per-entry locks.  One could
have both, but that would probably be neither clean nor fast.

<p>Of course, there is only one processor at a time searching the
proc table only if acquire is implemented correctly.  Let's check out
acquire in spinlock.c (a sketch of these steps follows the list):
<ul>
<li>1807: no recursive locks!
<li>1811: why disable interrupts on the current processor?  (If
interrupt code itself tries to take a held lock, xv6 will deadlock;
the panic on 1808 will fire.)
<ul>
<li>can a process on a processor hold multiple locks?
</ul>
<li>1814: the (hopefully) atomic instruction.
<ul>
<li>see sheet 4, line 0468.
</ul>
<li>1819: make sure that stores issued on other processors before we
got the lock are observed by this processor.  These may be stores to
the shared data structure that is protected by the lock.
</ul>
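<p>Putting those steps together, acquire behaves roughly like the
following sketch.  This is a paraphrase of the walkthrough above, not
a copy of spinlock.c; holding(), cli(), and panic() stand in for
xv6's helpers.

<pre>
void
acquire(struct spinlock *lock)
{
  if(holding(lock))                /* 1807: no recursive locks */
    panic("acquire");              /* 1808 */
  cli();                           /* 1811: no interrupts while holding */
  while(TSL(&lock->locked) != 0)   /* 1814: spin on the atomic instruction */
    ;
  /* 1819: the locked exchange on x86 also orders memory, so stores
     made by the previous holder are observed before we proceed */
}
</pre>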
<h2>Locking in JOS</h2>

<p>JOS is meant to run on single-CPU machines, and the plan can be
simple.  The simple plan is disabling/enabling interrupts in the
kernel (the IF flag in the EFLAGS register).  Thus, in the kernel,
threads release the processor only when they want to, and can ensure
that they don't release the processor during a critical section.

<p>In user mode, JOS runs with interrupts enabled, but Unix user
applications don't share data structures.  The data structures that
must be protected, however, are the ones shared in the library
operating system (e.g., pipes).  In JOS we will use special-case
solutions, as you will find out in lab 6.  For example, to implement
pipe we will assume there is one reader and one writer.  The reader
and writer never update each other's variables; they only read each
other's variables.  By programming carefully using this rule we can
avoid races.
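<p>The kernel half of this plan is just the pattern below (a sketch;
cli and sti are the x86 instructions that clear and set IF):

<pre>
void
update_shared_kernel_state(void)
{
  asm volatile("cli");    /* clear IF: no interrupts, no preemption */
  /* ... critical section: touch the shared data ... */
  asm volatile("sti");    /* set IF: interrupts allowed again */
}
</pre>

</body>
</html>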