<html>
<head>
<title>Lab: locks</title>
<link rel="stylesheet" href="homework.css" type="text/css" />
</head>
<body>

<h1>Lab: locks</h1>

<p>In this lab you will try to avoid lock contention for certain
workloads.

<h2>Lock contention</h2>

<p>The program user/kalloctest stresses xv6's memory allocator: three
processes grow and shrink their address spaces, which results in many
calls to <tt>kalloc</tt> and <tt>kfree</tt>, respectively.
<tt>kalloc</tt> and <tt>kfree</tt> obtain <tt>kmem.lock</tt>. To see
whether there is lock contention for <tt>kmem.lock</tt>, replace the
call to <tt>acquire</tt> in <tt>kalloc</tt> with the following code:

<pre>
  while(!tryacquire(&kmem.lock)) {
    printf("!");
  }
</pre>

<p><tt>tryacquire</tt> tries to acquire <tt>kmem.lock</tt>: if the lock
is taken, it returns false (0); otherwise, it returns true (1) with the
lock acquired. Your first job is to implement <tt>tryacquire</tt> in
kernel/spinlock.c; a sketch of one possible implementation appears at
the end of this section.

<p>A few hints:
<ul>
<li>Look at <tt>acquire</tt>.
<li>Don't forget to restore interrupts when acquisition fails.
<li>Add <tt>tryacquire</tt>'s signature to defs.h.
</ul>

<p>Run usertests to check that you haven't broken anything. Note that
usertests never prints "!": there is never contention for
<tt>kmem.lock</tt>, because the caller is always able to acquire the
lock immediately and never has to wait for another process to release
it.

<p>Now run kalloctest. You should see quite a few "!"s on the console:
kalloctest causes many processes to contend on <tt>kmem.lock</tt>. This
lock contention is a bit artificial, because qemu simulates only three
processors, but there would likely be contention on real hardware too.
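<p>To make the first job concrete, here is a minimal sketch of one
possible <tt>tryacquire</tt>, modeled directly on <tt>acquire</tt> in
kernel/spinlock.c. It assumes only the helpers that <tt>acquire</tt>
already uses (<tt>push_off</tt>, <tt>pop_off</tt>, <tt>holding</tt>,
and gcc's <tt>__sync_*</tt> intrinsics); your version may differ.

<pre>
// kernel/spinlock.c: one possible tryacquire, mirroring acquire()
// but giving up instead of spinning.
int
tryacquire(struct spinlock *lk)
{
  push_off();   // disable interrupts to avoid deadlock, as in acquire()
  if(holding(lk))
    panic("tryacquire");
  if(__sync_lock_test_and_set(&lk->locked, 1) != 0){
    // The lock is taken: restore the interrupt state saved by
    // push_off() and report failure.
    pop_off();
    return 0;
  }
  __sync_synchronize();   // memory fence, as in acquire()
  lk->cpu = mycpu();
  return 1;
}
</pre>

<p>Per the hint above, the matching declaration in kernel/defs.h would
be <tt>int tryacquire(struct spinlock*);</tt>.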
<h2>Removing lock contention</h2>

<p>The root cause of lock contention in kalloctest is that there is a
single free list, protected by a single lock. To remove lock
contention, you will have to redesign the memory allocator to avoid a
single lock and list. The basic idea is to maintain a free list per
CPU, each list with its own lock. Allocations and frees on different
CPUs can run in parallel, because each CPU operates on a different
list.

<p>The main challenge will be to deal with the case in which one CPU
runs out of memory while another CPU still has free memory; in that
case, the first CPU must "steal" part of the other CPU's free list.
Stealing may introduce lock contention, but that should be acceptable
because it should happen infrequently.

<p>Your job is to implement per-CPU free lists and stealing when a CPU
is out of memory. Run kalloctest to see whether your implementation has
removed the lock contention. A sketch of one possible structure appears
at the end of this page.

<p>Some hints:
<ul>
<li>You can use the constant <tt>NCPU</tt> in kernel/param.h.
<li>Let <tt>freerange</tt> give all free memory to the CPU running
<tt>freerange</tt>.
<li>The function <tt>cpuid</tt> returns the current core number, but it
is only safe to use it when interrupts are turned off, so you will need
to turn interrupts off and on in your solution.
</ul>

<p>Run usertests to check that you haven't broken anything.

<h2>More scalable bcache lookup</h2>

<p>Several processes reading different files repeatedly will bottleneck
in the buffer cache, bcache, in bio.c. Replace the acquire in
<tt>bget</tt> with

<pre>
  while(!tryacquire(&bcache.lock)) {
    printf("!");
  }
</pre>

and run test0 from bcachetest; you will see "!"s.

<p>Modify <tt>bget</tt> so that a lookup for a buffer that is in the
bcache doesn't need to acquire <tt>bcache.lock</tt>. This is trickier
than the kalloc assignment, because bcache buffers are truly shared
among processes. You must maintain the invariant that a buffer is
cached only once in memory.

<p>There are several races that <tt>bcache.lock</tt> protects against,
including:
<ul>
<li>A <tt>brelse</tt> may set <tt>b->ref</tt> to 0 while a concurrent
<tt>bget</tt> is incrementing it.
<li>Two <tt>bget</tt>s may see <tt>b->ref = 0</tt>; one may re-use the
buffer, while the other replaces it with another block.
<li>A concurrent <tt>brelse</tt> modifies the list that <tt>bget</tt>
traverses.
</ul>

<p>A challenge is testing whether your code is still correct. One way
to do so is to artificially delay certain operations using
<tt>sleepticks</tt>. <tt>test1</tt> trashes the buffer cache and
exercises more code paths.

<p>Here are some hints:
<ul>
<li>Read the description of the buffer cache in the xv6 book (Section
7.2).
<li>Use a simple design: i.e., don't design a lock-free implementation.
<li>Use a simple hash table with a lock per bucket; a sketch of this
layout appears at the end of this page.
<li>Searching the hash table for a buffer and allocating an entry for
that buffer when it is not found must be atomic.
<li>It is fine to acquire <tt>bcache.lock</tt> in <tt>brelse</tt> to
update the LRU/MRU list.
</ul>

<p>Check that your implementation has less contention on
<tt>test0</tt>.

<p>Make sure your implementation passes bcachetest and usertests.

<p>Optional:
<ul>
<li>make the buffer cache more scalable (e.g., avoid taking
<tt>bcache.lock</tt> in <tt>brelse</tt>).
<li>make lookup lock-free. (Hint: use gcc's <tt>__sync_*</tt>
functions.) How do you convince yourself that your implementation is
correct?
</ul>
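<p>For reference, here is a minimal sketch of the per-CPU free-list
idea from the allocator section above. It is not a complete solution:
<tt>struct run</tt>, <tt>PGSIZE</tt>, and the locking helpers are the
ones already used in kernel/kalloc.c, while the steal-one-page policy
is just one arbitrary choice (stealing a batch of pages would make
stealing happen less often).

<pre>
// kernel/kalloc.c: one free list per CPU, each with its own lock.
// kinit() would initlock() each kmem[i].lock.
struct {
  struct spinlock lock;
  struct run *freelist;
} kmem[NCPU];

void *
kalloc(void)
{
  struct run *r;
  int id;

  push_off();          // cpuid() is only safe with interrupts off
  id = cpuid();
  pop_off();

  // Fast path: take a page from this CPU's own list.
  acquire(&kmem[id].lock);
  r = kmem[id].freelist;
  if(r)
    kmem[id].freelist = r->next;
  release(&kmem[id].lock);

  // Out of memory on this CPU: steal a page from another CPU's
  // list. Our own lock is already released, so holding one kmem
  // lock at a time avoids lock-ordering deadlocks.
  for(int i = 0; !r && i < NCPU; i++){
    if(i == id)
      continue;
    acquire(&kmem[i].lock);
    r = kmem[i].freelist;
    if(r)
      kmem[i].freelist = r->next;
    release(&kmem[i].lock);
  }

  if(r)
    memset((char*)r, 5, PGSIZE);  // fill with junk, as stock kalloc does
  return (void*)r;
}
</pre>

<p><tt>kfree</tt> would push the freed page onto the current CPU's list
in the same way. Note that the process may migrate to another core
after <tt>pop_off()</tt>; a page may then occasionally be taken from or
returned to a neighboring CPU's list, which costs a little contention
but is harmless for correctness.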
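<p>Similarly, here is a sketch of the hashed-bucket layout suggested in
the bcache hints, showing only the lookup fast path. <tt>NBUCKET</tt>,
the <tt>hash</tt> function, and the <tt>table</tt>/<tt>bucketlock</tt>
fields are illustrative names rather than part of stock xv6, and the
miss path, which must make the search-plus-insert atomic, is
deliberately left out.

<pre>
// kernel/bio.c: hash (dev, blockno) to a bucket so that a hit takes
// only that bucket's lock, never bcache.lock.
#define NBUCKET 13

struct {
  struct spinlock lock;                 // still taken on a miss
  struct buf buf[NBUF];
  struct buf *table[NBUCKET];           // per-bucket chains of buffers
  struct spinlock bucketlock[NBUCKET];
} bcache;

static uint
hash(uint dev, uint blockno)
{
  return (dev ^ blockno) % NBUCKET;
}

static struct buf*
bget(uint dev, uint blockno)
{
  struct buf *b;
  uint h = hash(dev, blockno);

  // Fast path: the block is already cached, so only the bucket lock
  // is needed.
  acquire(&bcache.bucketlock[h]);
  for(b = bcache.table[h]; b != 0; b = b->next){
    if(b->dev == dev && b->blockno == blockno){
      b->ref++;
      release(&bcache.bucketlock[h]);
      acquiresleep(&b->lock);
      return b;
    }
  }
  release(&bcache.bucketlock[h]);

  // Miss: take bcache.lock, re-check the bucket (another bget may
  // have inserted the block meanwhile), then evict a b->ref == 0
  // buffer into bucket h (omitted in this sketch).
  panic("bget: miss path not implemented");
}
</pre>

<p>The re-check after taking <tt>bcache.lock</tt> is what preserves the
invariant that a block is cached only once: without it, two
<tt>bget</tt>s that both miss could each insert the same block.

</body>
</html>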