Lab: locks

In this lab you will try to avoid lock contention for certain workloads.

Lock contention

The program user/kalloctest stresses xv6's memory allocator: three processes grow and shrink their address spaces, which results in many calls to kalloc and kfree. Both kalloc and kfree acquire kmem.lock. To see whether there is contention for kmem.lock, replace the call to acquire in kalloc with the following code:

    while(!tryacquire(&kmem.lock)) {
      printf("!");
    }

tryacquire tries to acquire kmem.lock: if the lock is already taken, it returns false (0); otherwise, it acquires the lock and returns true (1). Your first job is to implement tryacquire in kernel/spinlock.c.

A few hints:

Look at acquire.
Don't forget to restore interrupts when acquisition fails.
Add tryacquire's signature to defs.h.
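
A minimal sketch of tryacquire, written as a user-space model so it compiles on its own. It assumes the xv6-riscv shape of acquire (push_off, then an atomic test-and-set with __sync_lock_test_and_set, then __sync_synchronize); push_off and pop_off below are simplified stand-ins that only count nesting, not the kernel's versions. The point to notice is the failure path: no release will follow a failed attempt, so tryacquire itself must call pop_off to undo its push_off.

```c
#include <assert.h>

// User-space stand-ins, for illustration only. In xv6, push_off and
// pop_off disable/restore interrupts with per-CPU nesting; here a
// single counter models the nesting depth.
static int noff = 0;
static void push_off(void) { noff++; }
static void pop_off(void) { assert(noff > 0); noff--; }

struct spinlock {
  unsigned int locked;   // 0 = free, 1 = held
};

// Like acquire, but tries the atomic swap once instead of spinning.
// Returns 1 with the lock held, or 0 with the interrupt state restored.
// (A kernel version should also keep acquire's holding() check and
// record lk->cpu on success; both are omitted here.)
int
tryacquire(struct spinlock *lk)
{
  push_off();                                  // same first step as acquire
  if(__sync_lock_test_and_set(&lk->locked, 1) != 0) {
    pop_off();                                 // failed: undo push_off
    return 0;
  }
  __sync_synchronize();                        // fence, as in acquire
  return 1;
}

void
release(struct spinlock *lk)
{
  __sync_synchronize();
  __sync_lock_release(&lk->locked);
  pop_off();
}
```

On success the push_off stays in effect until the matching release, exactly as with acquire, so the `while(!tryacquire(...))` loop above leaves kalloc holding the lock with interrupts off.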

Run usertests to check that you didn't break anything. Note that usertests never prints "!": there is no contention for kmem.lock, because the caller can always acquire the lock immediately and never has to wait for another process to release it.

Now run kalloctest. You should see many "!"s on the console, because kalloctest causes several processes to contend for kmem.lock. This contention is somewhat artificial, because qemu is simulating only 3 processors, but on real hardware there would likely be contention too.

Removing lock contention

The root cause of lock contention in kalloctest is that there is a single free list, protected by a single lock. To remove lock contention, you will have to redesign the memory allocator to avoid a single lock and list. The basic idea is to maintain a free list per CPU, each list with its own lock. Allocations and frees on each CPU can run in parallel, because each CPU will operate on a different list.

The main challenge is handling the case where one CPU runs out of memory while another CPU still has free memory; in that case, the first CPU must "steal" part of the other CPU's free list. Stealing may introduce lock contention, but that is acceptable as long as it happens infrequently.
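
A sketch of per-CPU free lists with stealing, again as a user-space model. NCPU is fixed at 3 here (an assumption matching the three CPUs qemu simulates), the caller passes the CPU number explicitly instead of calling cpuid, and kfree_on, kalloc_on, and steal are hypothetical names. The allocator drops its own list lock before stealing and never holds two list locks at once, so two CPUs stealing from each other cannot deadlock.

```c
#include <assert.h>

#define NCPU 3          // assumption: matches the 3 CPUs qemu simulates
#define PGSIZE 4096

// Minimal spinlock for this user-space sketch (no interrupt handling).
struct spinlock { unsigned int locked; };
static void acquire(struct spinlock *lk) {
  while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
    ;
  __sync_synchronize();
}
static void release(struct spinlock *lk) {
  __sync_synchronize();
  __sync_lock_release(&lk->locked);
}

struct run { struct run *next; };

// One free list and one lock per CPU, instead of a single kmem.
struct {
  struct spinlock lock;
  struct run *freelist;
} kmem[NCPU];

// Free a page onto CPU id's list. In xv6 the id would come from
// cpuid() with interrupts off; here the caller passes it in.
void kfree_on(int id, void *pa) {
  struct run *r = pa;
  acquire(&kmem[id].lock);
  r->next = kmem[id].freelist;
  kmem[id].freelist = r;
  release(&kmem[id].lock);
}

// Take one page from some other CPU's list; returns 0 if all are empty.
static struct run *steal(int id) {
  for(int i = 0; i < NCPU; i++) {
    if(i == id)
      continue;
    acquire(&kmem[i].lock);            // only this one lock is held
    struct run *r = kmem[i].freelist;
    if(r)
      kmem[i].freelist = r->next;
    release(&kmem[i].lock);
    if(r)
      return r;
  }
  return 0;
}

void *kalloc_on(int id) {
  acquire(&kmem[id].lock);
  struct run *r = kmem[id].freelist;
  if(r)
    kmem[id].freelist = r->next;
  release(&kmem[id].lock);             // drop our lock before stealing
  if(!r)
    r = steal(id);                     // local list empty: steal a page
  return r;
}
```

This version steals one page at a time. Stealing a batch (for example, half of the victim's list) would make stealing rarer, at the cost of a somewhat longer critical section on the victim's lock.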

Your job is to implement per-CPU free lists and stealing when one CPU is out of memory. Run kalloctest to see whether your implementation has removed the lock contention.

Some hints:

You can use the constant NCPU from kernel/param.h.
Let freerange give all free memory to the CPU running freerange.
The function cpuid returns the current core number, but it is only safe to call it with interrupts turned off, so you will need to turn interrupts off and on in your solution.
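
The cpuid hint can be handled with a small helper that brackets cpuid with push_off/pop_off. The sketch below models that, plus a freerange that hands every page to the calling CPU's list. push_off, pop_off, cpuid, this_cpu, and mycpuid are simplified user-space stand-ins or hypothetical names, and the per-list locks are omitted to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 3
#define PGSIZE 4096
#define PGROUNDUP(a) (((uintptr_t)(a) + PGSIZE - 1) & ~(uintptr_t)(PGSIZE - 1))

// --- user-space stand-ins for xv6 primitives (illustration only) ---
static int intr_enabled = 1;  // models the interrupt-enable bit
static int noff;              // push_off nesting depth
static int intena;            // interrupt state saved by outermost push_off
static int this_cpu;          // hypothetical: the CPU this code "runs" on

static void push_off(void) {
  int old = intr_enabled;
  intr_enabled = 0;
  if(noff == 0)
    intena = old;
  noff++;
}
static void pop_off(void) {
  assert(!intr_enabled && noff > 0);
  noff--;
  if(noff == 0 && intena)
    intr_enabled = 1;
}
// Like xv6's cpuid(): only meaningful with interrupts off, because a
// timer interrupt could otherwise move the process to another CPU.
static int cpuid(void) {
  assert(!intr_enabled);
  return this_cpu;
}

struct run { struct run *next; };
struct { struct run *freelist; int npages; } kmem[NCPU];  // locks omitted

// Read the CPU id with interrupts off around the cpuid() call.
static int mycpuid(void) {
  push_off();
  int id = cpuid();
  pop_off();
  return id;
}

// Give every whole page in [pa_start, pa_end) to the calling CPU.
void freerange(void *pa_start, void *pa_end) {
  int id = mycpuid();
  for(uintptr_t p = PGROUNDUP(pa_start); p + PGSIZE <= (uintptr_t)pa_end; p += PGSIZE) {
    struct run *r = (struct run *)p;
    r->next = kmem[id].freelist;
    kmem[id].freelist = r;
    kmem[id].npages++;
  }
}
```

The id may be stale once pop_off re-enables interrupts, because the process can then migrate. That is harmless for the allocator as long as each list is accessed under its own lock: freeing to, or allocating from, another CPU's list is still correct, just less local.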

Run usertests to check that you didn't break anything.

More scalable bcache lookup

Several processes repeatedly reading different files will bottleneck on the buffer cache, bcache, in bio.c. Replace the acquire in bget with

    while(!tryacquire(&bcache.lock)) {
      printf("!");
    }

and run test0 from bcachetest; you will see "!"s.

Modify bget so that a lookup for a buffer that is already in the bcache doesn't need to acquire bcache.lock. This is trickier than the kalloc assignment, because bcache buffers are truly shared among processes. You must maintain the invariant that each disk block is cached in at most one buffer.

There are several races that bcache.lock protects against, including:

A brelse may set b->ref to 0 while a concurrent bget is incrementing it.
Two bget calls may both see b->ref == 0; one may reuse the buffer while the other replaces it with another block.
A concurrent brelse may modify the list that bget traverses.

A challenge is testing whether your code is still correct. One way to do so is to artificially delay certain operations using sleepticks. test1 thrashes the buffer cache and exercises more code paths.

Here are some hints:

Read the description of the buffer cache in the xv6 book (Section 7.2).
Use a simple design; i.e., don't attempt a lock-free implementation.
Use a simple hash table with a lock per bucket.
Searching the hash table for a buffer and allocating an entry for that buffer when it is not found must be atomic.
It is fine to acquire bcache.lock in brelse to update the LRU/MRU list.
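
One way to follow these hints, sketched in user space: a hash table keyed by blockno with one spinlock per bucket, and bcache.lock kept only to serialize evictions. NBUF, NBUCKET, the sentinel blockno, the lookup helper, and picking any unused buffer (rather than the least recently used one) are simplifying assumptions; sleeplocks, disk I/O, and the LRU list are left out, and the reference count is named refcnt. A hit takes only its bucket lock. A miss drops the bucket lock before taking bcache.lock and then re-checks the bucket, so lookup-plus-allocation is atomic with respect to other bgets for the same block.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF 4
#define NBUCKET 3   // hypothetical; a prime such as 13 spreads blocks better

struct spinlock { unsigned int locked; };
static void acquire(struct spinlock *lk) {
  while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
    ;
}
static void release(struct spinlock *lk) { __sync_lock_release(&lk->locked); }

struct buf {
  unsigned int dev, blockno;
  int valid;            // 0 = contents must be read from disk
  int refcnt;
  struct buf *next;     // next buffer in the same bucket
};

struct {
  struct spinlock lock;                // serializes evictions only
  struct spinlock bucketlock[NBUCKET];
  struct buf *bucket[NBUCKET];
  struct buf buf[NBUF];
} bcache;

static int hash(unsigned int blockno) { return blockno % NBUCKET; }

void binit(void) {
  for(int i = 0; i < NBUF; i++) {
    bcache.buf[i].blockno = ~0u;       // sentinel: assumed never a real block
    bcache.buf[i].next = bcache.bucket[0];
    bcache.bucket[0] = &bcache.buf[i];
  }
}

// Caller holds bucketlock[h]. Returns a referenced buffer, or NULL.
static struct buf *lookup(int h, unsigned int dev, unsigned int blockno) {
  for(struct buf *b = bcache.bucket[h]; b; b = b->next)
    if(b->dev == dev && b->blockno == blockno) {
      b->refcnt++;
      return b;
    }
  return NULL;
}

struct buf *bget(unsigned int dev, unsigned int blockno) {
  int h = hash(blockno);
  acquire(&bcache.bucketlock[h]);
  struct buf *b = lookup(h, dev, blockno);     // fast path: one bucket lock
  release(&bcache.bucketlock[h]);
  if(b)
    return b;

  acquire(&bcache.lock);                       // slow path: one evictor at a time
  acquire(&bcache.bucketlock[h]);
  b = lookup(h, dev, blockno);                 // re-check: cached meanwhile?
  for(int v = 0; b == NULL && v < NBUCKET; v++) {
    if(v != h)
      acquire(&bcache.bucketlock[v]);
    for(struct buf **pp = &bcache.bucket[v]; *pp; pp = &(*pp)->next) {
      if((*pp)->refcnt == 0) {                 // unused: move it to bucket h
        b = *pp;
        *pp = b->next;
        b->dev = dev;
        b->blockno = blockno;
        b->valid = 0;
        b->refcnt = 1;
        b->next = bcache.bucket[h];
        bcache.bucket[h] = b;
        break;
      }
    }
    if(v != h)
      release(&bcache.bucketlock[v]);
  }
  release(&bcache.bucketlock[h]);
  release(&bcache.lock);
  return b;                                    // NULL: xv6 would panic instead
}

void brelse(struct buf *b) {
  int h = hash(b->blockno);                    // stable: caller holds a reference
  acquire(&bcache.bucketlock[h]);
  b->refcnt--;
  release(&bcache.bucketlock[h]);
}
```

The lock order (bcache.lock, then the home bucket, then at most one other bucket) avoids deadlock: only the thread holding bcache.lock ever waits for a second bucket lock, and every other bucket holder releases without waiting on anything.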

Check that your implementation has less contention on test0.

Make sure your implementation passes bcachetest and usertests.

Optional:

Make the buffer cache more scalable (e.g., avoid acquiring bcache.lock in brelse).
Make lookup lock-free (hint: use gcc's __sync_* functions). How do you convince yourself that your implementation is correct?
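
For the lock-free option, one building block is to let the reference count itself arbitrate, using gcc's __sync_bool_compare_and_swap, __sync_fetch_and_add, and __sync_fetch_and_sub. The sketch assumes a hypothetical convention in which refcnt == -1 marks a buffer that is being recycled; try_ref, try_recycle, and put_ref are illustrative names. It addresses only the first two races listed earlier (concurrent reference-count updates, and two callers recycling the same buffer); making the hash-chain traversal itself lock-free is a separate, harder problem.

```c
#include <assert.h>

// Hypothetical convention for this sketch: refcnt >= 0 is an ordinary
// reference count; refcnt == -1 means "being recycled for another block".
struct buf {
  int refcnt;
  unsigned int dev, blockno;
};

// Take a reference to b if it holds (dev, blockno). Returns 1 on success,
// 0 if b is being recycled or caches a different block.
int try_ref(struct buf *b, unsigned int dev, unsigned int blockno) {
  for(;;) {
    int r = __sync_fetch_and_add(&b->refcnt, 0);      // atomic read
    if(r < 0)
      return 0;                                        // mid-recycle
    if(__sync_bool_compare_and_swap(&b->refcnt, r, r + 1))
      break;                                           // reference taken
  }
  // With refcnt >= 1 nobody can recycle b, so dev/blockno are stable now.
  if(b->dev != dev || b->blockno != blockno) {
    __sync_fetch_and_sub(&b->refcnt, 1);               // wrong block: undo
    return 0;
  }
  return 1;
}

// Claim an unused buffer for (dev, blockno). Of several racing callers,
// only one can win the 0 -> -1 transition.
int try_recycle(struct buf *b, unsigned int dev, unsigned int blockno) {
  if(!__sync_bool_compare_and_swap(&b->refcnt, 0, -1))
    return 0;                                          // in use or claimed
  b->dev = dev;
  b->blockno = blockno;
  // Full barrier: the new identity is visible before the reference is.
  // Cannot fail, since only the claimer moves refcnt away from -1.
  (void)__sync_bool_compare_and_swap(&b->refcnt, -1, 1);
  return 1;
}

void put_ref(struct buf *b) {
  __sync_fetch_and_sub(&b->refcnt, 1);
}
```

Note the order in try_ref: the identity check happens after the reference is taken, because only a held reference (refcnt >= 1) prevents the buffer from being recycled underneath the check. Arguing about orderings like this one, case by case, is part of convincing yourself that a lock-free design is correct.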