1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
|
<html>
<head>
<title>Lab: locks</title>
<link rel="stylesheet" href="homework.css" type="text/css" />
</head>
<body>
<h1>Lab: locks</h1>
<p>In this lab you will try to avoid lock contention for certain
workloads.
<h2>lock contention</h2>
<p>The program user/kalloctest stresses xv6's memory allocator: three
processes grow and shrink there address space, which will results in
many calls to <tt>kalloc</tt> and <tt>kfree</tt>,
respectively. <tt>kalloc</tt> and <tt>kfree</tt>
obtain <tt>kmem.lock</tt>. To see if there is lock contention for
<tt>kmem.lock</tt> replace the call to <tt>acquire</tt>
in <tt>kalloc</tt> with the following code:
<pre>
while(!tryacquire(&kmem.lock)) {
printf("!");
}
</pre>
<p><tt>tryacquire</tt> tries to acquire <tt>kmem.lock</tt>: if the
lock is taking it returns false (0); otherwise, it returns true (1)
and with the lock acquired. Your first job is to
implement <tt>tryacquire</tt> in kernel/spinlock.c.
<p>A few hints:
<ul>
<li>look at <tt>acquire</tt>.
<li>don't forget to restore interrupts when acquision fails
<li>Add tryacquire's signature to defs.h.
</ul>
<p>Run usertests to see if you didn't break anything. Note that
usertests never prints "!"; there is never contention
for <tt>kmem.lock</tt>. The caller is always able to immediately
acquire the lock and never has to wait because some other process
has the lock.
<p>Now run kalloctest. You should see quite a number of "!" on the
console. kalloctest causes many processes to contend on
the <tt>kmem.lock</tt>. This lock contention is a bit artificial,
because qemu is simulating 3 processors, but it is likely on real
hardware, there would be contention too.
<h2>Removing lock contention</h2>
<p>The root cause of lock contention in kalloctest is that there is a
single free list, protected by a single lock. To remove lock
contention, you will have to redesign the memory allocator to avoid
a single lock and list. The basic idea is to maintain a free list
per CPU, each list with its own lock. Allocations and frees on each
CPU can run in parallel, because each CPU will operate on a
different list.
<p> The main challenge will be to deal with the case that one CPU runs
out of memory, but another CPU has still free memory; in that case,
the one CPU must "steal" part of the other CPU's free list.
Stealing may introduce lock contention, but that may be acceptable
because it may happen infrequently.
<p>Your job is to implement per-CPU freelists and stealing when one
CPU is out of memory. Run kalloctest() to see if your
implementation has removed lock contention.
<p>Some hints:
<ul>
<li>You can use the constant <tt>NCPU</tt> in kernel/param.h
<li>Let <tt>freerange</tt> give all free memory to the CPU
running <tt>freerange</tt>.
<li>The function <tt>cpuid</tt> returns the current core, but note
that you can use it when interrupts are turned off and so you will
need to turn on/off interrupts in your solution.
</ul>
<p>Run usertests to see if you don't break anything.
<h2>More scalabale bcache lookup</h2>
<p>Several processes reading different files repeatedly will
bottleneck in the buffer cache, bcache, in bio.c. Replace the
acquire in <tt>bget</tt> with
<pre>
while(!tryacquire(&bcache.lock)) {
printf("!");
}
</pre>
and run test0 from bcachetest and you will see "!"s.
<p>Modify <tt>bget</tt> so that a lookup for a buffer that is in the
bcache doesn't need to acquire <tt>bcache.lock</tt>. This is more
tricky than the kalloc assignment, because bcache buffers are truly
shared among processes. You must maintain the invariant that a
buffer is only once in memory.
<p> There are several races that <tt>bcache.lock</tt> protects
against, including:
<ul>
<li>A <tt>brelse</tt> may set <tt>b->ref</tt> to 0,
while concurrent <tt>bget</tt> is incrementing it.
<li>Two <tt>bget</tt> may see <tt>b->ref = 0</tt> and one may re-use
the buffer, while the other may replaces it with another block.
<li>A concurrent <tt>brelse</tt> modifies the list
that <tt>bget</tt> traverses.
</ul>
<p>A challenge is testing whether you code is still correct. One way
to do is to artificially delay certain operations
using <tt>sleepticks</tt>. <tt>test1</tt> trashes the buffer cache
and exercises more code paths.
<p>Here are some hints:
<ul>
<li>Read the description of buffer cache in the xv6 book (Section 7.2).
<li>Use a simple design: i.e., don't design a lock-free implementation.
<li>Use a simple hash table with locks per bucket.
<li>Searching in hash table for a buffer and allocating an entry
for that buffer when the buffer is not found must be atomic.
<li>It is fine to acquire <tt>bcache.lock</tt> in <tt>brelse</tt>
to update the LRU/MRU list.
</ul>
<p>Check that your implementation has less contention
on <tt>test0</tt>
<p>Make sure your implementation passes bcachetest and usertests.
<p>Optional:
<ul>
<li>make the buffer cache more scalable (e.g., avoid taking
out <tt>bcache.lock</tt> on <tt>brelse</tt>).
<li>make lookup lock-free (Hint: use gcc's <tt>__sync_*</tt>
functions.) How do you convince yourself that your implementation is correct?
</ul>
</body>
</html>
|