labs/fs.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249

q<html>
<head>
<title>Lab: file system</title>
<link rel="stylesheet" href="homework.css" type="text/css" />
</head>
<body>

<h1>Lab: file system</h1>

<p>In this lab you will add large files and <tt>mmap</tt> to the xv6 file system.

<h2>Large files</h2>

<p>In this assignment you'll increase the maximum size of an xv6
file. Currently xv6 files are limited to 268 blocks, or 268*BSIZE
bytes (BSIZE is 1024 in xv6). This limit comes from the fact that an
xv6 inode contains 12 "direct" block numbers and one "singly-indirect"
block number, which refers to a block that holds up to 256 more block
numbers, for a total of 12+256=268. You'll change the xv6 file system
code to support a "doubly-indirect" block in each inode, containing
256 addresses of singly-indirect blocks, each of which can contain up
to 256 addresses of data blocks. The result will be that a file will
be able to consist of up to 256*256+256+11 blocks (11 instead of 12,
because we will sacrifice one of the direct block numbers for the
double-indirect block).

<h3>Preliminaries</h3>
  
<p>Modify your Makefile's <tt>CPUS</tt> definition so that it reads:
<pre>
CPUS := 1
</pre>

<b>XXX doesn't seem to speedup things</b>
<p>Add
<pre>
QEMUEXTRA = -snapshot
</pre>
right before
<tt>QEMUOPTS</tt>
<p>
The above two steps speed up qemu tremendously when xv6
creates large files.

<p><tt>mkfs</tt> initializes the file system to have fewer
than 1000 free data blocks, too few to show off the changes
you'll make. Modify <tt>param.h</tt> to 
set <tt>FSSIZE</tt> to:
<pre>
    #define FSSIZE       20000  // size of file system in blocks
</pre>

<p>Download <a href="big.c">big.c</a> into your xv6 directory,
add it to the UPROGS list, start up xv6, and run <tt>big</tt>.
It creates as big a file as xv6 will let
it, and reports the resulting size. It should say 140 sectors.

<h3>What to Look At</h3>

The format of an on-disk inode is defined by <tt>struct dinode</tt>
in <tt>fs.h</tt>. You're particularly interested in <tt>NDIRECT</tt>,
<tt>NINDIRECT</tt>, <tt>MAXFILE</tt>, and the <tt>addrs[]</tt> element
of <tt>struct dinode</tt>. Look Figure 7.3 in the xv6 text for a
diagram of the standard xv6 inode.

<p>
The code that finds a file's data on disk is in <tt>bmap()</tt>
in <tt>fs.c</tt>. Have a look at it and make sure you understand
what it's doing. <tt>bmap()</tt> is called both when reading and
writing a file. When writing, <tt>bmap()</tt> allocates new
blocks as needed to hold file content, as well as allocating
an indirect block if needed to hold block addresses.

<p>
<tt>bmap()</tt> deals with two kinds of block numbers. The <tt>bn</tt>
argument is a "logical block" -- a block number relative to the start
of the file. The block numbers in <tt>ip->addrs[]</tt>, and the
argument to <tt>bread()</tt>, are disk block numbers.
You can view <tt>bmap()</tt> as mapping a file's logical
block numbers into disk block numbers.

<h3>Your Job</h3>

Modify <tt>bmap()</tt> so that it implements a doubly-indirect
block, in addition to direct blocks and a singly-indirect block.
You'll have to have only 11 direct blocks, rather than 12,
to make room for your new doubly-indirect block; you're
not allowed to change the size of an on-disk inode.
The first 11 elements of <tt>ip->addrs[]</tt> should be
direct blocks; the 12th should be a singly-indirect block
(just like the current one); the 13th should be your new
doubly-indirect block.

<p>
You don't have to modify xv6 to handle deletion of files with
doubly-indirect blocks.

<p>
If all goes well, <tt>big</tt> will now report that it
can write  sectors. It will take <tt>big</tt> minutes
to finish.

<b>XXX this runs for a while!</b>

<h3>Hints</h3>

<p>
Make sure you understand <tt>bmap()</tt>. Write out a diagram of the
relationships between <tt>ip->addrs[]</tt>, the indirect block, the
doubly-indirect block and the singly-indirect blocks it points to, and
data blocks. Make sure you understand why adding a doubly-indirect
block increases the maximum file size by 256*256 blocks (really -1),
since you have to decrease the number of direct blocks by one).

<p>
Think about how you'll index the doubly-indirect block, and
the indirect blocks it points to, with the logical block
number.

<p>If you change the definition of <tt>NDIRECT</tt>, you'll
probably have to change the size of <tt>addrs[]</tt>
in <tt>struct inode</tt> in <tt>file.h</tt>. Make sure that
<tt>struct inode</tt> and <tt>struct dinode</tt> have the
same number of elements in their <tt>addrs[]</tt> arrays.

<p>If you change the definition of <tt>NDIRECT</tt>, make sure to create a
new <tt>fs.img</tt>, since <tt>mkfs</tt> uses <tt>NDIRECT</tt> too to build the
initial file systems.  If you delete <tt>fs.img</tt>, <tt>make</tt> on Unix (not
xv6) will build a new one for you.

<p>If your file system gets into a bad state, perhaps by crashing,
delete <tt>fs.img</tt> (do this from Unix, not xv6).  <tt>make</tt> will build a
new clean file system image for you.

<p>Don't forget to <tt>brelse()</tt> each block that you
<tt>bread()</tt>.

<p>You should allocate indirect blocks and doubly-indirect
blocks only as needed, like the original <tt>bmap()</tt>.

<h2>Memory-mapped files</h2>

<p>In this assignment you will implement the core of the systems
  calls <tt>mmap</tt> and <tt>munmap</tt>; see the man pages for an
  explanation what they do (run <tt>man 2 mmap</tt> in your terminal).
  The test program <tt>mmaptest</tt> tells you what should work.

<p>Here are some hints about how you might go about this assignment:

  <ul>
    <li>Start with adding the two systems calls to the kernel, as you
      done for other systems calls (e.g., <tt>sigalarm</tt>), but
      don't implement them yet; just return an
      error. run <tt>mmaptest</tt> to observe the error.
      
    <li>Keep track for each process what <tt>mmap</tt> has mapped.
      You will need to allocate a <tt>struct vma</tt> to record the
      address, length, permissions, etc. for each virtual memory area
      (VMA) that maps a file.  Since the xv6 kernel doesn't have a
      memory allocator in the kernel, you can use the same approach has
      for <tt>struct file</tt>: have a global array of <tt>struct
	vma</tt>s and have for each process a fixed-sized array of VMAs
      (like the file descriptor array).

    <li>Implement <tt>mmap</tt>: allocate a VMA, add it to the process's
      table of VMAs, fill in the VMA, and find a hole in the process's
      address space where you will map the file.  You can assume that no
      file will be bigger than 1GB.  The VMA will contain a pointer to
      a <tt>struct file</tt> for the file being mapped; you will need to
      increase the file's reference count so that the structure doesn't
      disappear when the file is closed (hint:
      see <tt>filedup</tt>). You don't have worry about overlapping
      VMAs.  Run <tt>mmaptest</tt>: the first <tt>mmap</tt> should
      succeed, but the first access to the mmaped- memory will fail,
      because you haven't updated the page fault handler.

    <li>Modify the page-fault handler from the lazy-allocation and COW
      labs to call a VMA function that handles page faults in VMAs.
      This function allocates a page, reads a 4KB from the mmap-ed
      file into the page, and maps the page into the address space of
      the process.  To read the page, you can use <tt>readi</tt>,
      which allows you to specify an offset from where to read in the
      file (but you will have to lock/unlock the inode passed
      to <tt>readi</tt>).  Don't forget to set the permissions correctly
      on the page.  Run <tt>mmaptest</tt>; you should get to the
      first <tt>munmap</tt>.
      
    <li>Implement <tt>munmap</tt>: find the <tt>struct vma</tt> for
      the address and unmap the specified pages (hint:
      use <tt>uvmunmap</tt>). If <tt>munmap</tt> removes all pages
      from a VMA, you will have to free the VMA (don't forget to
      decrement the reference count of the VMA's <tt>struct
      file</tt>); otherwise, you may have to shrink the VMA.  You can
      assume that <tt>munmap</tt> will not split a VMA into two VMAs;
      that is, we don't unmap a few pages in the middle of a VMA.  If
      an unmapped page has been modified and the file is
      mapped <tt>MAP_SHARED</tt>, you will have to write the page back
      to the file. RISC-V has a dirty bit (<tt>D</tt>) in a PTE to
      record whether a page has ever been written too; add the
      declaration to kernel/riscv.h and use it.  Modify <tt>exit</tt>
      to call <tt>munmap</tt> for the process's open VMAs.
      Run <tt>mmaptest</tt>; you should <tt>mmaptest</tt>, but
      probably not <tt>forktest</tt>.

    <li>Modify <tt>fork</tt> to copy VMAs from parent to child.  Don't
    forget to increment reference count for a VMA's <tt>struct
    file</tt>.  In the page fault handler of the child, it is OK to
    allocate a new page instead of sharing the page with the
    parent. The latter would be cooler, but it would require more
    implementation work.  Run <tt>mmaptest</tt>; make sure you pass
    both <tt>mmaptest</tt> and <tt>forktest</tt>.
          
  </ul>
  
<p>Run usertests to make sure you didn't break anything.

<p>Optional challenges:
  <ul>
    
    <li>If two processes have the same file mmap-ed (as
      in <tt>forktest</tt>), share their physical pages. You will need
      reference counts on physical pages.

    <li>The solution above allocates a new physical page for each page
    read from the mmap-ed file, even though the data is also in kernel
    memory in the buffer cache.  Modify your implementation to mmap
    that memory, instead of allocating a new page.  This requires that
    file blocks be the same size as pages (set <tt>BSIZE</tt> to
    4096).  You will need to pin mmap-ed blocks into the buffer cache.
    You will need worry about reference counts.

    <li>Remove redundancy between your implementation for lazy
    allocation and your implementation of mmapp-ed files.  (Hint:
    create an VMA for the lazy allocation area.)

    <li>Modify <tt>exec</tt> to use a VMA for different sections of
    the binary so that you get on-demand-paged executables. This will
    make starting programs faster, because <tt>exec</tt> will not have
      to read any data from the file system.

    <li>Implement on-demand paging: don't keep a process in memory,
    but let the kernel move some parts of processes to disk when
    physical memory is low.  Then, page in the paged-out memory when
    the process references it.
      
  </ul>
  
</body>
</html>