From f53494c28e362fb7752bbc83417b9ba47cff0bf5 Mon Sep 17 00:00:00 2001 From: rsc Date: Wed, 3 Sep 2008 04:50:04 +0000 Subject: DO NOT MAIL: xv6 web pages --- web/l-fs.html | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 web/l-fs.html (limited to 'web/l-fs.html') diff --git a/web/l-fs.html b/web/l-fs.html new file mode 100644 index 0000000..ed911fc --- /dev/null +++ b/web/l-fs.html @@ -0,0 +1,222 @@ +L10 + + + + + +

File systems

+ +

Required reading: iread, iwrite, and wdir, and code related to + these calls in fs.c, bio.c, ide.c, file.c, and sysfile.c + +

Overview

+ +

The next 3 lectures are about file systems: +

+ +

Users desire to store their data durable so that data survives when +the user turns of his computer. The primary media for doing so are: +magnetic disks, flash memory, and tapes. We focus on magnetic disks +(e.g., through the IDE interface in xv6). + +

To allow users to remember where they stored a file, they can +assign a symbolic name to a file, which appears in a directory. + +

The data in a file can be organized in a structured way or not. +The structured variant is often called a database. UNIX uses the +unstructured variant: files are streams of bytes. Any particular +structure is likely to be useful to only a small class of +applications, and other applications will have to work hard to fit +their data into one of the pre-defined structures. Besides, if you +want structure, you can easily write a user-mode library program that +imposes that format on any file. The end-to-end argument in action. +(Databases have special requirements and support an important class of +applications, and thus have a specialized plan.) + +

The API for a minimal file system consists of: open, read, write, +seek, close, and stat. Dup duplicates a file descriptor. For example: +

+  fd = open("x", O_RDWR);
+  read (fd, buf, 100);
+  write (fd, buf, 512);
+  close (fd)
+
+ +

Maintaining the file offset behind the read/write interface is an + interesting design decision . The alternative is that the state of a + read operation should be maintained by the process doing the reading + (i.e., that the pointer should be passed as an argument to read). + This argument is compelling in view of the UNIX fork() semantics, + which clones a process which shares the file descriptors of its + parent. A read by the parent of a shared file descriptor (e.g., + stdin, changes the read pointer seen by the child). On the other + hand the alternative would make it difficult to get "(data; ls) > x" + right. + +

Unix API doesn't specify that the effects of write are immediately + on the disk before a write returns. It is up to the implementation + of the file system within certain bounds. Choices include (that + aren't non-exclusive): +

+ +

A design issue is the semantics of a file system operation that + requires multiple disk writes. In particular, what happens if the + logical update requires writing multiple disks blocks and the power + fails during the update? For example, to create a new file, + requires allocating an inode (which requires updating the list of + free inodes on disk), writing a directory entry to record the + allocated i-node under the name of the new file (which may require + allocating a new block and updating the directory inode). If the + power fails during the operation, the list of free inodes and blocks + may be inconsistent with the blocks and inodes in use. Again this is + up to implementation of the file system to keep on disk data + structures consistent: +

+ +

Another design issue is the semantics are of concurrent writes to +the same data item. What is the order of two updates that happen at +the same time? For example, two processes open the same file and write +to it. Modern Unix operating systems allow the application to lock a +file to get exclusive access. If file locking is not used and if the +file descriptor is shared, then the bytes of the two writes will get +into the file in some order (this happens often for log files). If +the file descriptor is not shared, the end result is not defined. For +example, one write may overwrite the other one (e.g., if they are +writing to the same part of the file.) + +

An implementation issue is performance, because writing to magnetic +disk is relatively expensive compared to computing. Three primary ways +to improve performance are: careful file system layout that induces +few seeks, an in-memory cache of frequently-accessed blocks, and +overlap I/O with computation so that file operations don't have to +wait until their completion and so that that the disk driver has more +data to write, which allows disk scheduling. (We will talk about +performance in detail later.) + +

xv6 code examples

+ +

xv6 implements a minimal Unix file system interface. xv6 doesn't +pay attention to file system layout. It overlaps computation and I/O, +but doesn't do any disk scheduling. Its cache is write-through, which +simplifies keep on disk datastructures consistent, but is bad for +performance. + +

On disk files are represented by an inode (struct dinode in fs.h), +and blocks. Small files have up to 12 block addresses in their inode; +large files use files the last address in the inode as a disk address +for a block with 128 disk addresses (512/4). The size of a file is +thus limited to 12 * 512 + 128*512 bytes. What would you change to +support larger files? (Ans: e.g., double indirect blocks.) + +

Directories are files with a bit of structure to them. The file +contains of records of the type struct dirent. The entry contains the +name for a file (or directory) and its corresponding inode number. +How many files can appear in a directory? + +

In memory files are represented by struct inode in fsvar.h. What is +the role of the additional fields in struct inode? + +

What is xv6's disk layout? How does xv6 keep track of free blocks + and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this + layout a good one for performance? What are other options? + +

Let's assume that an application created an empty file x with + contains 512 bytes, and that the application now calls read(fd, buf, + 100), that is, it is requesting to read 100 bytes into buf. + Furthermore, let's assume that the inode for x is is i. Let's pick + up what happens by investigating readi(), line 4483. +

+ +

Now let's suppose that the process is writing 512 bytes at the end + of the file a. How many disk writes will happen? +

+ +

Lots of code to implement reading and writing of files. How about + directories? +

+

Reading and writing of directories is trivial. + + -- cgit v1.2.3