From 01a6c054d548d9fff8bbdfac4d3f3de4ae8677a1 Mon Sep 17 00:00:00 2001 From: Austin Clements Date: Wed, 7 Sep 2011 11:49:14 -0400 Subject: Remove web directory; all cruft or moved to 6.828 repo --- web/l13.html | 245 ----------------------------------------------------------- 1 file changed, 245 deletions(-) delete mode 100644 web/l13.html (limited to 'web/l13.html') diff --git a/web/l13.html b/web/l13.html deleted file mode 100644 index af0f405..0000000 --- a/web/l13.html +++ /dev/null @@ -1,245 +0,0 @@ -High-performance File Systems - - - - - -

High-performance File Systems

- -

Required reading: soft updates. - -

Overview

- -

A key problem in designing file systems is how to obtain -performance on file system operations while providing consistency. -With consistency, we mean, that file system invariants are maintained -is on disk. These invariants include that if a file is created, it -appears in its directory, etc. If the file system data structures are -consistent, then it is possible to rebuild the file system to a -correct state after a failure. - -

To ensure consistency of on-disk file system data structures, - modifications to the file system must respect certain rules: -

-The paper calls these dependencies update dependencies. - -

xv6 ensures these rules by writing every block synchronously, and - by ordering the writes appropriately. With synchronous, we mean - that a process waits until the current disk write has been - completed before continuing with execution. - -

- -

xv6 is a nice example of the tension between consistency and - performance. To get consistency, xv6 uses synchronous writes, - but these writes are slow, because they perform at the rate of a - seek instead of the rate of the maximum data transfer rate. The - bandwidth to a disk is reasonable high for large transfer (around - 50Mbyte/s), but latency is low, because of the cost of moving the - disk arm(s) (the seek latency is about 10msec). - -

This tension is an implementation-dependent one. The Unix API - doesn't require that writes are synchronous. Updates don't have to - appear on disk until a sync, fsync, or open with O_SYNC. Thus, in - principle, the UNIX API allows delayed writes, which are good for - performance: -

- -

Thus the question: how to delay writes and achieve consistency? - The paper provides an answer. - -

This paper

- -

The paper surveys some of the existing techniques and introduces a -new to achieve the goal of performance and consistency. - -

- -

Techniques possible: -

- -

Soft updates is the solution explored in this paper. It doesn't -require NVRAM, and performs as well as the naive strategy of keep all -dirty block in main memory. Compared to logging, it is unclear if -soft updates is better. The default BSD file systems uses soft - updates, but most Linux file systems use logging. - -

Soft updates is a sophisticated variant of flusher-enforced -ordering. Instead of maintaining dependencies on the block-level, it -maintains dependencies on file structure level (per inode, per -directory, etc.), reducing circular dependencies. Furthermore, it -breaks any remaining circular dependencies by undo changes before -writing the block and then redoing them to the block after writing. - -

Pseudocode for create: -

-create (f) {
-   allocate inode in block i  (assuming inode is available)
-   add i to directory data block d  (assuming d has space)
-   mark d has dependent on i, and create undo/redo record
-   update directory inode in block di
-   mark di has dependent on d
-}
-
- -

Pseudocode for the flusher: -

-flushblock (b)
-{
-  lock b;
-  for all dependencies that b is relying on
-    "remove" that dependency by undoing the change to b
-    mark the dependency as "unrolled"
-  write b 
-}
-
-write_completed (b) {
-  remove dependencies that depend on b
-  reapply "unrolled" dependencies that b depended on
-  unlock b
-}
-
- -

Apply flush algorithm to example: -

- -

An file operation that is important for file-system consistency -is rename. Rename conceptually works as follows: -

-rename (from, to)
-   unlink (to);
-   link (from, to);
-   unlink (from);
-
- -

Rename it often used by programs to make a new version of a file -the current version. Committing to a new version must happen -atomically. Unfortunately, without a transaction-like support -atomicity is impossible to guarantee, so a typical file systems -provides weaker semantics for rename: if to already exists, an -instance of to will always exist, even if the system should crash in -the middle of the operation. Does the above implementation of rename -guarantee this semantics? (Answer: no). - -

If rename is implemented as unlink, link, unlink, then it is -difficult to guarantee even the weak semantics. Modern UNIXes provide -rename as a file system call: -

-   update dir block for to point to from's inode // write block
-   update dir block for from to free entry // write block
-
-

fsck may need to correct refcounts in the inode if the file -system fails during rename. for example, a crash after the first -write followed by fsck should set refcount to 2, since both from -and to are pointing at the inode. - -

This semantics is sufficient, however, for an application to ensure -atomicity. Before the call, there is a from and perhaps a to. If the -call is successful, following the call there is only a to. If there -is a crash, there may be both a from and a to, in which case the -caller knows the previous attempt failed, and must retry. The -subtlety is that if you now follow the two links, the "to" name may -link to either the old file or the new file. If it links to the new -file, that means that there was a crash and you just detected that the -rename operation was composite. On the other hand, the retry -procedure can be the same for either case (do the rename again), so it -isn't necessary to discover how it failed. The function follows the -golden rule of recoverability, and it is idempotent, so it lays all -the needed groundwork for use as part of a true atomic action. - -

With soft updates renames becomes: -

-rename (from, to) {
-   i = namei(from);
-   add "to" directory data block td a reference to inode i
-   mark td dependent on block i
-   update directory inode "to" tdi
-   mark tdi as dependent on td
-   remove "from" directory data block fd a reference to inode i
-   mark fd as dependent on tdi
-   update directory inode in block fdi
-   mark fdi as dependent on fd
-}
-
-

No synchronous writes! - -

What needs to be done on recovery? (Inspect every statement in -rename and see what inconsistencies could exist on the disk; e.g., -refcnt inode could be too high.) None of these inconsitencies require -fixing before the file system can operate; they can be fixed by a -background file system repairer. - -

Paper discussion

- -

Do soft updates perform any useless writes? (A useless write is a -write that will be immediately overwritten.) (Answer: yes.) Fix -syncer to becareful with what block to start. Fix cache replacement -to selecting LRU block with no pendending dependencies. - -

Can a log-structured file system implement rename better? (Answer: -yes, since it can get the refcnts right). - -

Discuss all graphs. - - - -- cgit v1.2.3