<html>
<head>
<title>L4</title>
</head>
<body>

<h1>Address translation and sharing using segments</h1>

<p>This lecture is about virtual memory, focusing on address
spaces. It is the first lecture in a series of lectures that use
xv6 as a case study.

<h2>Address spaces</h2>

<ul>

<li>OS: kernel program and user-level programs. For fault isolation
each program runs in a separate address space. The kernel address
space is like a user address space, except that it runs in kernel mode.
A program in kernel mode can execute privileged instructions (e.g.,
writing the segment registers).

<li>One job of the kernel is to manage address spaces (creating, growing,
deleting, and switching between them).

<ul>

<li>Each address space (including the kernel's) consists of the binary
    representation of the text of the program, the data part
    of the program, and the stack area.

<li>The kernel address space runs the kernel program. In a monolithic
    organization the kernel manages all hardware and provides an API
    to user programs.

<li>Each user address space contains a program.  A user program may ask
  to shrink or grow its address space.

</ul>

<li>The main operations:
<ul>
<li>Creation.  Allocate physical memory to store the program. Load the
program into physical memory. Fill the address space with references to
the physical memory.
<li>Growing. Allocate physical memory and add it to address space.
<li>Shrinking. Free some of the memory in an address space.
<li>Deletion. Free all memory in an address space.
<li>Switching. Switch the processor to use another address space.
<li>Sharing.  Share a part of an address space with another program.
</ul>
</ul>
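
<p>These operations can be summarized as a small interface. The sketch
below is illustrative only: the names and signatures are hypothetical,
not xv6's actual functions, but they capture the operations listed above.

<pre>
/* Hypothetical address-space interface (illustrative; not xv6's API). */
struct addrspace;   /* opaque handle: maps program addresses to physical memory */

struct addrspace *as_create(char *prog, unsigned int size); /* allocate + load program */
int  as_grow(struct addrspace *as, unsigned int nbytes);    /* add physical memory */
int  as_shrink(struct addrspace *as, unsigned int nbytes);  /* free part of the memory */
void as_delete(struct addrspace *as);                       /* free all of its memory */
void as_switch(struct addrspace *as);                       /* make the CPU use this space */
int  as_share(struct addrspace *from, struct addrspace *to,
              unsigned int off, unsigned int len);          /* share a region */
</pre>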

<p>Two main approaches to implementing address spaces: using segments
  and using page tables. Often when one uses segments, one also uses
  page tables.  But not the other way around; i.e., paging without
  segmentation is common.

<h2>Example support for address spaces: x86</h2>

<p>For an operating system to provide address spaces and address
translation typically requires support from hardware.  The translation
and checking of permissions typically must happen on each address used
by a program, and it would be too slow to check that in software (if
even possible).  The division of labor is that the operating system manages
address spaces, and the hardware translates addresses and checks
permissions.

<p>PC block diagram without virtual memory support:
<ul>
<li>physical address
<li>base, IO hole, extended memory
<li>Physical address == what is on CPU's address pins
</ul>

<p>The x86 starts out in real mode and translation is as follows:
	<ul>
	<li>segment*16+offset ==> physical address
        <li>no protection: program can load anything into seg reg
	</ul>
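
<p>The real-mode calculation can be written out in C. This is a model of
what the hardware computes, not code that runs on the machine:

<pre>
/* Real mode: a 16-bit segment and a 16-bit offset form a 20-bit physical address. */
unsigned int real_mode_translate(unsigned short seg, unsigned short off)
{
    return ((unsigned int)seg << 4) + off;   /* segment*16 + offset */
}
/* e.g., the boot sector loaded at 0000:0x7c00 is at physical address 0x07c00 */
</pre>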

<p>The operating system can switch the x86 to protected mode, which
allows the operating system to create address spaces. Translation in
protected mode is as follows:
	<ul>
	<li>selector:offset (logical addr) <br>
	     ==SEGMENTATION==> 
	<li>linear address <br>
	     ==PAGING ==>
	<li>physical address
	</ul>

<p>Next lecture covers paging; now we focus on segmentation. 

<p>Protected-mode segmentation works as follows:
<ul>
<li>protected-mode segments add 32-bit addresses and protection
<ul>
<li>wait: what's the point? the point of segments in real mode was
  bigger addresses, but 32-bit mode fixes that!
</ul>
<li>segment register holds segment selector
<li>selector indexes into global descriptor table (GDT)
<li>segment descriptor holds 32-bit base, limit, type, protection
<li>la = va + base ; assert(va < limit);  (see the C sketch after this list)
<li>seg register usually implicit in instruction
	<ul>
	<li>DS:REG
		<ul>
		<li><tt>movl $0x1, _flag</tt>
		</ul>
	<li>SS:ESP, SS:EBP
		<ul>
		<li><tt>pushl %ecx, pushl $_i</tt>
		<li><tt>popl %ecx</tt>
		<li><tt>movl 4(%ebp),%eax</tt>
		</ul>
	<li>CS:EIP
		<ul>
		<li>instruction fetch
		</ul>
	<li>String instructions: read from DS:ESI, write to ES:EDI
		<ul>
		<li><tt>rep movsb</tt>
		</ul>
	<li>Exception: far addresses
		<ul>
		<li><tt>ljmp $selector, $offset</tt>
		</ul>
	</ul>
<li>LGDT instruction loads CPU's GDT register
<li>you turn on protected mode by setting PE bit in CR0 register
<li>what happens with the next instruction? CS now has different
  meaning...

<li>How to transfer from one segment to another, perhaps with different
privileges.
<ul>
<li>Current privilege level (CPL) is in the low 2 bits of CS
<li>CPL=0 is privileged O/S, CPL=3 is user
<li>Within the same privilege level: ljmp.
<li>Transfer to a segment with more privilege: call gates.
<ul>
<li>a way for app to jump into a segment and acquire privs
<li>CPL must be <= descriptor's DPL in order to read or write segment
<li>call gates can change privilege <b>and</b> switch the CS and SS
  segments
<li>call gates are implemented using a special type of segment descriptor
  in the GDT.
<li>interrupts are conceptually the same as call gates, but their
  descriptor is stored in the IDT.  We will use interrupts to transfer
  control between user and kernel mode, both in JOS and xv6.  We will
  return to this in the lecture about interrupts and exceptions.
</ul>
</ul>

<li>What about protection?
<ul>
  <li>can o/s limit what memory an application can read or write?
  <li>app can load any selector into a seg reg...
  <li>but can only mention indices into GDT
  <li>app can't change GDT register (requires privilege)
  <li>why can't app write the descriptors in the GDT?
  <li>what about system calls? how do they transfer to the kernel?
  <li>app cannot <b>just</b> lower the CPL
</ul>
</ul>
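
<p>The segmentation step above (la = va + base, with limit and privilege
checks) can be modeled in a few lines of C. This is only a sketch of what
the hardware does on each memory reference; real descriptors pack base,
limit, type, and privilege bits into an 8-byte format, which is ignored here:

<pre>
#include <assert.h>

/* Simplified segment descriptor: just the fields the lecture uses. */
struct segdesc {
    unsigned int base;    /* start of the segment in the linear address space */
    unsigned int limit;   /* size of the segment in bytes */
    int dpl;              /* descriptor privilege level (0 = kernel, 3 = user) */
};

struct segdesc gdt[8];    /* global descriptor table; entry 0 is unused */

/* What the MMU does on a reference through a segment register. */
unsigned int translate(unsigned short selector, unsigned int va, int cpl)
{
    struct segdesc *d = &gdt[selector >> 3];  /* low 3 bits are RPL + table indicator */
    assert(va < d->limit);                    /* otherwise: general-protection fault */
    assert(cpl <= d->dpl);                    /* CPL must be <= DPL to use the segment */
    return d->base + va;                      /* la = va + base; paging (if on) comes next */
}
</pre>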

<h2>Case study (xv6)</h2>

<p>xv6 is a reimplementation of <a href="../v6.html">Unix 6th edition</a>.
<ul>
<li>v6 is a version of the original Unix operating system for the <a href="http://www.pdp11.org/">DEC PDP-11</a>
<ul>
       <li>PDP-11 (1972): 
	 <li>16-bit processor, 18-bit physical (40)
	 <li>UNIBUS
	 <li>memory-mapped I/O
	 <li>performance: less than 1MIPS
	 <li>register-to-register transfer: 0.9 usec
	 <li>56k-228k (40)
	 <li>no paging, but some segmentation support
	 <li>interrupts, traps
	 <li>about $10K
         <li>rk disk with 2MByte of storage
	 <li>with cabinet 11/40 is 400lbs
</ul>
       <li>Unix v6
<ul>
         <li><a href="../reference.html">Unix papers</a>.
         <li>1976; first widely available Unix outside Bell labs
         <li>Thompson and Ritchie
         <li>Influenced by Multics but simpler.
	 <li>complete (used for real work)
	 <li>Multi-user, time-sharing
	 <li>small (43 system calls)
	 <li>modular (composition through pipes; one had to split programs!!)
	 <li>compactly written (2 programmers, 9,000 lines of code)
	 <li>advanced UI (shell)
	 <li>introduced C (derived from B)
	 <li>distributed with source
	 <li>V7 was sold by Microsoft for a couple years under the name Xenix
</ul>
       <li>Lions' commentary
<ul>
         <li>suppressed because of copyright issues
	 <li>resurfaced in 1996
</ul>

<li>xv6 written for 6.828:
<ul>
         <li>v6 reimplementation for x86
	 <li>does't include all features of v6 (e.g., xv6 has 20 of 43
	 system calls).
	 <li>runs on symmetric multiprocessing PCs (SMPs).
</ul>
</ul>

<p>Newer Unixes have inherited many of the conceptual ideas even though
they have added paging, networking, graphics, improved performance, etc.

<p>You will need to read most of the source code multiple times. Your
goal is to explain every line to yourself.

<h3>Overview of address spaces in xv6</h3>

<p>In today's lecture we see how xv6 creates the kernel address
 space and the first user address space, and switches to it. To understand
 how this happens, we need to understand in detail the state on the
 stack too---this may be surprising, but a thread of control and an
 address space are tightly bundled in xv6, in a concept
 called <i>process</i>.  The kernel address space is the only address
 space with multiple threads of control. We will study context
 switching and process management in detail in the coming weeks; the
 creation of the first user process (init) will give you a first flavor.

<p>xv6 uses only the segmentation hardware of the x86, and in a limited
  way. (In JOS you will use page-table hardware too, which we cover in the
  next lecture.)  The address space layouts are as follows:
<ul>
<li>The kernel address space is set up as follows:
  <pre>
  the code segment runs from 0 to 2^32 and is mapped X and R
  the data segment runs from 0 to 2^32 but is mapped W (read and write).
  </pre>
<li>For each process, the layout is as follows: 
<pre>
  text
  original data and bss
  fixed-size stack
  expandable heap
</pre>
The text of a process is stored in its own segment and the rest in a
data segment.  
</ul>
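
<p>Using the simplified descriptor fields from the sketch earlier, the
kernel's two flat segments would look roughly as follows. This is a sketch;
xv6 builds the real 8-byte descriptors in its boot and setupsegs code:

<pre>
/* Flat kernel segments: both cover 0 .. 2^32-1 and differ only in type
   (execute+read vs. read/write), which the simplified struct leaves out. */
struct segdesc kernel_gdt[3] = {
    { 0, 0,          0 },   /* entry 0: null descriptor (unused) */
    { 0, 0xffffffff, 0 },   /* entry 1: kernel code, X + R, DPL 0 */
    { 0, 0xffffffff, 0 },   /* entry 2: kernel data, W, DPL 0 */
};
/* With base 0 and limit 4GB, la == va: segmentation is a no-op for the kernel. */
</pre>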

<p>xv6 makes minimal use of the segmentation hardware available on the
x86. What other plans could you envision?

<p>In xv6, each program has a user and a kernel stack; when the
user program switches to the kernel, it switches to its kernel stack.
Its kernel stack is recorded in the process's proc structure. (This is
arranged through the descriptors in the IDT, which is covered later.)
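
<p>A sketch of the per-process bookkeeping this implies is shown below; the
field names are illustrative, see struct proc in proc.h for the real thing:

<pre>
/* Illustrative per-process state (see proc.h for xv6's actual struct proc). */
struct proc {
    char *mem;              /* start of this process's memory (its data segment) */
    unsigned int sz;        /* size of that memory in bytes */
    char *kstack;           /* kernel stack used when this process enters the kernel */
    int state;              /* e.g. UNUSED, RUNNABLE, RUNNING, ... */
};
</pre>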

<p>xv6 assumes that there is a lot of physical memory. It assumes that
  segments can be stored contiguously in physical memory and therefore
  has no need for page tables.

<h3>xv6 kernel address space</h3>

<p>Let's see how xv6 creates the kernel address space by tracing xv6
  from when it boots, focusing on address space management:
<ul>
<li>Where does xv6 start after the PC is powered on? At start (which is
  loaded at physical address 0x7c00; see lab 1).
<li>1025-1033: are we in real mode?
<ul>
<li>how big are logical addresses?
<li>how big are physical addresses?
<li>how are physical addresses calculated?
<li>what segment is being used in subsequent code?
<li>what values are in that segment?
</ul>
<li>1068: what values are loaded in the GDT?
<ul>
<li>1097: gdtr points to gdt
<li>1094: entry 0 unused
<li>1095: entry 1 (X + R, base = 0, limit = 0xffffffff, DPL = 0)
<li>1096: entry 2 (W, base = 0, limit = 0xffffffff, DPL = 0)
<li>are we using segments in a sophisticated way? (i.e., controlled sharing)
<li>are P and S set?
<li>are addresses translated as in protected mode when lgdt completes?
</ul>
<li>1071: no, and not even here.
<li>1075: far jump, load 8 in CS. from now on we use segment-based translation.
<li>1081-1086: set up other segment registers
<li>1087: where is the stack which is used for procedure calls?
<li>1087: cmain in the bootloader (see lab 1), which calls main0
<li>1222: main0.  
<ul>
<li>the job of main0 is to set everything up so that all xv6 conventions work
<li>where is the stack?  (sp = 0x7bec)
<li>what is on it?
<pre>
   00007bec [00007bec]  7cda  // return address in cmain
   00007bf0 [00007bf0]  0080  // callee-saved ebx
   00007bf4 [00007bf4]  7369  // callee-saved esi
   00007bf8 [00007bf8]  0000  // callee-saved ebp
   00007bfc [00007bfc]  7c49  // return address for cmain: spin
   00007c00 [00007c00]  c031fcfa  // the instructions from 7c00 (start)
</pre>
</ul>
<li>1239-1240: switch to cpu stack (important for scheduler)
<ul>
<li>why -32?
<li>what values are in ebp and esp?
<pre>
esp: 0x108d30   1084720
ebp: 0x108d5c   1084764
</pre>
<li>what is on the stack?
<pre>
   00108d30 [00108d30]  0000
   00108d34 [00108d34]  0000
   00108d38 [00108d38]  0000
   00108d3c [00108d3c]  0000
   00108d40 [00108d40]  0000
   00108d44 [00108d44]  0000
   00108d48 [00108d48]  0000
   00108d4c [00108d4c]  0000
   00108d50 [00108d50]  0000
   00108d54 [00108d54]  0000
   00108d58 [00108d58]  0000
   00108d5c [00108d5c]  0000
   00108d60 [00108d60]  0001
   00108d64 [00108d64]  0001
   00108d68 [00108d68]  0000
   00108d6c [00108d6c]  0000
</pre>

<li>what is 1 in 0x108d60?  is it on the stack?

</ul>

<li>1242: is it safe to reference bcpu?  where is it allocated?

<li>1260-1270: set up proc[0]

<ul>
<li>each process has its own stack (see struct proc).

<li>where is its stack?  (see the section on physical memory
  management below).

<li>what is the jmpbuf?  (will discuss in detail later)

<li>1267: why -4?

</ul>

<li>1270: necessary to be able to take interrupts (will discuss in
  detail later)

<li>1292: what process do you think scheduler() will run?  we will
  study later how that happens, but let's assume it runs process0 on
  process0's stack.
</ul>

<h3>xv6 user address spaces</h3>

<ul>
<li>1327: process0  
<ul>
<li>process 0 sets up everything to make process conventions work out

<li>which stack is process0 running on?  see 1260.

<li>1334: is the convention to release the proc_table_lock after being
  scheduled? (we will discuss locks later; assume there are no other
  processors for now.)

<li>1336: cwd is current working directory.

<li>1348: first step in initializing a template trap frame: set
  everything to zero. we are setting up process 0 as if it just
  entered the kernel from user space and wants to go back to user
  space.  (see x86.h to see what fields have the value 0.)

<li>1349: why "|3" instead of 0?

<li>1351: why set interrupt flag in template trapframe?

<li>1352: where will the user stack be in proc[0]'s address space?

<li>1353: makes a copy of proc0.  fork() calls copyproc() to implement
  forking a process.  This statement in essence is calling fork inside
  proc0, making proc[1] a duplicate of proc[0].  proc[0], however,
  does not have much in its address space, which is one page (see 1341).
<ul>
<li>2221: grab a lock on the proc table so that we are the only one
  updating it.
<li>2116: allocate next pid.
<li>2228: we got our entry; release the lock. from now on we are only
  modifying our entry.
<li>2120-2127: copy proc[0]'s memory.  proc[1]'s memory will be identical
  to proc[0]'s.
<li>2130-2136: allocate a kernel stack. this stack is different from
  the stack that proc[1] uses when running in user mode.
<li>2139-2140: copy the template trapframe that xv6 had set up in
  proc[0].
<li>2147: where will proc[1] start running when the scheduler selects
  it?
<li>2151-2155: Unix semantics: child inherits open file descriptors
  from parent.
<li>2158: same for cwd.
</ul>

<li>1356: load a program in proc[1]'s address space.  the program
  loaded is the binary version of init.c (sheet 16).

<li>1374: where will proc[1] start?

<li>1377-1388: copy the binary into proc[1]'s address space.  (you
  will learn about the ELF format in the labs.)  
<ul>
<li>can the binary for init be any size for proc[1] to work correctly?

<li>what is the layout of proc[1]'s address space? is it consistent
  with the layout described on lines 1950-1954?

</ul>

<li>1357: make proc[1] runnable so that the scheduler will select it
  to run.  everything is set up now for proc[1] to run, "return" to
  user space, and execute init.

<li>1359: proc[0] gives up the processor, which calls sleep, which
  calls sched, which setjmps back to scheduler. let's peek a bit into the
  scheduler to see what happens next.  (we will return to the
  scheduler in more detail later.)
</ul>
<li>2219: this test will fail for proc[1]
<li>2226: setupsegs(p) sets up the segments for proc[1].  this call is
  more interesting than the previous one, so let's see what happens
  (see also the sketch after this list):
<ul>
<li>2032-37: this is for traps and interrupts, which we will cover later.
<li>2039-49: set up new gdt.
<li>2040: why 0x100000 + 64*1024?
<li>2045: why 3?  why is base p->mem? is p->mem physical or logical?
<li>2045-2046: how must the program for proc[1] be compiled if proc[1]
  is to run successfully in user space?
<li>2052: we are still running in the kernel, but we are loading the gdt.
  is this ok?
<li>why have so few user-level segments?  why not separate out code,
  data, stack, bss, etc.?
</ul>
<li>2227: record that proc[1] is running on the cpu
<li>2228: record it is running instead of just runnable
<li>2229: setjmp to fork_ret.
<li>2282: which stack is proc[1] running on?
<li>2284: when scheduled, first release the proc_table_lock.
<li>2287: back into assembly.
<li>2782: where is the stack pointer pointing to?
<pre>
   0020dfbc [0020dfbc]  0000
   0020dfc0 [0020dfc0]  0000
   0020dfc4 [0020dfc4]  0000
   0020dfc8 [0020dfc8]  0000
   0020dfcc [0020dfcc]  0000
   0020dfd0 [0020dfd0]  0000
   0020dfd4 [0020dfd4]  0000
   0020dfd8 [0020dfd8]  0000
   0020dfdc [0020dfdc]  0023
   0020dfe0 [0020dfe0]  0023
   0020dfe4 [0020dfe4]  0000
   0020dfe8 [0020dfe8]  0000
   0020dfec [0020dfec]  0000
   0020dff0 [0020dff0]  001b
   0020dff4 [0020dff4]  0200
   0020dff8 [0020dff8]  1000
</pre>
<li>2783: why jmp instead of call?
<li>what will iret put in eip?
<li>what is 0x1b?  what will iret put in cs?
<li>after iret, what will the processor be executing?
</ul>
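
<p>The setupsegs questions above come down to how the per-process segments
are built. The following sketch shows the idea, reusing the simplified
segdesc and proc structs from the earlier sketches; it is not the actual
setupsegs code, and the descriptor details (type bits, packing) are omitted:

<pre>
/* Sketch of per-process segment setup, in the spirit of setupsegs (not the real code). */
void setup_user_segments(struct proc *p)
{
    /* Kernel segments stay flat: base 0, limit 4GB, DPL 0. */
    gdt[1] = (struct segdesc){ 0, 0xffffffff, 0 };               /* kernel code */
    gdt[2] = (struct segdesc){ 0, 0xffffffff, 0 };               /* kernel data */

    /* User segments: base is where the process's memory starts, limit is its
       size, DPL 3 so code running at CPL 3 may use them.  Because the base is
       p->mem, the process's address 0 maps to p->mem, so its program must be
       linked to start at address 0. */
    gdt[3] = (struct segdesc){ (unsigned int)p->mem, p->sz, 3 }; /* user code */
    gdt[4] = (struct segdesc){ (unsigned int)p->mem, p->sz, 3 }; /* user data */

    /* The new table is then loaded into the CPU with lgdt. */
}
</pre>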

<h3>Managing physical memory</h3>

<p>To create an address space we must allocate physical memory, which
  will be freed when an address space is deleted (e.g., when a user
  program terminates).  xv6 implements a first-fit memory allocator
  (see kalloc.c).

<p>It maintains a list of ranges of free memory.  The allocator finds
  the first range that is larger than the amount of requested memory.
  It splits that range in two: one range of the size requested and one
  of the remainder.  It returns the first range.  When memory is
  freed, kfree will merge ranges that are adjacent in memory.
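
<p>A minimal first-fit allocator along these lines is sketched below. It is
not xv6's kalloc.c: the free list here is a simple linked list of ranges,
allocation splits the first range that fits, and freeing just pushes the
range back (the real kfree also merges adjacent ranges):

<pre>
/* First-fit allocator sketch (not xv6's kalloc.c). Each free range starts
   with a small header recording its length and the next free range. */
struct range {
    unsigned int len;       /* total size of this free range in bytes */
    struct range *next;
};

static struct range *freelist;

char *alloc(unsigned int n)
{
    struct range **pp, *r;
    for (pp = &freelist; (r = *pp) != 0; pp = &r->next) {
        if (r->len >= n + sizeof(struct range)) {
            r->len -= n;                 /* split: keep the front of the range free */
            return (char *)r + r->len;   /* hand out the tail */
        }
    }
    return 0;                            /* no free range is big enough */
}

void release(char *p, unsigned int n)
{
    struct range *r = (struct range *)p;
    r->len = n;
    r->next = freelist;                  /* sketch: just push the range back; */
    freelist = r;                        /* xv6's kfree also merges adjacent ranges */
}
</pre>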

<p>Under what scenarios is a first-fit memory allocator undesirable?

<h3>Growing an address space</h3>

<p>How can a user process grow its address space?  growproc.
<ul>
<li>2064: allocate a new segment of old size plus n
<li>2067: copy the old segment into the new (ouch!)
<li>2068: and zero the rest.
<li>2071: free the old physical memory
</ul>
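<p>Because a segment must be one contiguous range of physical memory, growing
means allocating a bigger range, copying, and freeing the old one. A sketch of
that logic, reusing the proc struct and allocator sketches from earlier (not
the actual growproc code):

<pre>
#include <string.h>   /* memmove, memset (the kernel has its own versions) */

/* Grow p's memory by n bytes; returns 0 on success, -1 on failure. */
int grow(struct proc *p, unsigned int n)
{
    char *newmem = alloc(p->sz + n);     /* new range of old size plus n */
    if (newmem == 0)
        return -1;
    memmove(newmem, p->mem, p->sz);      /* copy the old segment into the new (ouch!) */
    memset(newmem + p->sz, 0, n);        /* zero the rest */
    release(p->mem, p->sz);              /* free the old physical memory */
    p->mem = newmem;
    p->sz += n;
    return 0;
}
</pre>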
<p>We could do a lot better if segments didn't have to be contiguous in
  physical memory. How could we arrange that? Using page tables, which
  is our next topic.  This is one place where page tables would be
  useful, but there are others too (e.g., in fork).
</body>
</html>