.\" $NetBSD: beyond43.ms,v 1.4 2003/08/07 10:30:41 agc Exp $
.\" Copyright (c) 1989 The Regents of the University of California.
.\" All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\" @(#)beyond43.ms 5.1 (Berkeley) 6/5/90
\fB\s+2Current Research by
The Computer Systems Research Group\s-2\fP
.ds DT "February 10, 1989
.\" \fBDRAFT of \*(DT\fP
Marshall Kirk McKusick
The release of 4.3BSD in April of 1986 addressed many of the
performance problems and unfinished interfaces
present in 4.2BSD [Leffler84] [McKusick85].
The Computer Systems Research Group at Berkeley
has now embarked on a new development phase to
update other major components of the system, as well as to offer
new functionality.
There are five major ongoing projects.
The first is to develop an OSI network protocol suite and to integrate
existing ISO applications into Berkeley UNIX.
The second is to develop and support an interface compliant with the
P1003.1 POSIX standard recently approved by the IEEE.
The third is to refine the TCP/IP networking to improve
its performance and limit congestion on slow and/or lossy networks.
The fourth is to provide a standard interface to file systems
so that multiple local and remote file systems can be supported,
much as multiple networking protocols are supported by 4.3BSD.
The fifth is to evaluate alternative access control mechanisms and
audit the existing security features of the system, particularly
with respect to network services.
Other areas of work include multi-architecture support,
a general purpose kernel memory allocator, disk labels, and
extensions to the 4.2BSD fast filesystem.
We plan to finish implementation prototypes for each of the
five main areas of work over the next year and to provide an informal
test release sometime next year for interested developers.
After incorporating feedback and refinements from the testers,
these facilities will appear in the next full Berkeley release, which is typically
made about a year after the test release.
Recently Completed Projects
There have been several changes in the system that were included
in the recent 4.3BSD Tahoe release.
Multi-architecture support
Support has been added for the DEC VAX 8600/8650, VAX 8200/8250,
MicroVAX II, and MicroVAX III.
The largest change has been the incorporation of support for the first
non-VAX processor, the CCI Power 6/32 and 6/32SX. (This addition also
provides support for the
Harris HCX-7 and HCX-9, as well as the Sperry 7000/40 and ICL machines.)
The Power 6 version of 4.3BSD is largely based on the compilers and
device drivers done for CCI's 4.2BSD UNIX,
and is otherwise similar to the VAX release of 4.3BSD.
The entire source tree, including all kernel and user-level sources,
has been merged using a structure that will easily accommodate the addition
of other processor families. A MIPS R2000 has been donated to us,
making the MIPS architecture a likely candidate for inclusion into a future
BSD release.
Kernel Memory Allocator
The 4.3BSD UNIX kernel used 10 different memory allocation mechanisms,
each designed for the needs of a particular subsystem.
These mechanisms have been replaced by a general purpose dynamic
memory allocator that can be used by all of the kernel subsystems [McKusick88].
The design of this allocator takes advantage of known memory usage
patterns in the UNIX kernel and uses a hybrid strategy that is time-efficient
for small allocations and space-efficient for large allocations.
This allocator replaces the multiple memory allocation interfaces
with a single easy-to-program interface,
results in more efficient use of global memory by eliminating
partitioned and specialized memory pools,
and is quick enough (approximately 15 VAX instructions) that no
performance loss is observed relative to the previous implementations.
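To make the hybrid strategy concrete, the following is a minimal user-space sketch in C, not the kernel code. It assumes power-of-two free lists for requests up to one 4096-byte page and whole-page allocation above that, and it uses the C library \fCmalloc\fP as a stand-in page source. The names \fCkmem_alloc\fP, \fCkmem_free\fP, and \fCbucket_index\fP are hypothetical, and, unlike a kernel allocator that can find a block's size from the page it lives in, this sketch asks the caller to pass the size back to \fCkmem_free\fP.

```c
#include <stddef.h>
#include <stdlib.h>

#define MINBUCKET 4              /* smallest bucket: 2^4 = 16 bytes */
#define MAXBUCKET 12             /* largest bucket: 2^12 = 4096 bytes */
#define PAGESIZE  (1 << MAXBUCKET)

/* A free chunk stores the link to the next free chunk of the same size. */
struct freelist {
    struct freelist *next;
};

static struct freelist *buckets[MAXBUCKET + 1];

/* Index of the smallest power-of-two bucket that can hold 'size' bytes. */
static int bucket_index(size_t size)
{
    int indx = MINBUCKET;

    while (((size_t)1 << indx) < size)
        indx++;
    return indx;
}

void *kmem_alloc(size_t size)
{
    if (size > PAGESIZE) {
        /* Large request: round up to whole pages (space-efficient). */
        size_t npages = (size + PAGESIZE - 1) / PAGESIZE;
        return malloc(npages * PAGESIZE);
    }
    int indx = bucket_index(size);
    if (buckets[indx] == NULL) {
        /* Empty bucket: carve a fresh page into chunks of this size. */
        size_t chunk = (size_t)1 << indx;
        char *page = malloc(PAGESIZE);
        if (page == NULL)
            return NULL;
        for (size_t off = 0; off < PAGESIZE; off += chunk) {
            struct freelist *f = (struct freelist *)(page + off);
            f->next = buckets[indx];
            buckets[indx] = f;
        }
    }
    /* Fast path: pop the head of the free list (time-efficient). */
    struct freelist *f = buckets[indx];
    buckets[indx] = f->next;
    return f;
}

void kmem_free(void *addr, size_t size)
{
    if (size > PAGESIZE) {
        free(addr);
        return;
    }
    /* Small chunk: push it back on the free list of its bucket. */
    struct freelist *f = addr;
    int indx = bucket_index(size);
    f->next = buckets[indx];
    buckets[indx] = f;
}
```

The common case in \fCkmem_alloc\fP is a size-to-bucket computation followed by removing the head of a free list; pages handed to a bucket are never returned in this sketch.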
Disk Labels
During the work on the CCI machine,
it became obvious that disk geometry and filesystem layout information
must be stored on each disk in a pack label.
Disk labels were implemented for the CCI disks and for the most common
types of disk controllers on the VAX.
A utility was written to create and maintain the disk information,
and other user-level programs that use such information now obtain
it from the disk label.
The use of this facility has allowed improvements in the file system's
knowledge of irregular disk geometries such as track-to-track skew.
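As an illustration of the kind of information a pack label carries, the sketch below assumes a simplified, hypothetical \fCstruct disklabel_sketch\fP holding basic geometry, a track-skew field, and a small partition table, protected by a magic number and a 16-bit XOR checksum. The field names, magic value, and checksum scheme are illustrative choices, not the on-disk label format of the Tahoe release.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LABEL_MAGIC 0x4C41424Cu   /* arbitrary magic number for this sketch */
#define MAXPART     8

struct partition_sketch {
    uint32_t p_size;          /* partition size in sectors */
    uint32_t p_offset;        /* first sector of the partition */
};

struct disklabel_sketch {
    uint32_t d_magic;
    uint32_t d_secsize;       /* bytes per sector */
    uint32_t d_nsectors;      /* sectors per track */
    uint32_t d_ntracks;       /* tracks per cylinder */
    uint32_t d_ncylinders;    /* cylinders per unit */
    uint16_t d_trackskew;     /* offset of sector 0 from track to track */
    uint16_t d_checksum;      /* makes the XOR of all 16-bit words zero */
    struct partition_sketch d_partitions[MAXPART];
};

/* XOR together every 16-bit word of the label, checksum field included. */
static uint16_t label_cksum(const struct disklabel_sketch *lp)
{
    const unsigned char *p = (const unsigned char *)lp;
    uint16_t sum = 0;

    for (size_t i = 0; i + sizeof(uint16_t) <= sizeof *lp; i += sizeof(uint16_t)) {
        uint16_t w;
        memcpy(&w, p + i, sizeof w);
        sum ^= w;
    }
    return sum;
}

/* Stamp the label: store the checksum that makes the whole label XOR to zero. */
static void label_seal(struct disklabel_sketch *lp)
{
    lp->d_magic = LABEL_MAGIC;
    lp->d_checksum = 0;
    lp->d_checksum = label_cksum(lp);
}

/* Accept a label only if the magic matches and the checksum verifies. */
static int label_valid(const struct disklabel_sketch *lp)
{
    return lp->d_magic == LABEL_MAGIC && label_cksum(lp) == 0;
}

/* Geometry is read from the label instead of being compiled into drivers. */
static uint32_t sectors_per_cylinder(const struct disklabel_sketch *lp)
{
    return lp->d_nsectors * lp->d_ntracks;
}
```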
Fast File System Extensions
The 4.2BSD fast file system [McKusick84]
contained several statically sized structures,
imposing limits on the number of cylinders per cylinder group,
inodes per cylinder group,
and number of distinguished rotational positions.
The new ``fat'' filesystem allows these limits to be set at filesystem
creation time.
Old kernels will treat the new filesystems as read-only,
and new kernels
will accommodate both formats.
The filesystem check facility, \fCfsck\fP, has also been modified to check
either format.
Current UNIX Research at Berkeley
Since the release of 4.3BSD in mid 1986,
we have begun work on several major new areas of research.
Our goal is to apply leading edge research ideas to a stable
and reliable implementation that solves current problems in
operating systems development.
OSI network protocol development
The network architecture of 4.2BSD was designed to accommodate
multiple network protocol families and address formats,
and an implementation of the ISO OSI network protocols
should fit into this framework without much difficulty.
We plan to
implement the OSI connectionless internet protocol (CLNP)
and device drivers for X.25, 802.3, and possibly 802.5 interfaces, and
to integrate these with an OSI transport class 4 (TP-4) implementation.
We will also incorporate into the Berkeley Software Distribution an
updated ISO Development Environment (ISODE)
featuring International Standard (IS) versions of utilities.
ISODE implements the session and presentation layers of the OSI protocol suite,
and will include an implementation of the OSI file transfer, access, and
management protocol (FTAM).
It is also possible that an X.400 implementation now being done at
University College, London and the University of Nottingham
will be available for testing and distribution.
This implementation comprises four areas.
First, we are updating the University of
Wisconsin TP-4 to match GOSIP requirements.
The University of Wisconsin developed a transport class 4
implementation for the 4.2BSD kernel under contract to Mitre.
This implementation must be updated to reflect the National Institute
of Standards and Technology (NIST, formerly NBS) workshop agreements,
GOSIP, and 4.3BSD requirements.
We will make this TP-4 operate with an OSI IP,
as the original implementation was built to run over the DoD IP.
Second, kernel versions of the OSI IP and ES-IS protocols must be produced;
we will implement them.
Third, the required device drivers need to be integrated into a BSD kernel.
4.3BSD has existing device drivers for many Ethernet devices; future
BSD versions may also support X.25 devices as well as token ring
networks.
These device drivers must be integrated
into the kernel OSI protocol implementations.
Fourth, the existing OSINET interoperability test network is available so
that the interoperability of the ISODE and BSD kernel protocols
can be established through tests with several vendors.
Testing is crucial because an openly available version of GOSIP protocols
that does not interoperate with DEC, IBM, Sun, ICL, HIS, and other
major vendors would be embarrassing.
To allow testing of the integrated pieces, the most desirable
approach is to provide access to OSINET at UCB.
A second approach is to do the interoperability testing at
the site of an existing OSINET member, such as NIST.
Compliance with POSIX 1003
Berkeley became involved several months ago in the development
of the IEEE POSIX P1003.1 system interface standard.
Since then, we have been participating in the working groups
of P1003.2 (shell and application utility interface),
P1003.6 (security), P1003.7 (system administration), and P1003.8
(networking).
The IEEE published the POSIX P1003.1 standard in late 1988.
POSIX-related changes to the BSD system have included a new terminal
driver, support for POSIX sessions and job control, expanded signal
functionality, restructured directory access routines, and new set-user-ID
and set-group-ID facilities.
We currently have a prototype implementation of the
POSIX terminal driver with extensions to provide binary compatibility with
applications developed for the old Berkeley terminal driver.
We also have a prototype implementation of the 4.2BSD-based POSIX
job control facility.
The P1003.2 draft is currently being voted on by the IEEE
P1003.2 balloting group.
Berkeley is particularly interested in the results of this standard,
as it will profoundly influence the user environment.
The other groups are in comparatively early phases, with drafts
coming to ballot sometime in the 1990s.
Berkeley will continue to participate in these groups, and
move in the near future toward a P1003.1 and P1003.2 compliant
system.
We have many of the utilities outlined in the current P1003.2 draft
already implemented, and have other parties willing to contribute
additional implementations.
Improvements to the TCP/IP Networking Protocols
The Internet and the Berkeley collection of local-area networks
have both grown at high rates in the last year.
The Bay Area Regional Research Network (BARRNet),
connecting several UC campuses, Stanford, and NASA-Ames,
has recently become operational, increasing the complexity
of the network connectivity.
Both Internet and local routing algorithms are showing the strain
of this growth.
We have made several changes in the local routing algorithm
to keep accommodating the current topology,
and are participating in the development of new routing algorithms
and standard protocols.
Recent work in collaboration with Van Jacobson of the Lawrence Berkeley
Laboratory has led to the design and implementation of several new algorithms
for TCP that improve throughput on both local and long-haul networks
while reducing unnecessary retransmission.
The improvement is especially striking when connections must traverse
slow and/or lossy networks.
The new algorithms include ``slow-start,''
a technique for opening the TCP flow control window slowly
and using the returning stream of acknowledgements as a clock
to drive the connection at the highest speed tolerated by the intervening
network.
A modification of this technique allows the sender to adjust
the send window size dynamically as network conditions change.
In addition, the round-trip timer has been modified to estimate the variance
in round-trip time, allowing earlier retransmission of lost packets
with less spurious retransmission due to increasing network delay.
Together with a scheme proposed by Phil Karn of Bellcore,
these changes reduce unnecessary retransmission over difficult paths
such as Satnet by nearly two orders of magnitude
while improving throughput dramatically.
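The three mechanisms above can be sketched in C as follows, in simplified form: slow-start with a threshold that switches to linear window growth (congestion avoidance), a round-trip estimator that tracks the mean deviation as well as the mean, and Karn's rule of discarding samples from retransmitted segments while backing off the timer. The sketch counts the window in whole segments, measures time in abstract ticks, and uses floating point for clarity; the structure and function names are hypothetical, and it is not the 4.3BSD TCP code. The gains of 1/8 and 1/4 and the factor of 4 on the deviation are the values later standardized in RFC 6298, assumed here for illustration.

```c
/* Simplified per-connection state; window in segments, time in clock ticks. */
struct tcp_sketch {
    double cwnd;       /* congestion window (segments) */
    double ssthresh;   /* slow-start threshold (segments) */
    double srtt;       /* smoothed round-trip time */
    double rttvar;     /* smoothed mean deviation of the round-trip time */
    double rto;        /* current retransmission timeout */
    int    have_rtt;   /* nonzero once an unambiguous sample has been taken */
};

void tcp_init(struct tcp_sketch *tp)
{
    tp->cwnd = 1.0;        /* start by sending a single segment */
    tp->ssthresh = 64.0;   /* arbitrary initial threshold for the sketch */
    tp->srtt = 0.0;
    tp->rttvar = 0.0;
    tp->rto = 3.0;         /* arbitrary initial timeout in ticks */
    tp->have_rtt = 0;
}

/* Each returning ACK clocks out more data by opening the window. */
void on_ack(struct tcp_sketch *tp)
{
    if (tp->cwnd < tp->ssthresh)
        tp->cwnd += 1.0;             /* slow-start: window doubles per RTT */
    else
        tp->cwnd += 1.0 / tp->cwnd;  /* congestion avoidance: ~1 segment per RTT */
}

/* A retransmission timeout is treated as a sign of congestion. */
void on_timeout(struct tcp_sketch *tp)
{
    double half = tp->cwnd / 2.0;

    tp->ssthresh = half > 2.0 ? half : 2.0;  /* remember half the old window */
    tp->cwnd = 1.0;                          /* restart with slow-start */
    tp->rto *= 2.0;                          /* exponential backoff */
}

/*
 * Feed one round-trip measurement m. Following Karn, a sample taken from a
 * retransmitted segment is ambiguous (which copy was acknowledged?), so it
 * is discarded and the backed-off timeout stays in force.
 */
void rtt_sample(struct tcp_sketch *tp, double m, int was_retransmitted)
{
    if (was_retransmitted)
        return;
    if (!tp->have_rtt) {
        tp->srtt = m;
        tp->rttvar = m / 2.0;
        tp->have_rtt = 1;
    } else {
        double err = m - tp->srtt;
        double abserr = err < 0 ? -err : err;

        tp->rttvar += (abserr - tp->rttvar) / 4.0;  /* gain 1/4 */
        tp->srtt += err / 8.0;                      /* gain 1/8 */
    }
    tp->rto = tp->srtt + 4.0 * tp->rttvar;          /* mean plus 4 deviations */
}
```

Because the timeout is the mean plus a multiple of the deviation, a path whose delay varies widely gets a generous timeout, while a steady path gets a tight one; this is what allows earlier retransmission without the spurious retransmissions a fixed multiple of the mean would cause.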
The current TCP implementation is now being readied
for more widespread distribution via the network and as a
standard Berkeley distribution unencumbered by any commercial licensing.
We are continuing to refine the TCP and IP implementations
using the ARPANET, BARRNet, the NSF network,
and local campus nets as testbeds.
In addition, we are incorporating applicable algorithms from this work
into the TP-4 protocol implementation.
Toward a Compatible File System Interface
The most critical shortcoming of the 4.3BSD UNIX system was in the
area of distributed file systems.
As with networking protocols,
there is no single distributed file system
that provides sufficient speed and functionality for all problems.
It is frequently necessary to support several different remote
file system protocols, just as it is necessary to run several
different network protocols.
As network or remote file systems have been implemented for UNIX,
several stylized interfaces between the file system implementation
and the rest of the kernel have been developed.
Among these are Sun Microsystems' Virtual File System interface (VFS)
using \fBvnodes\fP [Sandberg85] [Kleiman86],
Digital Equipment's Generic File System (GFS) architecture [Rodriguez86],
AT&T's File System Switch (FSS) [Rifkin86],
the LOCUS distributed file system [Walker85],
and Masscomp's extended file system [Cole85].
Other remote file systems have been implemented in research or
university groups for internal use,
notably the network file system in the Eighth Edition UNIX
system [Weinberger84] and two different file systems used at Carnegie Mellon
University [Satyanarayanan85].
Numerous other remote file access methods have been devised for use
within individual UNIX processes,
many of them by modifications to the C I/O library
similar to those in the Newcastle Connection [Brownbridge82].
Each design attempts to isolate file system-dependent details
below a generic interface and to provide a framework within which
new file systems may be incorporated.
However, each of these interfaces is different from
and incompatible with the others.
Each addresses somewhat different design goals,
having been based on a different version of UNIX,
having targeted a different set of file systems with varying characteristics,
and having selected a different set of file system primitive operations.
Our effort in this area is aimed at providing a common framework to
support these different distributed file systems simultaneously rather than
simply implementing yet another protocol.
This requires a detailed study of the existing protocols,
and discussion with their implementors to determine whether
they could modify their implementations to fit within our proposed
framework. We have studied the various file system interfaces to determine
their generality, completeness, robustness, efficiency, and aesthetics
and are currently working on a file system interface
that we believe includes the best features of
each of the existing implementations.
This work and the rationale underlying its development
have been presented to major software vendors as an early step
toward convergence on a standard compatible file system interface.
Briefly, the proposal adopts the 4.3BSD calling convention for file
name lookup but otherwise is closely related to Sun's VFS
and DEC's GFS [Karels86].
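The common idea behind VFS, GFS, and the proposed interface is a per-file-system table of operations through which generic kernel code dispatches, much as the kernel dispatches to multiple network protocols. The following C sketch assumes a tiny, hypothetical operation set (lookup and read) and one in-memory file system type; it shows only the dispatch pattern, not the operation set or calling conventions of any of the interfaces cited above.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct vnode_sketch;

/* Operations each file system type supplies (a hypothetical, minimal set). */
struct vnodeops_sketch {
    int     (*vop_lookup)(struct vnode_sketch *dir, const char *name,
                          struct vnode_sketch **result);
    ssize_t (*vop_read)(struct vnode_sketch *vp, char *buf, size_t len,
                        size_t off);
};

/* Generic file handle: the rest of the kernel sees only the ops table. */
struct vnode_sketch {
    const struct vnodeops_sketch *v_op;
    void *v_data;                 /* private to the file system type */
};

/* Generic layer: callers never know which file system they are using. */
int generic_lookup(struct vnode_sketch *dir, const char *name,
                   struct vnode_sketch **result)
{
    return dir->v_op->vop_lookup(dir, name, result);
}

ssize_t generic_read(struct vnode_sketch *vp, char *buf, size_t len, size_t off)
{
    return vp->v_op->vop_read(vp, buf, len, off);
}

/* One concrete type: a flat, read-only, in-memory directory. */
struct memfile {
    const char *name;
    const char *contents;
};

static struct memfile memfs_files[] = {
    { "motd",  "Welcome" },
    { "hello", "hello, world" },
};
#define MEMFS_NFILES (sizeof memfs_files / sizeof memfs_files[0])

static int memfs_lookup(struct vnode_sketch *, const char *,
                        struct vnode_sketch **);
static ssize_t memfs_read(struct vnode_sketch *, char *, size_t, size_t);

static const struct vnodeops_sketch memfs_ops = { memfs_lookup, memfs_read };
static struct vnode_sketch memfs_vnodes[MEMFS_NFILES];
struct vnode_sketch memfs_root = { &memfs_ops, NULL };

static int memfs_lookup(struct vnode_sketch *dir, const char *name,
                        struct vnode_sketch **result)
{
    (void)dir;                    /* only one directory in this sketch */
    for (size_t i = 0; i < MEMFS_NFILES; i++) {
        if (strcmp(memfs_files[i].name, name) == 0) {
            memfs_vnodes[i].v_op = &memfs_ops;
            memfs_vnodes[i].v_data = &memfs_files[i];
            *result = &memfs_vnodes[i];
            return 0;
        }
    }
    return -1;                    /* no such file */
}

static ssize_t memfs_read(struct vnode_sketch *vp, char *buf, size_t len,
                          size_t off)
{
    const struct memfile *f = vp->v_data;
    size_t size = strlen(f->contents);

    if (off >= size)
        return 0;                 /* end of file */
    if (len > size - off)
        len = size - off;
    memcpy(buf, f->contents + off, len);
    return (ssize_t)len;
}
```

Adding another file system type means supplying another operations table; \fCgeneric_lookup\fP and \fCgeneric_read\fP do not change.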
System Security
The recent invasion of the DARPA Internet by a quickly reproducing ``worm''
highlighted the need for a thorough review of the access
safeguards built into the system.
Until now, we have taken a passive approach to dealing with
weaknesses in the system access mechanisms, rather than actively
searching for possible weaknesses.
When we are notified of a problem or loophole in a system utility
or network server,
we have a well defined procedure for fixing the problem and
expeditiously disseminating the fix to the BSD mailing list.
This procedure has proven itself to be effective in
solving known problems as they arise
(witness its success in handling the recent worm).
However, we feel that it would be useful to take a more active
role in identifying problems before they are reported (or exploited).
We will make a complete audit of the system
utilities and network servers to find unintended system access mechanisms.
As a part of the work to make the system more resistant to attack
from local users or via the network, it will be necessary to produce
additional documentation on the configuration and operation of the system.
This documentation will cover such topics as file and directory ownership
and access, network and server configuration,
and control of privileged operations such as file system backups.
We are investigating the addition of access control lists (ACLs) for
the file system.
ACLs provide a much finer granularity of control over file access permissions
than the current
discretionary access control mechanism (mode bits).
Furthermore, they are necessary
in environments where C2 level security or better, as defined in the DoD
TCSEC [DoD83], is required.
The POSIX P1003.6 security group has made notable progress in determining
how an ACL mechanism should work, and several vendors have implemented
ACLs for their commercial systems.
Berkeley will investigate the existing implementations and determine
how best to integrate ACLs with the existing mechanism.
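Because ACL semantics were still being settled within P1003.6, the following is only a hypothetical sketch of how an ACL could refine the mode-bit check: the owner is governed by the owner bits, entries naming particular users or groups are consulted next (first match wins), and the traditional group and other bits serve as the fallback. The entry format, the evaluation order, and the single group per caller are assumptions made for illustration, not the P1003.6 draft semantics or any vendor's implementation.

```c
#include <stddef.h>

#define PERM_READ  4
#define PERM_WRITE 2
#define PERM_EXEC  1

enum acl_tag { ACL_TAG_USER, ACL_TAG_GROUP };

/* One ACL entry: grant 'perms' to a specific user or group. */
struct acl_entry_sketch {
    enum acl_tag tag;
    unsigned id;          /* uid or gid, depending on tag */
    int perms;            /* OR of PERM_READ, PERM_WRITE, PERM_EXEC */
};

struct file_sketch {
    unsigned uid, gid;    /* owner and group of the file */
    int mode;             /* traditional permission bits, e.g. 0640 */
    const struct acl_entry_sketch *acl;
    size_t nacl;
};

/* Return nonzero if the caller (uid, gid) has every access bit in 'want'. */
int access_ok(const struct file_sketch *f, unsigned uid, unsigned gid, int want)
{
    int granted;

    if (uid == f->uid) {
        granted = (f->mode >> 6) & 7;      /* owner bits */
        return (granted & want) == want;
    }
    /* Named entries express what the nine mode bits cannot. */
    for (size_t i = 0; i < f->nacl; i++) {
        const struct acl_entry_sketch *e = &f->acl[i];

        if ((e->tag == ACL_TAG_USER && e->id == uid) ||
            (e->tag == ACL_TAG_GROUP && e->id == gid))
            return (e->perms & want) == want;  /* first match decides */
    }
    /* Fall back to the group and other mode bits. */
    granted = (gid == f->gid) ? (f->mode >> 3) & 7 : f->mode & 7;
    return (granted & want) == want;
}
```

With mode bits alone, granting write access to one additional user requires creating a new group; here a single named-user entry suffices.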
A major shortcoming of the present system is that authentication
over the network is based solely on the privileged port mechanism
between trusting hosts and users.
Although privileged ports can only be created by processes running as root
on a UNIX system,
such processes are easy for a workstation user to obtain;
they simply reboot their workstation in single-user mode.
Thus, a better authentication mechanism is needed.
At present, we believe that the MIT Kerberos authentication
server [Steiner88] provides the best solution to this problem.
We propose to investigate Kerberos further as well as other
authentication mechanisms and then to integrate
the best one into Berkeley UNIX.
Part of this integration would be the addition of the
authentication mechanism into utilities such as
telnet, login, remote shell, etc.
We will add support for telnet (eventually replacing rlogin),
the X window system, and the mail system within an authentication
domain (a Kerberos \fIrealm\fP).
We hope to replace the existing password authentication on each host
with the network authentication system.
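The weakness is visible in what a server can actually check. The sketch below assumes an \fCrshd\fP-style server that accepts a peer if its source port is below 1024 (the value of \fCIPPORT_RESERVED\fP in BSD) and its address appears in a list of trusted hosts; the function \fCpeer_is_trusted\fP and the flat address list are hypothetical simplifications of the \fC/etc/hosts.equiv\fP and \fC.rhosts\fP processing. Nothing in this test proves who the user on the peer host is, only that some process there had root privilege.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

#define RESERVED_PORT_LIMIT 1024   /* same value as BSD's IPPORT_RESERVED */

/*
 * Hypothetical rshd-style check: accept the peer only if it connected
 * from a reserved port and its address is in the trusted list. The port
 * test shows only that someone had root on the peer host, which any
 * workstation owner can obtain by rebooting single-user.
 */
int peer_is_trusted(const struct sockaddr_in *from,
                    const char *const trusted[], size_t ntrusted)
{
    char addr[INET_ADDRSTRLEN];

    if (from->sin_family != AF_INET)
        return 0;
    if (ntohs(from->sin_port) >= RESERVED_PORT_LIMIT)
        return 0;                       /* not a privileged port */
    if (inet_ntop(AF_INET, &from->sin_addr, addr, sizeof addr) == NULL)
        return 0;
    for (size_t i = 0; i < ntrusted; i++)
        if (strcmp(addr, trusted[i]) == 0)
            return 1;                   /* trusted by address and port only */
    return 0;
}
```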
References
Brownbridge, D.R., L.F. Marshall, B. Randell,
``The Newcastle Connection, or UNIXes of the World Unite!,''
\fISoftware\- Practice and Experience\fP, Vol. 12, pp. 1147-1162, 1982.
Cole, C.T., P.B. Flinn, A.B. Atlas,
``An Implementation of an Extended File System for UNIX,''
\fIUsenix Conference Proceedings\fP,
pp. 131-150, June, 1985.
Department of Defense,
``Trusted Computer System Evaluation Criteria,''
\fICSC-STD-001-83\fP,
DoD Computer Security Center, August, 1983.
Karels, M., M. McKusick,
``Towards a Compatible File System Interface,''
\fIProceedings of the European UNIX Users Group Meeting\fP,
Manchester, England, pp. 481-496, September, 1986.
Kleiman, S.R.,
``Vnodes: An Architecture for Multiple File System Types in Sun UNIX,''
\fIUsenix Conference Proceedings\fP,
pp. 238-247, June, 1986.
Leffler, S., M.K. McKusick, M. Karels,
``Measuring and Improving the Performance of 4.2BSD,''
\fIUsenix Conference Proceedings\fP, pp. 237-252, June, 1984.
McKusick, M.K., W. Joy, S. Leffler, R. Fabry,
``A Fast File System for UNIX,''
\fIACM Transactions on Computer Systems\fP, Vol. 2, No. 3,
pp. 181-197, August, 1984.
McKusick, M.K., M. Karels, S. Leffler,
``Performance Improvements and Functional Enhancements in 4.3BSD,''
\fIUsenix Conference Proceedings\fP, pp. 519-531, June, 1985.
McKusick, M.K., M. Karels,
``A New Virtual Memory Implementation for Berkeley UNIX,''
\fIProceedings of the European UNIX Users Group Meeting\fP,
Manchester, England, pp. 451-460, September, 1986.
McKusick, M.K., M. Karels,
``Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel,''
\fIUsenix Conference Proceedings\fP,
pp. 295-303, June, 1988.
Rifkin, A.P., M.P. Forbes, R.L. Hamilton, M. Sabrio, S. Shah, K. Yueh,
``RFS Architectural Overview,'' \fIUsenix Conference Proceedings\fP,
pp. 248-259, June, 1986.
Rodriguez, R., M. Koehler, R. Hyde,
``The Generic File System,''
\fIUsenix Conference Proceedings\fP,
pp. 260-269, June, 1986.
Sandberg, R., D. Goldberg, S. Kleiman, D. Walsh, B. Lyon,
``Design and Implementation of the Sun Network File System,''
\fIUsenix Conference Proceedings\fP,
pp. 119-130, June, 1985.
Satyanarayanan, M., \fIet al.\fP,
``The ITC Distributed File System: Principles and Design,''
\fIProc. 10th Symposium on Operating Systems Principles\fP, pp. 35-50,
December, 1985.
Steiner, J., C. Neuman, J. Schiller,
``\fIKerberos:\fP An Authentication Service for Open Network Systems,''
\fIUsenix Conference Proceedings\fP, pp. 191-202, February, 1988.
Walker, B.J., S.H. Kiser, ``The LOCUS Distributed File System,''
\fIThe LOCUS Distributed System Architecture\fP,
G.J. Popek and B.J. Walker, eds., The MIT Press, Cambridge, MA, 1985.
Weinberger, P.J., ``The Version 8 Network File System,''
\fIUsenix Conference presentation\fP, 1984.