   1 .\"     $NetBSD: 1.t,v 1.2 1998/01/09 06:55:36 perry Exp $
   2 .\"
   3 .\" Copyright (c) 1993
   4 .\"     The Regents of the University of California.  All rights reserved.
   5 .\"
   6 .\" This document is derived from software contributed to Berkeley by
   7 .\" Rick Macklem at The University of Guelph.
   8 .\"
   9 .\" Redistribution and use in source and binary forms, with or without
  10 .\" modification, are permitted provided that the following conditions
  11 .\" are met:
  12 .\" 1. Redistributions of source code must retain the above copyright
  13 .\"    notice, this list of conditions and the following disclaimer.
  14 .\" 2. Redistributions in binary form must reproduce the above copyright
  15 .\"    notice, this list of conditions and the following disclaimer in the
  16 .\"    documentation and/or other materials provided with the distribution.
  17 .\" 3. Neither the name of the University nor the names of its contributors
  18 .\"    may be used to endorse or promote products derived from this software
  19 .\"    without specific prior written permission.
  20 .\"
  21 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  22 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  23 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  24 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  25 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  26 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  27 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  28 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  29 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  30 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  31 .\" SUCH DAMAGE.
  32 .\"
  33 .\"     @(#)1.t 8.1 (Berkeley) 6/8/93
  34 .\"
  35 .sh 1 "NFS Implementation"
  36 .pp
  37 The 4.4BSD implementation of NFS and the alternate protocol nicknamed
  38 Not Quite NFS (NQNFS) are kernel resident, but make use of a few system
  39 daemons.
  40 The kernel implementation does not use an RPC library, handling the RPC
  41 request and reply messages directly in \fImbuf\fR data areas. NFS
  42 interfaces to the network using
  43 sockets via the kernel interface available in
  44 \fIsys/kern/uipc_socket.c\fR as \fIsosend(), soreceive(),\fR...
  45 There are connection management routines for support of sockets for connection
  46 oriented protocols and timeout/retransmit support for datagram sockets on
  47 the client side.
  48 For connection oriented transport protocols,
  49 such as TCP/IP, there is one connection
  50 for each client to server mount point, maintained until the file system is unmounted.
  51 If the connection breaks, the client will attempt a reconnect with a new
  52 socket.
  53 The client side can operate without any daemons running, but performance
  54 will be improved by running nfsiod daemons that perform read-aheads
  55 and write-behinds.
  56 For the server side to function, the daemons portmap, mountd and
  57 nfsd must be running.
  58 The mountd daemon performs two important functions.
  59 .ip 1)
  60 Upon startup and after a hangup signal, mountd reads the exports
  61 file and pushes the export information for each local file system down
  62 into the kernel via the mount system call.
  63 .ip 2)
  64 Mountd handles remote mount protocol (RFC1094, Appendix A) requests.
  65 .lp
  66 The nfsd master daemon forks off children that enter the kernel
  67 via the nfssvc system call. The children normally remain kernel
  68 resident, providing a process context for the NFS RPC servers. The only
  69 exception is when a Kerberos [Steiner88]
  70 ticket is received; at that time
  71 the nfsd exits the kernel temporarily to verify the ticket via the
  72 Kerberos libraries and then returns to the kernel with the results.
  73 (This only happens for Kerberos mount points as described further under
  74 Security.)
  75 Meanwhile, the master nfsd waits to accept new connections from clients
  76 using connection oriented transport protocols and passes the new sockets down
  77 into the kernel.
  78 The client side mount_nfs along with portmap and
  79 mountd are the only parts of the NFS subsystem that make any
  80 use of the Sun RPC library.
  81 .sh 1 "Mount Problems"
  82 .pp
  83 There are several problems that can be encountered at the time of an NFS
  84 mount, ranging from an unresponsive NFS server (crashed, network partitioned
  85 from client, etc.) to various interoperability problems between different
  86 NFS implementations.
  87 .pp
  88 On the server side,
  89 if the 4.4BSD NFS server will be handling any PC clients, mountd will
  90 require the \fB-n\fR option to enable non-root mount request servicing.
  91 A pcnfsd\** daemon must also be running.
  92 .(f
  93 \** Pcnfsd is available in source form from Sun Microsystems and many
  94 anonymous ftp sites.
  95 .)f
  96 The server side requires that the daemons
  97 mountd and nfsd be running and that
  98 they be registered with portmap properly.
  99 If problems are encountered,
 100 the safest fix is to kill all the daemons and then restart them in
 101 the order portmap, mountd and nfsd.
 102 Other server side problems are normally caused by problems with the format
 103 of the exports file, which is covered under
 104 Security and in the exports man page.
 105 .pp
 106 On the client side, there are several mount options useful for dealing
 107 with server problems.
 108 In cases where a file system is not critical for system operation, the
 109 \fB-b\fR
 110 mount option may be specified so that mount_nfs will go into the
 111 background for a mount attempt on an unresponsive server.
 112 This is useful for mounts specified in
 113 \fIfstab(5)\fR,
 114 so that the system will not get hung while booting doing
 115 \fBmount -a\fR
 116 because a file server is not responsive.
 117 On the other hand, if the file system is critical to system operation, this
 118 option should not be used so that the client will wait for the server to
 119 come up before completing bootstrapping.
 120 There are also three mount options to help deal with interoperability issues
 121 with various non-BSD NFS servers. The
 122 \fB-P\fR
 123 option specifies that the NFS
 124 client use a reserved IP port number to satisfy some servers' security
 125 requirements.\**
 126 .(f
 127 \**Any security benefit of this is highly questionable and as
 128 such the BSD server does not require a client to use a reserved port number.
 129 .)f
 130 The
 131 \fB-c\fR
 132 option stops the NFS client from doing a \fIconnect\fR on the UDP
 133 socket, so that the mount works with servers that send NFS replies from
 134 port numbers other than the standard 2049.\**
 135 .(f
 136 \**The Encore Multimax is known
 137 to require this.
 138 .)f
 139 Finally, the
 140 \fB-g=\fInum\fR
 141 option sets the maximum size of the group list in the credentials passed
 142 to an NFS server in every RPC request. Although RFC1057 specifies a maximum
 143 size of 16 for the group list, some servers can't handle that many.
 144 If a user, particularly root doing a mount,
 145 keeps getting access denied from a file server, try temporarily
 146 reducing the number of groups that user is in to fewer than 5
 147 by editing /etc/group. If the user can then access the file system, slowly
 148 increase the number of groups for that user until the limit is found and
 149 then peg the limit there with the
 150 \fB-g=\fInum\fR
 151 option.
 152 This implies that the server will only see the first \fInum\fR
 153 groups that the user is in, which can cause some accessibility problems.
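.pp
The effect of capping the group list can be pictured with a small Python sketch.
It is only an illustration: the helper names and the group ids are made up,
and it models nothing more than the statement above that the server sees
only the first \fInum\fR groups.

```python
def truncate_groups(groups, num):
    """Return the group list as a server would see it under a limit of num groups."""
    if num < 1:
        raise ValueError("num must be at least 1")
    return list(groups[:num])


def server_sees_group(groups, num, gid):
    """True if gid survives the truncation and so is visible to the server."""
    return gid in truncate_groups(groups, num)


user_groups = [20, 5, 31, 100, 117, 200]   # made-up group ids
print(truncate_groups(user_groups, 4))
print(server_sees_group(user_groups, 4, 117))
```

A file that is accessible only through group 117 would be denied in this
example, which is the kind of accessibility problem referred to above.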
 154 .pp
 155 For sites that have many NFS servers, amd [Pendry93]
 156 is a useful administration tool.
 157 It also reduces the number of actual NFS mount points, alleviating problems
 158 with commands such as df(1) that hang when any of the NFS servers is
 159 unreachable.
 160 .sh 1 "Dealing with Hung Servers"
 161 .pp
 162 There are several mount options available to help a client deal with
 163 being hung waiting for a response from a crashed or unreachable\** server.
 164 .(f
 165 \**Due to a network partition or similar.
 166 .)f
 167 By default, a hard mount will continue to try to contact the server
 168 ``forever'' to complete the system call. This type of mount is appropriate
 169 when processes on the client that access files in the file system do not
 170 tolerate file I/O system calls that return -1 with \fIerrno == EINTR\fR
 171 and/or access to the file system is critical for normal system operation.
 172 .lp
 173 There are two other alternatives:
 174 .ip 1)
 175 A soft mount (\fB-s\fR option) retries an RPC \fIn\fR
 176 times and then the corresponding
 177 system call returns -1 with errno set to EINTR.
 178 For TCP transport, the actual RPC request is not retransmitted, but the
 179 timeout intervals waiting for a reply from the server are done
 180 in the same manner as UDP for this purpose.
 181 The problem with this type of mount is that most applications do not
 182 expect an EINTR error return from file I/O system calls (since it never
 183 occurs for a local file system) and get confused by the error return
 184 from the I/O system call.
 185 The option
 186 \fB-x=\fInum\fR
 187 is used to set the RPC retry limit and if set too low, the error returns
 188 will start occurring whenever the NFS server is slow due to heavy load.
 189 Alternatively, a large retry limit can result in a process hung for a long
 190 time, due to a crashed server or network partitioning.
 191 .ip 2)
 192 An interruptible mount (\fB-i\fR option) checks to see if a termination signal
 193 is pending for the process when waiting for server response and if it is,
 194 the I/O system call posts an EINTR. Normally this results in the process
 195 being terminated by the signal when returning from the system call.
 196 This feature allows you to ``^C'' out of processes that are hung
 197 due to unresponsive servers.
 198 The problem with this approach is that signals that are caught by
 199 a process are not recognized as termination signals
 200 and the process will remain hung.\**
 201 .(f
 202 \**Unfortunately, there are also some resource allocation situations in the
 203 BSD kernel where the termination signal will be ignored and the process
 204 will not terminate.
 205 .)f
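.lp
The soft mount behaviour described in 1) can be sketched in Python as follows.
This is a simplified model, not the kernel code: the function names are
hypothetical, the retry limit is treated as the total number of attempts
(an assumption), and the server is simulated by a stand-in that times out
a fixed number of times before replying.

```python
import errno


def soft_mount_call(send_rpc, retry_limit):
    """Try an RPC up to retry_limit times; send_rpc() returns None on a timeout."""
    for _ in range(retry_limit):
        reply = send_rpc()
        if reply is not None:
            return reply, 0
    # Retry limit exhausted: the system call fails with EINTR, as in the text.
    return -1, errno.EINTR


def slow_server(timeouts_before_reply):
    """Hypothetical stand-in for a server that answers only after some timeouts."""
    state = {"left": timeouts_before_reply}

    def send_rpc():
        if state["left"] > 0:
            state["left"] -= 1
            return None
        return "reply"

    return send_rpc


print(soft_mount_call(slow_server(3), retry_limit=2))
print(soft_mount_call(slow_server(3), retry_limit=10))
```

With the low limit the call fails even though the simulated server is merely
slow, which mirrors the warning about setting the retry limit too low.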
 206 .sh 1 "RPC Transport Issues"
 207 .pp
 208 The NFS Version 2 protocol runs over UDP/IP transport by
 209 sending each Sun Remote Procedure Call (RFC1057)
 210 request/reply message in a single UDP
 211 datagram. Since UDP does not guarantee datagram delivery, the
 212 Remote Procedure Call (RPC) layer
 213 times out and retransmits an RPC request if
 214 no RPC reply has been received. Since this round trip timeout (RTO) value
 215 is for the entire RPC operation, including RPC message transmission to the
 216 server, queuing at the server for an nfsd, performing the RPC and
 217 sending the RPC reply message back to the client, it can be highly variable
 218 for even a moderately loaded NFS server.
 219 As a result, the RTO interval must be a conservative (large) estimate, in
 220 order to avoid extraneous RPC request retransmits.\**
 221 .(f
 222 \**At best, an extraneous RPC request retransmit increases
 223 the load on the server and at worst can result in damaged files
 224 on the server when non-idempotent RPCs are redone [Juszczak89].
 225 .)f
 226 Also, with an 8Kbyte read/write data size
 227 (the default), the read/write reply/request will be an 8+Kbyte UDP datagram
 228 that must normally be fragmented at the IP layer for transmission.\**
 229 .(f
 230 \**6 IP fragments for an Ethernet,
 231 which has a maximum transmission unit of 1500 bytes.
 232 .)f
 233 For IP fragments to be successfully reassembled into
 234 the IP datagram at the receive end, all
 235 fragments must be received within a fairly short ``time to live''.
 236 If one fragment is lost/damaged in transit,
 237 the entire RPC must be retransmitted and redone.
 238 This problem can be exacerbated by a network interface on the receiver that
 239 cannot handle the reception of back to back network packets [Kent87a].
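.pp
The fragment count quoted in the footnote can be checked with a short
calculation. The Python sketch below assumes a 20 byte IP header without
options, an 8 byte UDP header and a placeholder of 128 bytes for the RPC
and NFS headers; the last figure is an assumption made for illustration,
not a measured value.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def ip_fragments(data_size, rpc_overhead=128, mtu=1500):
    """Estimate how many IP fragments carry one read/write RPC message."""
    datagram_payload = UDP_HEADER + rpc_overhead + data_size
    # Fragment payloads (except the last) must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram_payload / per_fragment)


for size in (8192, 4096, 2048, 1024):
    print(size, ip_fragments(size))
```

Under these assumptions an 8Kbyte transfer needs 6 fragments on an Ethernet,
while a 1Kbyte transfer fits in a single packet and avoids fragmentation.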
 240 .pp
 241 There are several tuning mount
 242 options on the client side that can prove useful when trying to
 243 alleviate performance problems related to UDP RPC transport.
 244 The options
 245 \fB-r=\fInum\fR
 246 and
 247 \fB-w=\fInum\fR
 248 specify the maximum read or write data size respectively.
 249 The size \fInum\fR
 250 should be a power of 2 (4K, 2K, 1K) and adjusted downward from the
 251 maximum of 8Kbytes
 252 whenever IP fragmentation is causing problems. The best indicator of
 253 IP fragmentation problems is a significant number of
 254 \fIfragments dropped after timeout\fR
 255 reported by the \fIip:\fR section of a \fBnetstat -s\fR
 256 command on either the client or server.
 257 Of course, if the fragments are being dropped at the server, it can be
 258 fun figuring out which client(s) are involved.
 259 The most likely candidates are clients that are not
 260 on the same local area network as the
 261 server or have network interfaces that do not receive several
 262 back to back network packets properly.
 263 .pp
 264 By default, the 4.4BSD NFS client dynamically estimates the retransmit
 265 timeout interval for the RPC and this appears to work reasonably well for
 266 many environments. However, the
 267 \fB-d\fR
 268 flag can be specified to turn off
 269 the dynamic estimation of retransmit timeout, so that the client will
 270 use a static initial timeout interval.\**
 271 .(f
 272 \**After the first retransmit timeout, the initial interval is backed off
 273 exponentially.
 274 .)f
 275 The
 276 \fB-t=\fInum\fR
 277 option can be used with
 278 \fB-d\fR
 279 to set the initial timeout interval to other than the default of 2 seconds.
 280 The best indicator that dynamic estimation should be turned off would
 281 be a significant number\** in the \fIX Replies\fR field and a
 282 .(f
 283 \**Even 0.1% of the total RPCs is probably significant.
 284 .)f
 285 large number in the \fIRetries\fR field
 286 in the \fIRpc Info:\fR section as reported
 287 by the \fBnfsstat\fR command.
 288 On the server, there would be significant numbers of \fIInprog\fR recent
 289 request cache hits in the \fIServer Cache Stats:\fR section as reported
 290 by the \fBnfsstat\fR command, when run on the server.
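.pp
To get a feel for how a static initial timeout grows once retransmits start,
the following Python sketch lists the successive timeout intervals.
The doubling factor and the absence of an upper bound are assumptions made
for the sketch; the text above only states that the backoff is exponential.

```python
def backoff_schedule(initial, retries, factor=2):
    """Return the timeout intervals, in seconds, for successive transmissions."""
    return [initial * factor ** i for i in range(retries)]


schedule = backoff_schedule(initial=2, retries=4)
print(schedule)        # [2, 4, 8, 16]
print(sum(schedule))   # 30
```

Under these assumptions, four timeouts starting from the default of 2 seconds
keep a process waiting for about half a minute.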
 291 .pp
 292 The tradeoff is that a smaller timeout interval results in a better
 293 average RPC response time, but increases the risk of extraneous retries
 294 that in turn increase server load and the possibility of damaged files
 295 on the server. It is probably best to err on the safe side and use a large
 296 (>= 2sec) fixed timeout if the dynamic retransmit timeout estimation
 297 seems to be causing problems.
 298 .pp
 299 An alternative to all this fiddling is to run NFS over TCP transport instead
 300 of UDP.
 301 Since the 4.4BSD TCP implementation provides reliable
 302 delivery with congestion control, it avoids all of the above problems.
 303 It also permits the use of read and write data sizes greater than the 8Kbyte
 304 limit for UDP transport.\**
 305 .(f
 306 \**Read/write data sizes greater than 8Kbytes will not normally improve
 307 performance unless the kernel constant MAXBSIZE is increased and the
 308 file system on the server has a block size greater than 8Kbytes.
 309 .)f
 310 NFS over TCP usually delivers comparable to significantly better performance
 311 than NFS over UDP
 312 unless the client or server processor runs at less than 5-10MIPS. For a
 313 slow processor, the extra CPU overhead of using TCP transport will become
 314 significant and TCP transport may only be useful when the client
 315 to server interconnect traverses congested gateways.
 316 The main problem with using TCP transport is that it is only supported
 317 between BSD clients and servers.\**
 318 .(f
 319 \**There are rumors of commercial NFS over TCP implementations on the horizon
 320 and these may well be worth exploring.
 321 .)f
 322 .sh 1 "Other Tuning Tricks"
 323 .pp
 324 Another mount option that may improve performance over
 325 certain network interconnects is \fB-a=\fInum\fR
 326 which sets the number of blocks that the system will
 327 attempt to read-ahead during sequential reading of a file. The default value
 328 of 1 seems to be appropriate for most situations, but a larger value might
 329 achieve better performance for some environments, such as a mount to a server
 330 across a ``high bandwidth * round trip delay'' interconnect.
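.pp
One rough way to judge whether an interconnect has a large
``bandwidth * round trip delay'' product is to express that product in
units of the read data size. The Python sketch below does this; it is a rule
of thumb offered for illustration, not a formula used by the NFS client,
and the link figures are invented.

```python
def blocks_in_flight(bandwidth_bits_per_sec, rtt_sec, block_size=8192):
    """Number of whole blocks that fit in the bandwidth * round trip delay product."""
    bdp_bytes = bandwidth_bits_per_sec / 8 * rtt_sec
    return int(bdp_bytes // block_size)


# Invented examples: a 10 Mbit/s path with 50 ms and with 2 ms round trip time.
print(blocks_in_flight(10_000_000, 0.050))
print(blocks_in_flight(10_000_000, 0.002))
```

A result of several blocks suggests that a read-ahead larger than 1 may be
worth trying; a result of zero suggests that the default is adequate.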
 331 .pp
 332 For the adventurous, playing with the size of the buffer cache
 333 can also improve performance for some environments that use NFS heavily.
 334 Under some workloads, a buffer cache of 4-6Mbytes can result in significant
 335 performance improvements over 1-2Mbytes, both in client side system call
 336 response time and reduced server RPC load.
 337 The buffer cache size defaults to 10% of physical memory,
 338 but this can be overridden by specifying the BUFPAGES option
 339 in the machine's config file.\**
 340 .(f
 341 \**BUFPAGES is the number of physical machine pages allocated to the buffer cache,
 342 i.e. BUFPAGES * NBPG = buffer cache size in bytes.
 343 .)f
 344 When increasing the size of BUFPAGES, it is also advisable to increase the
 345 number of buffers NBUF by a corresponding amount.
 346 Note that memory allocated to the buffer cache is no longer
 347 available for paging, which implies that making the buffer cache larger
 348 will increase the paging rate, with possibly disastrous results.
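.pp
The relation BUFPAGES * NBPG = buffer cache size can be turned around to find
the BUFPAGES value for a desired cache size. The Python sketch below assumes
a page size (NBPG) of 4096 bytes purely as an example; the real value is
machine dependent and should be taken from the system's own headers.

```python
def cache_bytes(bufpages, nbpg):
    """Buffer cache size in bytes for a given BUFPAGES and page size."""
    return bufpages * nbpg


def bufpages_for(target_bytes, nbpg):
    """Smallest BUFPAGES value giving at least target_bytes of buffer cache."""
    return -(-target_bytes // nbpg)   # integer ceiling division


NBPG = 4096                             # assumed page size for the example
MBYTE = 1024 * 1024
print(bufpages_for(4 * MBYTE, NBPG))    # 1024
print(bufpages_for(6 * MBYTE, NBPG))    # 1536
```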
 349 .sh 1 "Security Issues"
 350 .pp
 351 When a machine is running an NFS server it opens up a great big security hole.
 352 For ordinary NFS, the server receives client credentials
 353 in the RPC request as a user id
 354 and a list of group ids and trusts them to be authentic!
 355 The only tool available for restricting remote access to
 356 file systems is the exports(5) file,
 357 so file systems should be exported with great care.
 358 The exports file is read by mountd upon startup and after a hangup signal
 359 is posted for it and then as much of the access specifications as possible are
 360 pushed down into the kernel for use by the nfsd(s).
 361 The trick here is that the kernel information is stored on a per
 362 local file system mount point and client host address basis and cannot refer to
 363 individual directories within the local server file system.
 364 It is best to think of the exports file as referring to the various local
 365 file systems and not just directory paths as mount points.
 366 A local file system may be exported to a specific host, all hosts that
 367 match a subnet mask or all other hosts (the world). The latter is very
 368 dangerous and should only be used for public information. It is also
 369 strongly recommended that file systems exported to ``the world'' be exported
 370 read-only.
 371 For each host or group of hosts, the file system can be exported read-only or
 372 read/write.
 373 You can also choose one of three client user id to server credential
 374 mappings to help control access.
 375 First, root (user id == 0) can be mapped to some default credentials while
 376 all other user ids are accepted as given.
 377 Most NFS file systems are exported this way, most commonly mapping
 378 user id == 0 to the credentials for the user nobody.
 379 Second, if the default credentials for user id == 0
 380 are those of root, then there is essentially no remapping.
 381 In both of these cases the client user id and group id list is used
 382 unchanged on the server (except for root), which implies that
 383 the user id and group id space must be common between the client and server
 384 (i.e. user id N on the client must refer to the same user on the server).
 385 Third, all user ids can be mapped to a default set of credentials, typically
 386 those of the user nobody. This essentially gives world access to all
 387 users on the corresponding hosts.
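.pp
The credential mappings just described can be modelled with a few lines of
Python. This is a sketch only: the mode names are labels chosen for the
example, and the numeric ids used for the user nobody are placeholders,
since the real values differ between systems.

```python
ROOT = (0, (0,))
NOBODY = (32767, (32767,))   # placeholder ids for "nobody"; real values vary


def map_credentials(uid, gids, mode, default=NOBODY):
    """Return the (uid, gids) pair the server would use for a request.

    mode "maproot" replaces only user id 0 with the default credentials;
    mode "mapall" replaces every user id with the default credentials.
    """
    if mode == "mapall":
        return default
    if mode == "maproot":
        return default if uid == 0 else (uid, tuple(gids))
    raise ValueError("unknown mode: " + mode)


print(map_credentials(0, [0], "maproot"))                 # root becomes nobody
print(map_credentials(1001, [100], "maproot"))            # accepted as given
print(map_credentials(0, [0], "maproot", default=ROOT))   # no remapping
print(map_credentials(1001, [100], "mapall"))             # everyone is nobody
```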
 388 .pp
 389 There is also a non-standard BSD
 390 \fB-kerb\fR export option that requires the client provide
 391 a KerberosIV rcmd service ticket to authenticate the user on the server.
 392 If successful, the Kerberos principal is looked up in the server's password
 393 and group databases to get a set of credentials and a map of client userid to
 394 these credentials is then cached.
 395 The use of TCP transport is strongly recommended,
 396 since the scheme depends on the TCP connection to avert replay attempts.
 397 Unfortunately, this option is only usable
 398 between BSD clients and servers since it is
 399 not compatible with other known ``kerberized'' NFS systems.
 400 To enable use of this Kerberos option, both mount_nfs on the client and
 401 nfsd on the server must be rebuilt with the -DKERBEROS option and
 402 linked to KerberosIV libraries.
 403 The file system is then exported to the client(s) with the \fB-kerb\fR option
 404 in the exports file on the server
 405 and the client mount specifies the
 406 \fB-K\fR
 407 and
 408 \fB-T\fR
 409 options.
 410 The
 411 \fB-m=\fIrealm\fR
 412 mount option may be used to specify a Kerberos Realm for the ticket
 413 (it must be the Kerberos Realm of the server) that is other than
 414 the client's local Realm.
 415 To access files in a \fB-kerb\fR mount point, the user must have a valid
 416 TGT for the server's Realm, as provided by kinit or similar.
 417 .pp
 418 As well as the standard NFS Version 2 protocol (RFC1094) implementation, BSD
 419 systems can use a variant of the protocol called Not Quite NFS (NQNFS) that
 420 supports a variety of protocol extensions.
 421 This protocol uses 64bit file offsets
 422 and sizes, an \fIaccess rpc\fR, an \fIappend\fR option on the write rpc
 423 and extended file attributes to support 4.4BSD file system functionality
 424 more fully.
 425 It also makes use of a variant of short term
 426 \fIleases\fR [Gray89] with delayed write client caching,
 427 in an effort to provide full cache consistency and better performance.
 428 This protocol is available between 4.4BSD systems only and is used when
 429 the \fB-q\fR mount option is specified.
 430 It can be used with any of the aforementioned options for NFS, such as TCP
 431 transport (\fB-T\fR) and KerberosIV authentication (\fB-K\fR).
 432 Although this protocol is experimental, it is recommended over NFS for
 433 mounts between 4.4BSD systems.\**
 434 .(f
 435 \**I would appreciate email from anyone who can provide
 436 NFS vs. NQNFS performance measurements,
 437 particularly for fast clients, many clients or over an internetwork
 438 connection with a large ``bandwidth * RTT'' product.
 439 .)f
 440 .sh 1 "Monitoring NFS Activity"
 441 .pp
 442 The basic command for monitoring NFS activity on clients and servers is
 443 nfsstat. It reports cumulative statistics of various NFS activities,
 444 such as counts of the various different RPCs and cache hit rates on the client
 445 and server. Of particular interest on the server are the fields in the
 446 \fIServer Cache Stats:\fR section, which gives numbers for RPC retries received
 447 in the first three fields and total RPCs in the fourth. The first three fields
 448 should remain a very small percentage of the total. If not, it
 449 would indicate one or more clients doing retries too aggressively and the fix
 450 would be to isolate these clients,
 451 disable the dynamic RTO estimation on them and
 452 make their initial timeout interval a conservative (i.e. large) value.
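.pp
The check on the server cache fields amounts to a simple ratio. The Python
sketch below assumes the four numbers have already been copied by hand from
the nfsstat output, and it borrows a threshold of 0.1% from the footnote on
client retries earlier in this document; both the helper names and the use
of that threshold here are assumptions.

```python
def retry_fraction(retry_fields, total_rpcs):
    """Fraction of RPCs that were retries, given the first three fields and the total."""
    if total_rpcs <= 0:
        return 0.0
    return sum(retry_fields) / total_rpcs


def retries_look_excessive(retry_fields, total_rpcs, threshold=0.001):
    """True if retries exceed the chosen fraction of all RPCs."""
    return retry_fraction(retry_fields, total_rpcs) > threshold


print(retries_look_excessive([3, 1, 1], 10000))     # False
print(retries_look_excessive([30, 10, 10], 10000))  # True
```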
 453 .pp
 454 On the client side, the fields in the \fIRpc Info:\fR section are of particular
 455 interest, as they give an overall picture of NFS activity.
 456 The \fITimedOut\fR field is the number of I/O system calls that returned -1
 457 for ``soft'' mounts and can be reduced
 458 by increasing the retry limit or changing
 459 the mount type to ``intr'' or ``hard''.
 460 The \fIInvalid\fR field is a count of trashed RPC replies that are received
 461 and should remain zero.\**
 462 .(f
 463 \**Some NFS implementations run with UDP checksums disabled, so garbage RPC
 464 messages can be received.
 465 .)f
 466 The \fIX Replies\fR field counts the number of repeated RPC replies received
 467 from the server and is a clear indication of a too aggressive RTO estimate.
 468 Unfortunately, a good NFS server implementation will use a ``recent request
 469 cache'' [Juszczak89] that will suppress the extraneous replies.
 470 A large value for \fIRetries\fR indicates a problem, but
 471 it could be any of:
 472 .ip \(bu
 473 a too aggressive RTO estimate
 474 .ip \(bu
 475 an overloaded NFS server
 476 .ip \(bu
 477 IP fragments being dropped (gateway, client or server)
 478 .lp
 479 and requires further investigation.
 480 The \fIRequests\fR field is the total count of RPCs done on all servers.
 481 .pp
 482 The \fBnetstat -s\fR command is useful during investigation of RPC transport
 483 problems.
 484 The field \fIfragments dropped after timeout\fR in
 485 the \fIip:\fR section indicates IP fragments are
 486 being lost and a significant number of these occurring indicates that the
 487 use of TCP transport or a smaller read/write data size is in order.
 488 A significant number of \fIbad checksums\fR reported in the \fIudp:\fR
 489 section would suggest network problems of a more generic sort
 490 (cabling, transceiver or network hardware interface problems or similar).
 491 .pp
 492 There is an RPC activity logging facility for both the client and
 493 server side in the kernel.
 494 When logging is enabled by setting the kernel variable nfsrtton to
 495 one, the logs in the kernel structures nfsrtt (for the client side)
 496 and nfsdrt (for the server side) are updated upon the completion
 497 of each RPC in a circular manner.
 498 The pos element of the structure is the index of the next element
 499 of the log array to be updated (the oldest entry once the log is full).
 500 In other words, elements of the log array from \fIlog\fR[pos], wrapping around
 501 the end of the array, to \fIlog\fR[pos - 1] are in chronological order.
 502 The include file <sys/nfsrtt.h> should be consulted for details on the
 503 fields in the two log structures.\**
 504 .(f
 505 \**Unfortunately, a monitoring tool that uses these logs is still in the
 506 planning (dreaming) stage.
 507 .)f
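.pp
Reading such a circular log in chronological order means starting at
index pos and wrapping around to pos - 1. The Python sketch below shows the
idea on a plain list; it assumes the log has already filled, so that every
slot holds a valid entry, and it does not use the real kernel structures.

```python
def chronological(log, pos):
    """Return the entries of a full circular log, oldest first.

    pos is the index of the next slot to be overwritten, i.e. the oldest entry.
    """
    return log[pos:] + log[:pos]


# Entries were written in the order a, b, c, d, e, f into a 4 slot log,
# so e and f have overwritten a and b, and slot 2 is next to be updated.
log = ["e", "f", "c", "d"]
print(chronological(log, 2))   # ['c', 'd', 'e', 'f']
```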
 508 .sh 1 "Diskless Client Support"
 509 .pp
 510 The NFS client includes kernel support for diskless/dataless operation
 511 where the root file system and optionally the swap area are remotely NFS mounted.
 512 A diskless/dataless client is configured using a version of the
 513 ``swapvmunix.c'' file as provided in the directory \fIcontrib/diskless.nfs\fR.
 514 If the swap device == NODEV, it specifies an NFS mounted swap area and should
 515 be configured the same size as set up by diskless_setup when run on the server.
 516 This file must be put in the \fIsys/compile/<machine_name>\fR kernel build
 517 directory after the config command has been run, since config does
 518 not know about specifying NFS root and swap areas.
 519 The kernel variable mountroot must be set to nfs_mountroot instead of
 520 ffs_mountroot and the kernel structure nfs_diskless must be filled in
 521 properly.
 522 There are some primitive system administration tools in the \fIcontrib/diskless.nfs\fR directory to assist in filling in
 523 the nfs_diskless structure and in setting up an NFS server for
 524 diskless/dataless clients.
 525 The tools were designed to provide a bare bones capability, to allow maximum
 526 flexibility when setting up different servers.
 527 .lp
 528 The tools are as follows:
 529 .ip \(bu
 530 diskless_offset.c - This little program reads a ``vmunix'' object file and
 531 writes the file byte offset of the nfs_diskless structure in it to
 532 standard out. It was kept separate because it sometimes has to
 533 be compiled/linked in funny ways depending on the client architecture.
 534 (See the comment at the beginning of it.)
 535 .ip \(bu
 536 diskless_setup.c - This program is run on the server and sets up files for a
 537 given client. It mostly just fills in an nfs_diskless structure and
 538 writes it out to either the "vmunix" file or a separate file called
 539 /var/diskless/setup.<official-hostname>
 540 .ip \(bu
 541 diskless_boot.c - There are two functions in here that may be used
 542 by a bootstrap server such as tftpd to permit sharing of the ``vmunix''
 543 object file for similar clients. This saves disk space on the bootstrap
 544 server and simplifies organization, but is not critical for correct operation.
 545 They read the ``vmunix''
 546 file, but optionally fill in the nfs_diskless structure from a
 547 separate "setup.<official-hostname>" file so that there is only
 548 one copy of "vmunix" for all similar (same arch etc.) clients.
 549 These functions use a text file called
 550 /var/diskless/boot.<official-hostname> to control the netboot.
 551 .lp
 552 The basic setup steps are:
 553 .ip \(bu
 554 Make a "vmunix" for the client(s) with mountroot() == nfs_mountroot()
 555 and swdevt[0].sw_dev == NODEV if it is to do NFS swapping as well
 556 (see the swapvmunix.c file mentioned above).
 557 .ip \(bu
 558 Run diskless_offset on the vmunix file to find out the byte offset
 559 of the nfs_diskless structure
 560 .ip \(bu
 561 Run diskless_setup on the server to set up the server and fill in the
 562 nfs_diskless structure for that client.
 563 The nfs_diskless structure can either be written into the
 564 vmunix file (the -x option) or
 565 saved in /var/diskless/setup.<official-hostname>.
 566 .ip \(bu
 567 Set up the bootstrap server. If the nfs_diskless structure was written into
 568 the ``vmunix'' file, any vanilla bootstrap protocol such as bootp/tftp can
 569 be used. If the bootstrap server has been modified to use the functions in
 570 diskless_boot.c, then a
 571 file called /var/diskless/boot.<official-hostname>
 572 must be created.
 573 It is simply a two line text file, where the first line is the pathname
 574 of the correct ``vmunix'' file and the second line has the pathname of
 575 the nfs_diskless structure file followed by the byte offset of the structure in the ``vmunix'' file.
 576 For example:
 577 .br
 578         /var/diskless/vmunix.pmax
 579 .br
 580         /var/diskless/setup.rickers.cis.uoguelph.ca 642308
 581 .br
 582 .ip \(bu
 583 Create a /var subtree for each client in an appropriate place on the server,
 584 such as /var/diskless/var/<client-hostname>/...
 585 By using the <client-hostname> to differentiate /var for each host,
 586 /etc/rc can be modified to mount the correct /var from the server.
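.lp
The two line boot control file described in the setup steps is simple enough
to check with a short script before netbooting a client. The Python sketch
below is not the parser used by diskless_boot.c; it only assumes the layout
shown in the example above (a pathname on the first line, a pathname and a
byte offset separated by white space on the second).

```python
def parse_boot_file(text):
    """Parse a two line boot control file into (kernel_path, setup_path, offset)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("expected exactly two non-empty lines")
    kernel_path = lines[0]
    fields = lines[1].split()
    if len(fields) != 2:
        raise ValueError("second line must hold a pathname and a byte offset")
    setup_path, offset = fields[0], int(fields[1])
    return kernel_path, setup_path, offset


example = """
/var/diskless/vmunix.pmax
/var/diskless/setup.rickers.cis.uoguelph.ca 642308
"""
print(parse_boot_file(example))
```

A malformed file, such as one with a missing offset, raises an error
instead of being passed on to the bootstrap server.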