1 .\" $NetBSD: kernmalloc.t,v 1.4 2003/02/04 23:47:52 perry Exp $
3 .\" Copyright (c) 1988 The Regents of the University of California.
4 .\" All rights reserved.
6 .\" Redistribution and use in source and binary forms, with or without
7 .\" modification, are permitted provided that the following conditions
9 .\" 1. Redistributions of source code must retain the above copyright
10 .\" notice, this list of conditions and the following disclaimer.
11 .\" 2. Redistributions in binary form must reproduce the above copyright
12 .\" notice, this list of conditions and the following disclaimer in the
13 .\" documentation and/or other materials provided with the distribution.
14 .\" 3. Neither the name of the University nor the names of its contributors
15 .\" may be used to endorse or promote products derived from this software
16 .\" without specific prior written permission.
18 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
19 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
22 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
30 .\" @(#)kernmalloc.t 5.1 (Berkeley) 4/16/91
32 .\" reference a system routine name
34 \fI\\$1\fP\^(\h'1m/24u')\\$2
36 .\" reference a header name
55 .\" cheat: original indent is stored in \n(OI by .DS B; restore it
56 .\" then center legend after .DE rereads and centers the block.
76 \fIProceedings of the San Francisco USENIX Conference\fP,
77 pp. 295-303, June 1988.
83 Design of a General Purpose Memory Allocator for the 4.3BSD UNIX\(dg Kernel
84 .ds LF Summer USENIX '88
86 .ds RF San Francisco, June 20-24
87 .EH 'Design of a General Purpose Memory ...''McKusick, Karels'
88 .OH 'McKusick, Karels''Design of a General Purpose Memory ...'
90 \(dgUNIX is a registered trademark of AT&T in the US and other countries.
93 Marshall Kirk McKusick
97 Computer Systems Research Group
98 Computer Science Division
99 Department of Electrical Engineering and Computer Science
100 University of California, Berkeley
101 Berkeley, California 94720
103 The 4.3BSD UNIX kernel uses many memory allocation mechanisms,
each designed for the particular needs of the subsystem that uses it.
105 This paper describes a general purpose dynamic memory allocator
106 that can be used by all of the kernel subsystems.
107 The design of this allocator takes advantage of known memory usage
108 patterns in the UNIX kernel and a hybrid strategy that is time-efficient
109 for small allocations and space-efficient for large allocations.
110 This allocator replaces the multiple memory allocation interfaces
111 with a single easy-to-program interface,
112 results in more efficient use of global memory by eliminating
113 partitioned and specialized memory pools,
114 and is quick enough that no performance loss is observed
115 relative to the current implementations.
116 The paper concludes with a discussion of our experience in using
117 the new memory allocator,
118 and directions for future work.
121 .H 1 "Kernel Memory Allocation in 4.3BSD
123 The 4.3BSD kernel has at least ten different memory allocators.
124 Some of them handle large blocks,
125 some of them handle small chained data structures,
126 and others include information to describe I/O operations.
127 Often the allocations are for small pieces of memory that are only
128 needed for the duration of a single system call.
129 In a user process such short-term
130 memory would be allocated on the run-time stack.
131 Because the kernel has a limited run-time stack,
132 it is not feasible to allocate even moderate blocks of memory on it.
133 Consequently, such memory must be allocated through a more dynamic mechanism.
For example, when the system must translate a pathname,
it must allocate a one kilobyte buffer to hold the name.
137 Other blocks of memory must be more persistent than a single system call
138 and really have to be allocated from dynamic memory.
139 Examples include protocol control blocks that remain throughout
140 the duration of the network connection.
142 Demands for dynamic memory allocation in the kernel have increased
143 as more services have been added.
144 Each time a new type of memory allocation has been required,
145 a specialized memory allocation scheme has been written to handle it.
146 Often the new memory allocation scheme has been built on top
147 of an older allocator.
148 For example, the block device subsystem provides a crude form of
149 memory allocation through the allocation of empty buffers [Thompson78].
150 The allocation is slow because of the implied semantics of
151 finding the oldest buffer, pushing its contents to disk if they are dirty,
and moving physical memory into or out of the buffer to create
the requested size.
154 To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD
155 for name translation that allocates a pool of empty buffers.
156 It keeps them on a free list so they can
157 be quickly allocated and freed [McKusick85].
159 This memory allocation method has several drawbacks.
160 First, the new allocator can only handle a limited range of sizes.
161 Second, it depletes the buffer pool, as it steals memory intended
to buffer disk blocks for other purposes.
163 Finally, it creates yet another interface of
164 which the programmer must be aware.
166 A generalized memory allocator is needed to reduce the complexity
167 of writing code inside the kernel.
168 Rather than providing many semi-specialized ways of allocating memory,
169 the kernel should provide a single general purpose allocator.
170 With only a single interface,
171 programmers do not need to figure
172 out the most appropriate way to allocate memory.
173 If a good general purpose allocator is available,
it helps avoid the syndrome of creating yet another special
purpose allocator.
177 To ease the task of understanding how to use it,
178 the memory allocator should have an interface similar to the interface
179 of the well-known memory allocator provided for
applications programmers through the C library routines
\fImalloc\fP and \fIfree\fP.
184 Like the C library interface,
185 the allocation routine should take a parameter specifying the
186 size of memory that is needed.
187 The range of sizes for memory requests should not be constrained.
188 The free routine should take a pointer to the storage being freed,
189 and should not require additional information such as the size
190 of the piece of memory being freed.
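In user-level terms, the calling convention being asked for is simply
the familiar one shown below; the kernel version should look the same
to its callers apart from any kernel-specific arguments
(this fragment is illustrative and is not the interface of Appendix A).
.DS
char *cp;

/* the allocation request specifies only the size needed */
cp = malloc(1024);

/* the free call specifies only the pointer, not the size */
free(cp);
.DE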
191 .H 1 "Criteria for a Kernel Memory Allocator
193 The design specification for a kernel memory allocator is similar to,
194 but not identical to,
195 the design criteria for a user level memory allocator.
196 The first criterion for a memory allocator is that it make good use
197 of the physical memory.
198 Good use of memory is measured by the amount of memory needed to hold
199 a set of allocations at any point in time.
200 Percentage utilization is expressed as:
.EQ
utilization~=~requested over required
.EN
Here, ``requested'' is the sum of the memory that has been requested
and not yet freed.
206 ``Required'' is the amount of memory that has been
207 allocated for the pool from which the requests are filled.
208 An allocator requires more memory than requested because of fragmentation
209 and a need to have a ready supply of free memory for future requests.
210 A perfect memory allocator would have a utilization of 100%.
In practice, having a 50% utilization is considered good [Korn85].
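For example (the numbers are purely illustrative),
an allocator that holds 600 kilobytes in its pool to satisfy
300 kilobytes of outstanding requests has
.EQ
utilization~=~300 over 600~=~50%.
.EN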
Good memory utilization in the kernel is more important than
in user processes.
216 Because user processes run in virtual memory,
217 unused parts of their address space can be paged out.
218 Thus pages in the process address space
219 that are part of the ``required'' pool that are not
220 being ``requested'' need not tie up physical memory.
221 Because the kernel is not paged,
222 all pages in the ``required'' pool are held by the kernel and
223 cannot be used for other purposes.
224 To keep the kernel utilization percentage as high as possible,
225 it is desirable to release unused memory in the ``required'' pool
226 rather than to hold it as is typically done with user processes.
227 Because the kernel can directly manipulate its own page maps,
228 releasing unused memory is fast;
229 a user process must do a system call to release memory.
231 The most important criterion for a memory allocator is that it be fast.
232 Because memory allocation is done frequently,
233 a slow memory allocator will degrade the system performance.
234 Speed of allocation is more critical when executing in the
235 kernel than in user code,
because the kernel must allocate many data structures that user
237 processes can allocate cheaply on their run-time stack.
In addition, the kernel represents the platform on which all user
processes run,
and if it is slow, it will degrade the performance of every process
that is running.
243 Another problem with a slow memory allocator is that programmers
244 of frequently-used kernel interfaces will feel that they
245 cannot afford to use it as their primary memory allocator.
246 Instead they will build their own memory allocator on top of the
247 original by maintaining their own pool of memory blocks.
248 Multiple allocators reduce the efficiency with which memory is used.
249 The kernel ends up with many different free lists of memory
250 instead of a single free list from which all allocation can be drawn.
For example, consider the case of two subsystems that need memory.
253 If they have their own free lists,
254 the amount of memory tied up in the two lists will be the
255 sum of the greatest amount of memory that each of
256 the two subsystems has ever used.
257 If they share a free list,
258 the amount of memory tied up in the free list may be as low as the
259 greatest amount of memory that either subsystem used.
260 As the number of subsystems grows,
261 the savings from having a single free list grow.
262 .H 1 "Existing User-level Implementations
264 There are many different algorithms and
265 implementations of user-level memory allocators.
266 A survey of those available on UNIX systems appeared in [Korn85].
267 Nearly all of the memory allocators tested made good use of memory,
268 though most of them were too slow for use in the kernel.
269 The fastest memory allocator in the survey by nearly a factor of two
was the memory allocator provided on 4.2BSD,
originally written by Chris Kingsley at the California Institute of Technology.
Unfortunately, the 4.2BSD memory allocator also wasted twice as much memory
274 as its nearest competitor in the survey.
276 The 4.2BSD user-level memory allocator works by maintaining a set of lists
277 that are ordered by increasing powers of two.
278 Each list contains a set of memory blocks of its corresponding size.
279 To fulfill a memory request,
280 the size of the request is rounded up to the next power of two.
281 A piece of memory is then removed from the list corresponding
282 to the specified power of two and returned to the requester.
283 Thus, a request for a block of memory of size 53 returns
284 a block from the 64-sized list.
285 A typical memory allocation requires a roundup calculation
286 followed by a linked list removal.
287 Only if the list is empty is a real memory allocation done.
288 The free operation is also fast;
289 the block of memory is put back onto the list from which it came.
290 The correct list is identified by a size indicator stored
291 immediately preceding the memory block.
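The sketch below (ours, for illustration; it is not the Kingsley code)
shows the essentials of this strategy: the roundup to a power of two,
the per-size free lists, and the size indicator kept just ahead of each
allocated block.
The C library \fImalloc\fP stands in here for getting fresh memory from
the system.
.DS
#include <stdlib.h>

/*
 * Illustrative model of a power-of-two list allocator.  A free
 * block begins with a pointer to the next free block of the same
 * size; an allocated block is preceded by a one-word index that
 * records the list it came from, so no size is passed to the
 * free routine.
 */
#define MINBUCKET 4             /* smallest block: 1 << 4 = 16 bytes */
#define NBUCKETS  12

union overhead {
    union overhead *ov_next;    /* when free: next block on the list */
    size_t ov_index;            /* when allocated: which list */
};

static union overhead *nextf[NBUCKETS];  /* one free list per size */

void *
p2_malloc(size_t nbytes)
{
    union overhead *op;
    size_t bsize = 1 << MINBUCKET;
    int bucket = 0;

    /* round the request, plus its overhead, up to a power of two */
    while (bsize < nbytes + sizeof(union overhead)) {
        bsize <<= 1;
        bucket++;
    }
    if (bucket >= NBUCKETS)
        return (NULL);                /* sketch: no large-block path */
    if ((op = nextf[bucket]) != NULL)
        nextf[bucket] = op->ov_next;  /* fast path: unlink list head */
    else if ((op = malloc(bsize)) == NULL)
        return (NULL);                /* stand-in for fresh memory */
    op->ov_index = bucket;            /* size indicator precedes block */
    return ((void *)(op + 1));
}

void
p2_free(void *cp)
{
    union overhead *op = (union overhead *)cp - 1;
    size_t bucket = op->ov_index;     /* read the tag before reusing it */

    op->ov_next = nextf[bucket];      /* back onto its own list */
    nextf[bucket] = op;
}
.DE
A request for 53 bytes, for instance, needs 61 bytes once the tag is
included and is therefore served from the 64-byte list, as described above.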
292 .H 1 "Considerations Unique to a Kernel Allocator
294 There are several special conditions that arise when writing a
memory allocator for the kernel that do not apply to a user process
memory allocator.
297 First, the maximum memory allocation can be determined at
298 the time that the machine is booted.
299 This number is never more than the amount of physical memory on the machine,
300 and is typically much less since a machine with all its
301 memory dedicated to the operating system is uninteresting to use.
302 Thus, the kernel can statically allocate a set of data structures
303 to manage its dynamically allocated memory.
304 These data structures never need to be
305 expanded to accommodate memory requests;
306 yet, if properly designed, they need not be large.
307 For a user process, the maximum amount of memory that may be allocated
308 is a function of the maximum size of its virtual memory.
309 Although it could allocate static data structures to manage
310 its entire virtual memory,
311 even if they were efficiently encoded they would potentially be huge.
312 The other alternative is to allocate data structures as they are needed.
313 However, that adds extra complications such as new
314 failure modes if it cannot allocate space for additional
315 structures and additional mechanisms to link them all together.
317 Another special condition of the kernel memory allocator is that it
318 can control its own address space.
319 Unlike user processes that can only grow and shrink their heap at one end,
320 the kernel can keep an arena of kernel addresses and allocate
321 pieces from that arena which it then populates with physical memory.
322 The effect is much the same as a user process that has parts of
323 its address space paged out when they are not in use,
324 except that the kernel can explicitly control the set of pages
325 allocated to its address space.
326 The result is that the ``working set'' of pages in use by the
327 kernel exactly corresponds to the set of pages that it is really using.
328 .FI "One day memory usage on a Berkeley time-sharing machine"
332 A final special condition that applies to the kernel is that
333 all of the different uses of dynamic memory are known in advance.
334 Each one of these uses of dynamic memory can be assigned a type.
335 For each type of dynamic memory that is allocated,
336 the kernel can provide allocation limits.
337 One reason given for having separate allocators is that
338 no single allocator could starve the rest of the kernel of all
339 its available memory and thus a single runaway
340 client could not paralyze the system.
341 By putting limits on each type of memory,
342 the single general purpose memory allocator can provide the same
343 protection against memory starvation.\(dg
.FS
\(dgOne might seriously ask the question what good it is if ``only''
one subsystem within the kernel hangs if it is something like the
network on a diskless workstation.
.FE
350 \*(Lb shows the memory usage of the kernel over a one day period
351 on a general timesharing machine at Berkeley.
352 The ``In Use'', ``Free'', and ``Mem Use'' fields are instantaneous values;
353 the ``Requests'' field is the number of allocations since system startup;
354 the ``High Use'' field is the maximum value of
355 the ``Mem Use'' field since system startup.
356 The figure demonstrates that most
357 allocations are for small objects.
358 Large allocations occur infrequently,
359 and are typically for long-lived objects
360 such as buffers to hold the superblock for
361 a mounted file system.
362 Thus, a memory allocator only needs to be
363 fast for small pieces of memory.
364 .H 1 "Implementation of the Kernel Memory Allocator
366 In reviewing the available memory allocators,
we found that none of their strategies could be used without some modification.
368 The kernel memory allocator that we ended up with is a hybrid
369 of the fast memory allocator found in the 4.2BSD C library
370 and a slower but more-memory-efficient first-fit allocator.
372 Small allocations are done using the 4.2BSD power-of-two list strategy;
373 the typical allocation requires only a computation of
the list to use and the removal of an element if it is available,
so it is quite fast.
376 Macros are provided to avoid the cost of a subroutine call.
377 Only if the request cannot be fulfilled from a list is a call
378 made to the allocator itself.
379 To ensure that the allocator is always called for large requests,
380 the lists corresponding to large allocations are always empty.
381 Appendix A shows the data structures and implementation of the macros.
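The flavor of the allocation fast path is sketched below as a function
for readability; it is not the Appendix A text, it omits the interrupt
blocking and the statistics that the real macro performs, and the three
external names are placeholders.
.DS
extern char *freelistp[];               /* power-of-two free-list heads */
extern int bucket_index(unsigned int);  /* round size up to a list index */
extern char *kmem_slow_alloc(unsigned int);  /* the allocator proper */

char *
kmem_fast_alloc(unsigned int size)
{
    char *space;
    int indx = bucket_index(size);

    if (freelistp[indx] != 0) {
        space = freelistp[indx];           /* take the list head */
        freelistp[indx] = *(char **)space; /* unlink it */
    } else
        space = kmem_slow_alloc(size);     /* empty list or big request */
    return (space);
}
.DE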
383 Similarly, freeing a block of memory can be done with a macro.
The macro computes the list on which to place the request
and puts the block at the head of that list.
386 The free routine is called only if the block of memory is
387 considered to be a large allocation.
388 Including the cost of blocking out interrupts,
389 the allocation and freeing macros generate respectively
390 only nine and sixteen (simple) VAX instructions.
392 Because of the inefficiency of power-of-two allocation strategies
393 for large allocations,
394 a different strategy is used for allocations larger than two kilobytes.
395 The selection of two kilobytes is derived from our statistics on
396 the utilization of memory within the kernel,
which showed that 95 to 98% of allocations are of size one kilobyte or less.
398 A frequent caller of the memory allocator
399 (the name translation function)
400 always requests a one kilobyte block.
401 Additionally the allocation method for large blocks is based on allocating
402 pieces of memory in multiples of pages.
Consequently, the actual allocation sizes for requests of size
$2~times~pagesize$ or less are identical.\(dg
406 \(dgTo understand why this number is $size 8 {2~times~pagesize}$ one
407 observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, \&...
408 pages while the large block algorithm that allocates in multiples
409 of pages yields sizes of 1, 2, 3, 4, \&... pages.
410 Thus for allocations of sizes between one and two pages
411 both algorithms use two pages;
412 it is not until allocations of sizes between two and three pages
413 that a difference emerges where the power-of-two algorithm will use
414 four pages while the large block algorithm will use three pages.
416 In 4.3BSD on the VAX, the (software) page size is one kilobyte,
417 so two kilobytes is the smallest logical cutoff.
419 Large allocations are first rounded up to be a multiple of the page size.
420 The allocator then uses a first-fit algorithm to find space in the
421 kernel address arena set aside for dynamic allocations.
422 Thus a request for a five kilobyte piece of memory will use exactly
423 five pages of memory rather than eight kilobytes as with
424 the power-of-two allocation strategy.
425 When a large piece of memory is freed,
426 the memory pages are returned to the free memory pool,
427 and the address space is returned to the kernel address arena
428 where it is coalesced with adjacent free pieces.
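In outline, the first-fit search over the arena might look like the
sketch below; the run structure and the names are ours, and both
coalescing on free and recycling of the run descriptors are omitted.
.DS
struct run {                    /* one free piece of the arena */
    unsigned long start;        /* first free page of the run */
    unsigned long npages;       /* length of the run in pages */
    struct run *next;           /* next run, in address order */
};

/*
 * Take npages from the front of the first free run that is large
 * enough.  Returns the starting page, or 0 on failure (page 0 is
 * assumed never to be part of the arena).
 */
unsigned long
firstfit_alloc(struct run **head, unsigned long npages)
{
    struct run **rpp, *rp;
    unsigned long start;

    for (rpp = head; (rp = *rpp) != 0; rpp = &rp->next) {
        if (rp->npages < npages)
            continue;
        start = rp->start;
        rp->start += npages;
        rp->npages -= npages;
        if (rp->npages == 0)
            *rpp = rp->next;    /* run exhausted: unlink it */
        return (start);
    }
    return (0);
}
.DE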
430 Another technique to improve both the efficiency of memory utilization
431 and the speed of allocation
432 is to cluster same-sized small allocations on a page.
433 When a list for a power-of-two allocation is empty,
434 a new page is allocated and divided into pieces of the needed size.
435 This strategy speeds future allocations as several pieces of memory
436 become available as a result of the call into the allocator.
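The carving step itself is simple; in the sketch below each free piece
holds the link to the next piece in its first word, so the pieces must
be at least the size of a pointer.
.DS
/*
 * Sketch: split a fresh page into equal pieces of the needed size
 * and chain every piece onto the free list for that size.
 */
void
carve_page(char *page, int pagesize, int piecesize, char **listp)
{
    char *cp, *end = page + pagesize;

    for (cp = page; cp + piecesize <= end; cp += piecesize) {
        *(char **)cp = *listp;  /* link the piece onto the list */
        *listp = cp;
    }
}
.DE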
438 .FI "Calculation of allocation size"
441 Because the size is not specified when a block of memory is freed,
442 the allocator must keep track of the sizes of the pieces it has handed out.
443 The 4.2BSD user-level allocator stores the size of each block
444 in a header just before the allocation.
445 However, this strategy doubles the memory requirement for allocations that
446 require a power-of-two-sized block.
Instead of storing the size of each piece of memory with the piece itself,
449 the size information is associated with the memory page.
450 \*(Lb shows how the kernel determines
451 the size of a piece of memory that is being freed,
452 by calculating the page in which it resides,
453 and looking up the size associated with that page.
454 Eliminating the cost of the overhead per piece improved utilization
455 far more than expected.
456 The reason is that many allocations in the kernel are for blocks of
457 memory whose size is exactly a power of two.
458 These requests would be nearly doubled if the user-level strategy were used.
459 Now they can be accommodated with no wasted memory.
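The lookup done at free time then amounts to an array reference indexed
by page number, as in the sketch below (the array name and element type
are placeholders, not the kernel's own structures).
.DS
/*
 * Sketch: kmembase is the start of the allocator's arena, pagesz the
 * page size, and sizeindex[] records, for each page of the arena,
 * the size of the pieces that page was divided into.
 */
int
size_of_free_block(char *addr, char *kmembase, int pagesz,
    int *sizeindex)
{
    long pagenum = (addr - kmembase) / pagesz;

    return (sizeindex[pagenum]);   /* size shared by the whole page */
}
.DE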
461 The allocator can be called both from the top half of the kernel,
462 which is willing to wait for memory to become available,
463 and from the interrupt routines in the bottom half of the kernel
464 that cannot wait for memory to become available.
465 Clients indicate their willingness (and ability) to wait with a flag
466 to the allocation routine.
467 For clients that are willing to wait,
the allocator guarantees that their request will succeed.
Thus, these clients need not check the return value from the allocator.
470 If memory is unavailable and the client cannot wait,
471 the allocator returns a null pointer.
472 These clients must be prepared to cope with this
473 (hopefully infrequent) condition
474 (usually by giving up and hoping to do better later).
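In use, the two cases look roughly as follows; the flag names and the
routine name are placeholders, not the actual interface.
.DS
/* top half of the kernel: willing to wait, so the call cannot fail */
cp = kmem_alloc(size, CAN_WAIT);

/* interrupt level: cannot wait, so the result must be checked */
cp = kmem_alloc(size, NO_WAIT);
if (cp == 0)
    return;    /* give up and hope to do better later */
.DE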
475 .H 1 "Results of the Implementation
477 The new memory allocator was written about a year ago.
478 Conversion from the old memory allocators to the new allocator
479 has been going on ever since.
480 Many of the special purpose allocators have been eliminated.
486 Many of the special purpose memory allocators built on
487 top of other allocators have also been eliminated.
488 For example, the allocator that was built on top of the buffer pool allocator
to allocate pathname buffers in the name translation routine
has been eliminated.
493 Because the typical allocation is so fast,
494 we have found that none of the special purpose pools are needed.
Indeed, the cost of allocation is about the same as the previous cost of
496 allocating buffers from the network pool (\fImbuf\fP\^s).
497 Consequently applications that used to allocate network
498 buffers for their own uses have been switched over to using
499 the general purpose allocator without increasing their running time.
501 Quantifying the performance of the allocator is difficult because
502 it is hard to measure the amount of time spent allocating
503 and freeing memory in the kernel.
504 The usual approach is to compile a kernel for profiling
505 and then compare the running time of the routines that
506 implemented the old abstraction versus those that implement the new one.
507 The old routines are difficult to quantify because
508 individual routines were used for more than one purpose.
For example, the buffer allocation routine was used both to allocate
one kilobyte memory blocks
512 and for its intended purpose of providing buffers to the filesystem.
513 Differentiating these uses is often difficult.
514 To get a measure of the cost of memory allocation before
515 putting in our new allocator,
516 we summed up the running time of all the routines whose
517 exclusive task was memory allocation.
518 To this total we added the fraction
519 of the running time of the multi-purpose routines that could
520 clearly be identified as memory allocation usage.
521 This number showed that approximately three percent of
the time spent in the kernel could be attributed to memory allocation.
524 The new allocator is difficult to measure
525 because the usual case of the memory allocator is implemented as a macro.
526 Thus, its running time is a small fraction of the running time of the
527 numerous routines in the kernel that use it.
528 To get a bound on the cost,
529 we changed the macro always to call the memory allocation routine.
530 Running in this mode, the memory allocator accounted for six percent
531 of the time spent in the kernel.
532 Factoring out the cost of the statistics collection and the
533 subroutine call overhead for the cases that could
534 normally be handled by the macro,
535 we estimate that the allocator would account for
536 at most four percent of time in the kernel.
537 These measurements show that the new allocator does not introduce
538 significant new run-time costs.
The other major success has been in keeping the size information
on a per-page basis.
542 This technique allows the most frequently requested sizes to be
543 allocated without waste.
544 It also reduces the amount of bookkeeping information associated
545 with the allocator to four kilobytes of information
546 per megabyte of memory under management (with a one kilobyte page size).
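As a rough consistency check on that figure (the per-page cost here is
inferred from the totals, not taken from the actual record layout):
a one kilobyte page size gives 1024 pages per megabyte, so four bytes of
size information for each page accounts for the total quoted above,
.EQ
1024~times~4~roman bytes~=~4~roman kilobytes.
.EN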
.H 1 "Future Work
Our next project is to convert many of the static
550 kernel tables to be dynamically allocated.
Static tables include the process table, the file table, and others.
553 Making these tables dynamic will have two benefits.
554 First, it will reduce the amount of memory
555 that must be statically allocated at boot time.
556 Second, it will eliminate the arbitrary upper limit imposed
557 by the current static sizing
558 (although a limit will be retained to constrain runaway clients).
559 Other researchers have already shown the memory savings
560 achieved by this conversion [Rodriguez88].
562 Under the current implementation,
563 memory is never moved from one size list to another.
564 With the 4.2BSD memory allocator this causes problems,
565 particularly for large allocations where a process may use
566 a quarter megabyte piece of memory once,
567 which is then never available for any other size request.
568 In our hybrid scheme,
569 memory can be shuffled between large requests so that large blocks
570 of memory are never stranded as they are with the 4.2BSD allocator.
571 However, pages allocated to small requests are allocated once
572 to a particular size and never changed thereafter.
573 If a burst of requests came in for a particular size,
574 that size would acquire a large amount of memory
575 that would then not be available for other future requests.
577 In practice, we do not find that the free lists become too large.
578 However, we have been investigating ways to handle such problems
579 if they occur in the future.
580 Our current investigations involve a routine
581 that can run as part of the idle loop that would sort the elements
582 on each of the free lists into order of increasing address.
583 Since any given page has only one size of elements allocated from it,
584 the effect of the sorting would be to sort the list into distinct pages.
585 When all the pieces of a page became free,
586 the page itself could be released back to the free pool so that
587 it could be allocated to another purpose.
Although there is no guarantee that all the pieces of a page would ever
be freed,
590 most allocations are short-lived, lasting only for the duration of
591 an open file descriptor, an open network connection, or a system call.
592 As new allocations would be made from the page sorted to
593 the front of the list,
594 return of elements from pages at the back would eventually
595 allow pages later in the list to be freed.
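A sketch of how such a scan could recognize a reclaimable page follows;
since this is proposed future work, the structure is entirely ours.
With the list sorted, all the pieces belonging to one page sit next to
each other, so a single pass over the list can examine every page this way.
.DS
/*
 * Sketch: walk a free list and count the free pieces that lie within
 * the given page.  If every piece of the page is free, the page can
 * be returned to the free memory pool.
 */
int
page_is_reclaimable(char **listp, char *page, int pagesz,
    int piecesize)
{
    int nfree = 0;
    char *cp;

    for (cp = *listp; cp != 0; cp = *(char **)cp)
        if (cp >= page && cp < page + pagesz)
            nfree++;
    return (nfree == pagesz / piecesize);
}
.DE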
597 Two of the traditional UNIX
598 memory allocators remain in the current system.
599 The terminal subsystem uses \fIclist\fP\^s (character lists).
600 That part of the system is expected to undergo major revision within
601 the next year or so, and it will probably be changed to use
602 \fImbuf\fP\^s as it is merged into the network system.
603 The other major allocator that remains is
605 the routine that manages the filesystem buffer pool memory
606 and associated control information.
Only the filesystem uses this allocator
609 in the current system;
610 it manages the constant-sized buffer pool.
611 We plan to merge the filesystem buffer cache into the virtual memory system's
612 page cache in the future.
613 This change will allow the size of the buffer pool to be changed
614 according to memory load,
615 but will require a policy for balancing memory needs
616 with filesystem cache performance.
617 .H 1 "Acknowledgments
619 In the spirit of community support,
620 we have made various versions of our allocator available to our test sites.
621 They have been busily burning it in and giving
622 us feedback on their experiences.
623 We acknowledge their invaluable input.
624 The feedback from the Usenix program committee on the initial draft of
625 our paper suggested numerous important improvements.
.H 1 "References
.IP Korn85 \w'Rodriguez88\0\0'u
David Korn, Kiem-Phong Vo,
``In Search of a Better Malloc'',
\fIProceedings of the Portland Usenix Conference\fP,
pp. 489-506, June 1985.
.IP McKusick85 \w'Rodriguez88\0\0'u
M. McKusick, M. Karels, S. Leffler,
``Performance Improvements and Functional Enhancements in 4.3BSD'',
\fIProceedings of the Portland Usenix Conference\fP,
pp. 519-531, June 1985.
.IP Rodriguez88 \w'Rodriguez88\0\0'u
Robert Rodriguez, Matt Koehler, Larry Palmer, Ricky Palmer,
``A Dynamic UNIX Operating System'',
\fIProceedings of the San Francisco Usenix Conference\fP,
June 1988.
.IP Thompson78 \w'Rodriguez88\0\0'u
Ken Thompson,
``UNIX Implementation'',
\fIBell System Technical Journal\fP, volume 57, number 6,
pp. 1931-1946, July-August 1978.