.\"     $NetBSD$
.\"
.\" Copyright (c) 1998 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgements:
.\"     This product includes software developed for the NetBSD Project
.\"     by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.sh 1 "Introduction"
.pp
NetBSD is a portable, modern UNIX-like operating system that currently
runs on eighteen platforms covering nine processor architectures.  Some
of these platforms, including the Alpha and i386\**, share the PCI bus
.(f
\**The term "i386" is used here to refer to all of the 386-class and higher
processors, including the i486, Pentium, Pentium Pro, and Pentium II.
.)f
as a common architectural feature.
In order to share device drivers for PCI devices between different
platforms, abstractions that hide the details of bus access must be
invented.  The details that must be hidden can be broken down into
two classes: CPU access to devices on the bus (\fIbus_space\fR)
and device access to host memory (\fIbus_dma\fR).  This paper discusses
the latter; \fIbus_space\fR is a complicated topic in and of itself, and
is beyond the scope of this paper.
.pp
Within the scope of DMA, there are two broad classes of details
that must be hidden from the core device driver.
The first class, host details, deals with issues such as
the physical mapping of system memory (and the DMA mechanisms employed
as a result of such mapping) and cache semantics.  The second
class, bus details, deals with issues related to features or
limitations specific to the bus to which a device is attached, such
as DMA bursting and address line limitations.
.sh 2 "Host platform details"
.pp
Among the example platforms mentioned above, there are at least three different
mechanisms used to perform DMA.  The first is used by the i386 platform.
This mechanism can be described as "what you see is what you get":
the address that the device uses to perform the DMA transfer is the same
address that the host CPU uses to access the memory location in question.
.so figure1.pic
.pp
The second mechanism,
employed by the Alpha, is very similar to the first; the address
the device uses is the address the host CPU uses to access the memory
location in question, offset by
some base address at which host memory is direct-mapped on the device bus
for the purpose of DMA.
.so figure2.pic
.pp
The third mechanism, scatter-gather-mapped DMA, employs an MMU that performs
translation of DMA addresses to host memory physical addresses.  This
mechanism is also used by the Alpha, because Alpha platforms implement a
physical address space sometimes significantly larger than the 32-bit
address space supported by most currently available PCI devices.
.so figure3.pic
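.pp
The three mechanisms can be contrasted with a minimal sketch in C.
All names here are hypothetical and are not part of any NetBSD
interface, the 8 KB page size is an assumption made for illustration,
and the fixed-size table merely stands in for whatever translation
structure the bus MMU actually consults.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192u	/* assumed page size, for illustration only */

/* Mechanism 1: the DMA address is the host physical address. */
static uint64_t
dma_addr_identity(uint64_t host_pa)
{
	return host_pa;
}

/* Mechanism 2: host memory appears on the bus starting at window_base. */
static uint64_t
dma_addr_direct(uint64_t host_pa, uint64_t window_base)
{
	return window_base + host_pa;
}

/* Mechanism 3: a table maps each DMA page to a host physical page. */
struct sgmap {
	uint64_t window_base;	/* first DMA address covered by the map */
	size_t   npages;	/* number of valid entries in page_pa */
	uint64_t page_pa[4];	/* page-aligned host physical addresses */
};

/*
 * Translate a DMA address the way the bus MMU would.
 * Returns 0 and stores the host physical address on success,
 * or -1 if the DMA address falls outside the mapped window.
 */
static int
sgmap_translate(const struct sgmap *sg, uint64_t dma_addr, uint64_t *host_pa)
{
	uint64_t off, idx;

	if (dma_addr < sg->window_base)
		return -1;
	off = dma_addr - sg->window_base;
	idx = off / SKETCH_PAGE_SIZE;
	if (idx >= sg->npages)
		return -1;
	*host_pa = sg->page_pa[idx] + off % SKETCH_PAGE_SIZE;
	return 0;
}
```

Note that the first two mechanisms compute a DMA address from a host
address, while the third is shown from the MMU's point of view: the
device presents a DMA address and the table yields a host physical
page that may lie well above the 32-bit boundary.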
.pp
The second and third DMA mechanisms above are combined on the Alpha through
the use of \fIDMA windows\fR.  The ASIC that implements the PCI bus
on a particular platform has at least two of these DMA windows.  Each
window may be configured for direct-mapped or scatter-gather-mapped
DMA.  Windows are chosen based on the type of DMA transfer being performed,
the bus type, and the physical address range of the host memory being
accessed.
.pp
These concepts apply to platforms other than those listed above
and busses other than PCI.  Similar issues exist with the TURBOchannel bus
used on DECstations and early Alpha systems, and with the Q-bus used on
some DEC MIPS and VAX-based servers.
.pp
The semantics of the host system's cache are also important to devices
that perform DMA.  Some systems are capable of cache-coherent
DMA.  On such systems, the cache is often write-through (i.e. stores are
written both to the cache and to host memory), or the cache has special
snooping logic that can detect access to a memory location for which there
is a dirty cache line (which causes the cache to be flushed automatically).
Other systems are not capable of cache-coherent DMA.  On these systems,
software must explicitly flush any data caches before memory-to-device
DMA transfers, and must invalidate soon-to-be-stale cache lines before
device-to-memory DMA.
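.pp
The rule in the preceding paragraph can be written down as a small
decision function.  This is only a sketch: the type and function names
are hypothetical, and a real system would go on to perform the
selected operation on the affected address range.

```c
/* Direction of a DMA transfer, from the host's point of view. */
enum dma_dir { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM };

/* Cache maintenance that software must perform before the transfer. */
enum cache_op { CACHE_OP_NONE, CACHE_OP_FLUSH, CACHE_OP_INVALIDATE };

/*
 * Select the cache operation required before starting a transfer.
 * 'coherent' is nonzero when the platform provides cache-coherent DMA.
 */
static enum cache_op
pre_dma_cache_op(int coherent, enum dma_dir dir)
{
	if (coherent)
		return CACHE_OP_NONE;		/* hardware keeps memory consistent */
	if (dir == DMA_MEM_TO_DEV)
		return CACHE_OP_FLUSH;		/* push dirty data out to memory */
	return CACHE_OP_INVALIDATE;		/* discard lines about to go stale */
}
```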
.sh 2 "Bus details"
.pp
In addition to hiding the platform-specific DMA details for a single bus,
it is desirable to share as much device driver code as possible for
a device which may attach to multiple busses.  A good example is the
BusLogic family of SCSI adapters.  This family of devices comes in ISA,
EISA, VESA local bus, and PCI flavors.  While there are some bus-specific
details, such as probing and interrupt initialization, the vast majority
of the code that drives this family of devices is identical for each flavor.
.pp
The BusLogic SCSI adapters are examples of what are termed
\fIbus masters\fR.  That is to say, the device itself performs all bus
handshaking and host memory access during a DMA transfer.  No third party
is involved in the transfer.  Such devices, when performing a DMA transfer,
present the DMA address on the bus address lines, execute the bus's fetch
or store operation, increment the address, and so forth until the transfer
is complete.  Because the device is using the bus address lines, the range
of host physical addresses the device can access is limited by the number
of such lines.  On the PCI bus, which has at least 32 address lines, the
device may be able to access the entire physical address space of a 32-bit
architecture, such as the i386.  ISA, however, has only 24 address lines.
This means that the device can directly access only 16 MB of physical
address space.
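.pp
The arithmetic behind these limits is simple: a bus with n address
lines can name 2 to the power n bytes.  The following sketch, with
hypothetical function names, computes that limit and checks whether a
buffer lies entirely within the range a bus master can reach.

```c
#include <stdint.h>

/* Bytes addressable with 'nlines' address lines (nlines must be < 64). */
static uint64_t
bus_addr_space(unsigned nlines)
{
	return (uint64_t)1 << nlines;
}

/*
 * Nonzero if the buffer [pa, pa + len) lies entirely within the range
 * reachable by a bus master on a bus with 'nlines' address lines.
 * The comparison is arranged so that pa + len cannot overflow.
 */
static int
bus_can_reach(uint64_t pa, uint64_t len, unsigned nlines)
{
	uint64_t limit = bus_addr_space(nlines);

	return len <= limit && pa <= limit - len;
}
```

With 24 lines the limit is 16 MB, so a buffer that ends exactly at the
16 MB boundary is reachable by an ISA bus master, while one that
extends a single byte beyond it is not.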
.pp
A common solution to the limited-address-lines problem is a technique
known as \fIDMA bouncing\fR.  This technique involves a second memory
area, located in the physical address range accessible by the device,
known as a \fIbounce buffer\fR.  In a memory-to-device transfer, the
data is copied by the CPU to the bounce buffer, and the DMA operation is
started.  Conversely, in a device-to-memory transfer, the DMA operation is
started, and the CPU then copies the data from the bounce buffer once the
DMA operation has completed.
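.pp
The two copy steps can be sketched as follows.  The structure and
function names are hypothetical, and the sketch assumes that the
caller has already obtained the bounce buffer from memory within the
device's reach and that it starts and completes the DMA operation
itself between the two calls.

```c
#include <stddef.h>
#include <string.h>

/* A bounce buffer: memory located inside the device-reachable range. */
struct bounce {
	unsigned char *buf;
	size_t         size;
};

/*
 * Memory-to-device: copy the caller's data into the bounce buffer.
 * On success (0) the caller starts the DMA operation from b->buf.
 */
static int
bounce_pre_write(struct bounce *b, const void *src, size_t len)
{
	if (len > b->size)
		return -1;
	memcpy(b->buf, src, len);
	return 0;
}

/*
 * Device-to-memory: once the DMA operation into b->buf has completed,
 * copy the data out to its final destination.
 */
static int
bounce_post_read(const struct bounce *b, void *dst, size_t len)
{
	if (len > b->size)
		return -1;
	memcpy(dst, b->buf, len);
	return 0;
}
```

Each transfer costs one extra copy by the CPU, and the bounce buffer
occupies memory in the device-reachable range for as long as it is
held.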
.pp
While simple to implement, DMA bouncing is not the most elegant way to
solve the limited-address-lines problem.  On the Alpha, for example,
scatter-gather-mapped DMA may be used to translate the out-of-range
memory physical addresses to in-range DMA addresses that the device
may use.  This solution tends to offer better performance because it
eliminates data copies, and is less expensive in terms of memory usage.
.pp
Returning to the BusLogic SCSI example, it is undesirable to place
intimate knowledge of direct-mapping, scatter-gather-mapping,
and DMA bouncing in the core device driver.  Clearly, an abstraction that
hides these details and presents a consistent interface, regardless of
the DMA mechanism being used, is needed.
 148 hides these details and presents a consistent interface, regardless of
 149 the DMA mechanism being used, is needed.