.\" Copyright (c) 1998 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgements:
.\"	This product includes software developed for the NetBSD Project
.\"	by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.pp
NetBSD is a portable, modern UNIX-like operating system which currently
runs on eighteen platforms covering nine processor architectures. Some
of these platforms, including the Alpha and i386\**, share the PCI bus
.(f
\**The term "i386" is used here to refer to all of the 386-class and higher
processors, including the i486, Pentium, Pentium Pro, and Pentium II.
.)f
as a common architectural feature.
In order to share device drivers for PCI devices between different
platforms, abstractions that hide the details of bus access must be
invented. The details that must be hidden can be broken down into
two classes: CPU access to devices on the bus (\fIbus_space\fR)
and device access to host memory (\fIbus_dma\fR). Here we will discuss
the latter; \fIbus_space\fR is a complicated topic in and of itself, and
is beyond the scope of this paper.
.pp
Within the scope of DMA, there are two broad classes of details
that must be hidden from the core device driver.
The first class, host details, deals with issues such as
the physical mapping of system memory (and the DMA mechanisms employed
as a result of such mapping) and cache semantics. The second
class, bus details, deals with issues related to features or
limitations specific to the bus to which a device is attached, such
as DMA bursting and address line limitations.
.sh 2 "Host platform details"
.pp
In the example platforms listed above, there are at least three different
mechanisms used to perform DMA. The first is used by the i386 platform.
This mechanism can be described as "what you see is what you get":
the address that the device uses to perform the DMA transfer is the same
address that the host CPU uses to access the memory location in question.
.pp
The second mechanism, direct-mapped DMA, is
employed by the Alpha and is very similar to the first; the address
the device uses is the address the host CPU uses to access the memory
location in question, offset by some base address at which host memory
is direct-mapped on the device bus for the purpose of DMA.
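.pp
As a minimal sketch of the first two mechanisms, the C fragment below
computes the address a device would be given for a host physical address.
The type names, \fIDIRECT_MAP_BASE\fR, and \fIDIRECT_MAP_SIZE\fR are
hypothetical values chosen for illustration, not constants taken from any
real Alpha chipset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types and window parameters, chosen for illustration. */
typedef uint64_t phys_addr;   /* host physical address */
typedef uint32_t dma_addr;    /* address presented on a 32-bit bus */

#define DIRECT_MAP_BASE 0x40000000ULL  /* where host memory appears on the bus */
#define DIRECT_MAP_SIZE 0x40000000ULL  /* bytes of host memory the window covers */

/* i386 style: the device uses exactly the CPU's physical address. */
dma_addr
wysiwyg_dma_addr(phys_addr pa)
{
	assert(pa <= UINT32_MAX);   /* must fit on the 32-bit bus */
	return (dma_addr)pa;
}

/* Direct-mapped style: the same address, shifted by the window base. */
dma_addr
direct_mapped_dma_addr(phys_addr pa)
{
	assert(pa < DIRECT_MAP_SIZE);   /* outside the window: not reachable */
	return (dma_addr)(pa + DIRECT_MAP_BASE);
}

/* Inverse mapping, e.g. for an address reported back by the device. */
phys_addr
direct_mapped_phys_addr(dma_addr da)
{
	assert(da >= DIRECT_MAP_BASE && da - DIRECT_MAP_BASE < DIRECT_MAP_SIZE);
	return (phys_addr)da - DIRECT_MAP_BASE;
}
```

Under these assumptions, a buffer at physical address 0x123000 is handed
to an i386 device as 0x123000 and to a direct-mapped device as 0x40123000.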
.pp
The third mechanism, scatter-gather-mapped DMA, employs an MMU which performs
translation of DMA addresses to host memory physical addresses. This
mechanism is also used by the Alpha, because Alpha platforms implement a
physical address space sometimes significantly larger than the 32-bit
address space supported by most currently available PCI devices.
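.pp
The translation performed by the scatter-gather MMU can be modeled as a
small page table indexed by DMA address. The sketch below assumes 8KB
pages, an eight-entry map, and a window at bus address 0x800000; these
values and the names \fIsg_load\fR and \fIsg_translate\fR are hypothetical.
In real hardware the lookup in \fIsg_translate\fR is done by the MMU on
every device access; software only loads the map entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr;   /* host physical address, may exceed 4GB */
typedef uint32_t dma_addr;    /* 32-bit address the device drives */

/* Hypothetical map geometry, for illustration only. */
#define SG_PAGE_SHIFT 13
#define SG_PAGE_SIZE  (1UL << SG_PAGE_SHIFT)   /* 8KB pages */
#define SG_NENTRIES   8
#define SG_DMA_BASE   0x00800000UL             /* bus address of the window */

/* Each entry holds the host page frame number backing one window page. */
static uint64_t sg_map[SG_NENTRIES];

/*
 * Load entries starting at 'slot' so the given physical pages appear
 * contiguously on the bus; returns the DMA address of the first byte.
 */
dma_addr
sg_load(size_t slot, const phys_addr *pages, size_t npages)
{
	assert(slot + npages <= SG_NENTRIES);
	for (size_t i = 0; i < npages; i++) {
		assert((pages[i] & (SG_PAGE_SIZE - 1)) == 0);  /* page aligned */
		sg_map[slot + i] = pages[i] >> SG_PAGE_SHIFT;
	}
	return (dma_addr)(SG_DMA_BASE + slot * SG_PAGE_SIZE);
}

/* What the MMU does for each device access: DMA address -> physical. */
phys_addr
sg_translate(dma_addr da)
{
	uint64_t rel;

	assert(da >= SG_DMA_BASE);
	rel = da - SG_DMA_BASE;
	assert((rel >> SG_PAGE_SHIFT) < SG_NENTRIES);
	return ((phys_addr)sg_map[rel >> SG_PAGE_SHIFT] << SG_PAGE_SHIFT) |
	    (rel & (SG_PAGE_SIZE - 1));
}
```

For example, loading pages at 8GB and at 4GB+24KB into slots 2 and 3 gives
the device the contiguous range starting at 0x804000, even though neither
page could be reached with 32 address lines alone.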
.pp
The second and third DMA mechanisms above are combined on the Alpha through
the use of \fIDMA windows\fR. The ASIC which implements the PCI bus
on a particular platform has at least two of these DMA windows. Each
window may be configured for direct-mapped or scatter-gather-mapped
DMA. Windows are chosen based on the type of DMA transfer being performed,
the bus type, and the physical address range of the host memory being
accessed.
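.pp
The sketch below illustrates only the last of these criteria, the physical
address range. It assumes a hypothetical bridge with a 1GB direct-mapped
window over the bottom of host memory and a scatter-gather window for
everything else; real window layouts and selection rules vary by chipset,
and the names used here are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr;

enum window_kind { WINDOW_DIRECT, WINDOW_SG };

/* Hypothetical description of one DMA window on the host bridge. */
struct dma_window {
	enum window_kind kind;
	phys_addr phys_base;   /* first host address covered (direct only) */
	phys_addr phys_size;   /* bytes of host memory covered (direct only) */
};

static const struct dma_window windows[2] = {
	{ WINDOW_DIRECT, 0, 0x40000000ULL },  /* first 1GB, direct-mapped */
	{ WINDOW_SG, 0, 0 },                  /* everything else, via the SG MMU */
};

/*
 * Pick a window for 'len' bytes at 'pa'.  The direct window is preferred
 * because it needs no map entries; the SG window handles whatever the
 * direct window cannot cover.
 */
const struct dma_window *
choose_window(phys_addr pa, phys_addr len)
{
	const struct dma_window *w = &windows[0];

	assert(len > 0);
	if (pa >= w->phys_base && pa + len <= w->phys_base + w->phys_size)
		return w;
	return &windows[1];
}
```

A transfer that straddles the end of the direct window falls back to the
scatter-gather window, since the direct window could map only part of it.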
.pp
These concepts apply to platforms other than those listed above
and busses other than PCI. Similar issues exist with the TurboChannel bus
used on DECstations and early Alpha systems, and with the Q-bus used on
some DEC MIPS and VAX-based servers.
.pp
The semantics of the host system's cache are also important to devices
which wish to perform DMA. Some systems are capable of cache-coherent
DMA. On such systems, the cache is often write-through (i.e., stores are
written both to the cache and to host memory), or the cache has special
snooping logic that can detect access to a memory location for which there
is a dirty cache line (which causes the line to be flushed automatically).
Other systems are not capable of cache-coherent DMA. On these systems,
software must explicitly flush any data caches before memory-to-device
DMA transfers, as well as invalidate soon-to-be-stale cache lines before
device-to-memory DMA.
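.pp
The ordering requirements above can be demonstrated with a toy model of a
non-coherent write-back cache holding a single line. Everything here is a
hypothetical simulation, not a real cache interface: \fIcpu_read\fR and
\fIcpu_write\fR go through the cache, while \fIdevice_read\fR and
\fIdevice_write\fR touch memory directly, as a DMA engine would.

```c
#include <assert.h>
#include <string.h>

/* Toy non-coherent, write-back cache with one 16-byte line. */
#define LINE 16

static unsigned char memory[LINE];      /* host memory seen by the device */
static unsigned char cache_line[LINE];  /* CPU's cached copy */
static int line_valid, line_dirty;

static void
fill_line(void)
{
	if (!line_valid) {
		memcpy(cache_line, memory, LINE);
		line_valid = 1;
	}
}

void
cpu_write(int off, unsigned char v)
{
	assert(off >= 0 && off < LINE);
	fill_line();           /* write-allocate */
	cache_line[off] = v;
	line_dirty = 1;        /* memory is now stale */
}

unsigned char
cpu_read(int off)
{
	assert(off >= 0 && off < LINE);
	fill_line();
	return cache_line[off];
}

/* Before memory-to-device DMA: write dirty data back to memory. */
void
cache_flush(void)
{
	if (line_valid && line_dirty) {
		memcpy(memory, cache_line, LINE);
		line_dirty = 0;
	}
}

/*
 * Before device-to-memory DMA: discard the soon-to-be-stale line.
 * (This assumes the whole line belongs to the DMA buffer, so no dirty
 * data the CPU still needs is thrown away.)
 */
void
cache_invalidate(void)
{
	line_valid = 0;
	line_dirty = 0;
}

/* The device side of a transfer, bypassing the cache entirely. */
unsigned char device_read(int off) { return memory[off]; }
void device_write(int off, unsigned char v) { memory[off] = v; }
```

Without \fIcache_flush\fR, a device reading the buffer sees the old contents
of memory; without \fIcache_invalidate\fR, the CPU can keep returning a
cached copy of data that the device has since overwritten.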
.pp
In addition to hiding the platform-specific DMA details for a single bus,
it is desirable to share as much device driver code as possible for
a device which may attach to multiple busses. A good example is the
BusLogic family of SCSI adapters. This family of devices comes in ISA,
EISA, VESA local bus, and PCI flavors. While there are some bus-specific
details, such as probing and interrupt initialization, the vast majority
of the code that drives this family of devices is identical for each flavor.
.pp
BusLogic SCSI adapters are examples of what are termed
\fIbus masters\fR. That is to say, the device itself performs all bus
handshaking and host memory access during a DMA transfer; no third party
is involved. Such devices, when performing a DMA transfer,
present the DMA address on the bus address lines, execute the bus's fetch
or store operation, increment the address, and so forth until the transfer
is complete. Because the device is using the bus address lines, the range
of host physical addresses the device can access is limited by the number
of such lines. On the PCI bus, which has at least 32 address lines, the
device may be able to access the entire physical address space of a 32-bit
architecture, such as the i386. ISA, however, only has 24 address lines.
This means that the device can directly access only the first 16MB of
physical memory.
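.pp
The arithmetic behind this limit is simple: a device with n address lines
can reach byte addresses 0 through 2^n - 1, so 24 lines cover 16MB and
32 lines cover 4GB. A minimal check, using a hypothetical function name,
might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Can a bus master with 'nlines' address lines reach every byte of the
 * physical range [pa, pa + len)?  The first unreachable address is 2^nlines.
 */
bool
dma_reachable(uint64_t pa, uint64_t len, unsigned nlines)
{
	uint64_t limit;

	assert(len > 0 && nlines > 0 && nlines < 64);
	limit = (uint64_t)1 << nlines;   /* 2^24 = 16MB for ISA */
	/* Written this way so that pa + len cannot overflow. */
	return pa < limit && len <= limit - pa;
}
```

A buffer that ends exactly at 16MB is reachable by an ISA device, but one
extra byte pushes it out of range; the same buffer is fine on PCI.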
.pp
A common solution to the limited-address-line problem is a technique
known as \fIDMA bouncing\fR. This technique involves a second memory
area, located in the physical address range accessible by the device,
known as a \fIbounce buffer\fR. In a memory-to-device transfer, the
data is copied by the CPU to the bounce buffer, and the DMA operation is
started. Conversely, in a device-to-memory transfer, the DMA operation is
started, and the CPU then copies the data from the bounce buffer once the
DMA operation has completed.
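.pp
The two directions of DMA bouncing can be sketched as a pair of copies
around the transfer. In this hypothetical fragment a static array stands in
for the bounce buffer; a real implementation would have to allocate it from
memory the device can reach (below 16MB for ISA) and would usually manage
more than one such buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a buffer reserved below the device's address limit. */
#define BOUNCE_SIZE 4096

static unsigned char bounce[BOUNCE_SIZE];

/* Memory-to-device: copy into the bounce buffer, then DMA from it. */
unsigned char *
bounce_pre_write(const void *src, size_t len)
{
	assert(len <= BOUNCE_SIZE);
	memcpy(bounce, src, len);
	return bounce;          /* the device is given this buffer */
}

/* Device-to-memory: the device writes into the bounce buffer first... */
unsigned char *
bounce_pre_read(size_t len)
{
	assert(len <= BOUNCE_SIZE);
	return bounce;
}

/* ...and the CPU copies the data out once the transfer has completed. */
void
bounce_post_read(void *dst, size_t len)
{
	assert(len <= BOUNCE_SIZE);
	memcpy(dst, bounce, len);
}
```

The extra \fImemcpy\fR on every transfer is exactly the cost that the
scatter-gather approach avoids.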
.pp
While simple to implement, DMA bouncing is not the most elegant way to
solve the limited-address-line problem. On the Alpha, for example,
scatter-gather-mapped DMA may be used to translate the out-of-range
memory physical addresses to in-range DMA addresses that the device
may use. This solution tends to offer better performance because it
eliminates the data copies, and it uses less memory because no bounce
buffers need to be set aside.
.pp
Returning to the BusLogic SCSI example, it is undesirable to place
intimate knowledge of direct-mapping, scatter-gather-mapping,
and DMA bouncing in the core device driver. Clearly, an abstraction that
hides these details and presents a consistent interface, regardless of
the DMA mechanism being used, is needed.
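.pp
One way to express such an abstraction, sketched here with hypothetical
names rather than the actual \fIbus_dma\fR interface, is a tag carrying a
function pointer that the bus front-end supplies and the core driver calls.
The direct-mapped tag adds a window offset, while the ISA tag falls back to
a bounce buffer for addresses above 16MB. The core driver does not know
which mechanism is in use.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr;
typedef uint32_t dma_addr;

/* Hypothetical DMA tag: filled in by the bus front-end. */
struct dma_tag {
	/* Translate a host buffer into an address the device can use. */
	dma_addr (*load)(const struct dma_tag *t, phys_addr pa, uint64_t len);
	uint64_t offset;        /* direct-map base (direct tags only) */
	unsigned addr_lines;    /* device address width (bounce tags only) */
	phys_addr bounce_pa;    /* bounce buffer below the limit (bounce only) */
};

static dma_addr
direct_load(const struct dma_tag *t, phys_addr pa, uint64_t len)
{
	(void)len;
	return (dma_addr)(pa + t->offset);
}

static dma_addr
bounce_load(const struct dma_tag *t, phys_addr pa, uint64_t len)
{
	uint64_t limit = (uint64_t)1 << t->addr_lines;

	if (pa < limit && len <= limit - pa)
		return (dma_addr)pa;    /* reachable: no bouncing needed */
	/* A real implementation would copy the data into the buffer here. */
	return (dma_addr)t->bounce_pa;
}

/* Core driver code: identical for every bus attachment. */
dma_addr
core_program_adapter(const struct dma_tag *t, phys_addr pa, uint64_t len)
{
	assert(len > 0);
	return t->load(t, pa, len);
}

/* Bus front-ends: the only code that knows the DMA mechanism. */
const struct dma_tag pci_alpha_tag = { direct_load, 0x40000000ULL, 0, 0 };
const struct dma_tag isa_tag = { bounce_load, 0, 24, 0x00100000ULL };
```

Because the mechanism is selected when the tag is built, the same core code
serves every attachment, which is the property the BusLogic driver needs.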