An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel <mochel@transmeta.com>

---------------------------------------------------------------------------
2. How The PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It defines a standard interface for controlling
various power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3 by default, regardless of whether or
not they implement any of the PCI PM spec.
The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.

2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
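
The ordering described above can be illustrated with a small userspace model.
This is only a sketch, not the kernel's implementation: struct node, its veto
field, MAX_CHILDREN, MAX_NODES and all function names are hypothetical and
stand in for the real PCI device tree. It shows the two depth-first suspend
walks (check first, then power down children before their bridge) and the
breadth-first resume walk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4	/* hypothetical limits for this model only */
#define MAX_NODES    16

/* Hypothetical stand-in for a PCI device or bridge; not struct pci_dev. */
struct node {
	int id;
	int veto;			/* nonzero: device refuses to sleep */
	int state;			/* 0 = D0, 3 = D3 */
	size_t nchildren;
	struct node *child[MAX_CHILDREN];
};

/* Walk 1: depth-first check. Nothing is powered down here. */
static int check_walk(const struct node *n)
{
	for (size_t i = 0; i < n->nchildren; i++)
		if (check_walk(n->child[i]))
			return -1;
	return n->veto ? -1 : 0;
}

/* Walk 2: depth-first, every child is handled before its bridge. */
static void suspend_walk(struct node *n, int *order, size_t *count)
{
	for (size_t i = 0; i < n->nchildren; i++)
		suspend_walk(n->child[i], order, count);
	n->state = 3;
	order[(*count)++] = n->id;
}

/* Two-pass suspend. On a veto, returns -1 with every node still in D0. */
static int model_suspend(struct node *root, int *order, size_t *count)
{
	*count = 0;
	if (check_walk(root))
		return -1;
	suspend_walk(root, order, count);
	return 0;
}

/* Resume: breadth-first, so each bridge is in D0 before its children. */
static void model_resume(struct node *root, int *order, size_t *count)
{
	struct node *queue[MAX_NODES];
	size_t head = 0, tail = 0;

	*count = 0;
	queue[tail++] = root;
	while (head < tail) {
		struct node *n = queue[head++];

		n->state = 0;
		order[(*count)++] = n->id;
		for (size_t i = 0; i < n->nchildren; i++) {
			assert(tail < MAX_NODES);
			queue[tail++] = n->child[i];
		}
	}
}
```

The order array records the sequence in which nodes are touched, which makes
it easy to see that a bridge is the last of its subtree to be suspended and
the first to be resumed.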

3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.

	pci_save_state(dev, buffer);

Save first 64 bytes of PCI config space. Buffer must be allocated by
the caller.

	pci_restore_state(dev, buffer);

Restore previously saved config space (first 64 bytes only).

If buffer is NULL, then restore what information we know about the
device from bootup: BARs and interrupt line.

	pci_set_power_state(dev, state);

Transition device to low power state using the PCI PM Capabilities
registers.

Will fail under one of the following conditions:
- If state is less than current state, but not D0 (illegal transition)
- Device doesn't support PM Capabilities
- Device does not support requested state
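
The three failure conditions listed above can be modelled in userspace as
follows. This is a sketch under stated assumptions, not the kernel function:
struct pm_model, its fields and model_set_power_state() are hypothetical, and
the model returns a plain -1 on failure rather than any particular kernel
error code.

```c
#include <assert.h>

/* Hypothetical model of a device's PM state; not struct pci_dev. */
struct pm_model {
	int has_pm_cap;		/* nonzero: device has PM Capabilities */
	int current_state;	/* 0..3 for D0..D3 */
	unsigned supported;	/* bit n set: state Dn is supported */
};

/* Returns 0 on success, -1 if any of the listed conditions applies. */
static int model_set_power_state(struct pm_model *dev, int state)
{
	assert(dev);

	if (state < 0 || state > 3)
		return -1;	/* not a D-state at all */
	if (state != 0 && state < dev->current_state)
		return -1;	/* illegal transition: only D0 may be lower */
	if (!dev->has_pm_cap)
		return -1;	/* no PM Capabilities to program */
	if (!(dev->supported & (1u << state)))
		return -1;	/* requested state not supported */

	dev->current_state = state;
	return 0;
}
```

For a device that supports D0, D1 and D3 (supported = 0x0b in this model),
a request for D2 fails, D0 -> D1 -> D3 succeeds, D3 -> D1 fails, and D3 -> D0
succeeds.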

	pci_enable_wake(dev, state, enable);

Enable device to generate PME# during low power state using PCI PM
Capabilities.

Checks whether the device supports generating PME# from the requested state
and fails if it does not, unless enable == 0 (request is to disable wake
events, which is implicit if it doesn't even support it in the first
place).

Note that the PMC Register in the device's PM Capabilities has a bitmask
of the states it supports generating PME# from. D3hot is bit 3 and
D3cold is bit 4. So, while a value of 4 as the state may not seem
semantically correct, it is.
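
As an illustration of why 4 is a valid state index here, the bitmask lookup
can be sketched as below. This is a userspace sketch, not kernel code: the
macro names, the enum and pme_supported() are hypothetical. It assumes the
PME support field sits in bits 15:11 of the PMC register, as described in
section 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the field position follows the text (bits 15:11). */
#define PMC_PME_SHIFT	11
#define PMC_PME_MASK	0x1fu

/* Index into the PME support bitmask, not a D-state number as such. */
enum { WAKE_D0, WAKE_D1, WAKE_D2, WAKE_D3HOT, WAKE_D3COLD };

/* Returns 1 if the device can generate PME# from the given state. */
static int pme_supported(uint16_t pmc, int state)
{
	assert(state >= WAKE_D0 && state <= WAKE_D3COLD);
	return (((pmc >> PMC_PME_SHIFT) & PMC_PME_MASK) >> state) & 1u;
}
```

With this layout, D3hot (index 3) maps to PMC bit 14 and D3cold (index 4)
maps to PMC bit 15.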

4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in
struct pci_driver:

	int  (*save_state) (struct pci_dev *dev, u32 state);
	int  (*suspend) (struct pci_dev *dev, u32 state);
	int  (*resume) (struct pci_dev *dev);
	int  (*enable_wake) (struct pci_dev *dev, u32 state, int enable);

	if (dev->driver && dev->driver->save_state)
		dev->driver->save_state(dev,state);

The driver should use this callback to save device state. It should take into
account the current state of the device and the requested state in order to
avoid any unnecessary operations.

For example, on a video card that supports all 4 states (D0-D3), all controller
context is preserved when entering D1, but the screen is placed into a low power
state.

The driver can also interpret this function as a notification that it may be
entering a sleep state in the near future. If it knows that the device cannot
enter the requested state, either because of lack of support for it, or because
the device is in the middle of some critical operation, then it should fail.

This function should not be used to set any state in the device or the driver
because the device may not actually enter the sleep state (e.g. another driver
later causes a global state transition to fail).

Note that in intermediate low power states, a device's I/O and memory spaces may
be disabled and may not be available in subsequent transitions to lower power
states.

	if (dev->driver && dev->driver->suspend)
		dev->driver->suspend(dev,state);

A driver uses this function to actually transition the device into a low power
state. This may include disabling I/O, memory and bus-mastering, as well as
physically transitioning the device to a lower power state.

Bus mastering may be disabled by doing:

	pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state:

	pci_set_power_state(dev,state);

The driver is also responsible for disabling any other device-specific features
(e.g. blanking screen, turning off on-card memory, etc).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function.

	if (dev->driver && dev->driver->resume)
		dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls and restoring all state that was
saved in previous save_state calls.

If the device is currently in D3, it must be completely reinitialized, as it
must be assumed that the device has lost all of its context (even that of its
PCI config space). For almost all current drivers, this means that the
initialization code that the driver does at boot must be separated out and
called again from the resume callback. Note that some values for the device may
not have to be probed for this time around if they are saved before entering the
low power state.

If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:

	pci_set_power_state(dev,0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. The driver cannot determine which of the two events is
taking place, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function.

	if (dev->driver && dev->driver->enable_wake)
		dev->driver->enable_wake(dev,state,enable);

This callback is generally only relevant for devices that support the PCI PM
spec and have the ability to generate a PME# (Power Management Event Signal)
to wake the system up. (However, it is possible that a device may support
some non-standard way of generating a wake event on sleep.)

Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
PM Capabilities describe what power states the device supports generating a
wake event from:

+------------------+
|  Bit  |  State   |
+------------------+
|  11   |   D0     |
|  12   |   D1     |
|  13   |   D2     |
|  14   |   D3hot  |
|  15   |   D3cold |
+------------------+

A device can use this to enable wake events:

	pci_enable_wake(dev,state,enable);

Note that to enable PME# from D3cold, a value of 4 should be passed to
pci_enable_wake (since it uses an index into a bitmask). If a driver gets
a request to enable wake events from D3, two calls should be made to
pci_enable_wake (one each for D3hot and D3cold).

PCI Local Bus Specification
PCI Bus Power Management Interface Specification