.. _dcn_overview:

=======================
Display Core Next (DCN)
=======================

To equip our readers with basic knowledge of how AMD Display Core Next (DCN)
works, we need to start with an overview of the hardware pipeline. The figure
below provides a DCN overview. Keep in mind that this is a generic diagram and
that there are variations per ASIC.

.. kernel-figure:: dc_pipeline_overview.svg

Based on this diagram, we can go through each block and briefly describe it:

* **Display Controller Hub (DCHUB)**: This is the gateway between the Scalable
  Data Port (SDP) and DCN. This component has multiple features, such as memory
  arbitration, rotation, and cursor manipulation.

* **Display Pipe and Plane (DPP)**: This block provides pre-blend pixel
  processing such as color space conversion, linearization of pixel data, tone
  mapping, and gamut mapping.

* **Multiple Pipe/Plane Combined (MPC)**: This component performs blending of
  multiple planes, using global or per-pixel alpha.

* **Output Pixel Processing (OPP)**: Processes and formats pixels to be sent to
  the display.

* **Output Pipe Timing Combiner (OPTC)**: It generates the output timing used
  to combine streams or divide capabilities. CRC values are generated in this
  block.

* **Display Output (DIO)**: Encodes the output for the display connected to
  our GPU.

* **Display Writeback (DWB)**: It provides the ability to write the output of
  the display pipe back to memory as video frames.

* **Multi-Media HUB (MMHUBBUB)**: Memory controller interface for DMCUB and DWB
  (note that DWB is not hooked up yet).

* **DCN Management Unit (DMU)**: It provides register access control and
  handles interrupts from the display controller to the SoC host interrupt
  unit. This block includes the Display Micro-Controller Unit - version B
  (DMCUB), which is handled via firmware.

* **DCN Clock Generator Block (DCCG)**: It provides the clocks and resets
  for all of the display controller clock domains.

* **Azalia (AZ)**: Audio engine.

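As a rough illustration of the MPC description above, the following C sketch blends a top plane over an opaque bottom plane using per-pixel alpha, a global (per-plane) alpha, or both. The enum names, the use of double instead of DCN's internal number format, and the choice to multiply the two alphas together (which mirrors DRM's "Coverage" pixel blend mode) are assumptions for illustration, not a description of MPC's actual arithmetic.

```c
#include <assert.h>

/* Hypothetical alpha sources, following the MPC description above. */
enum blend_alpha_mode {
	ALPHA_PER_PIXEL,		/* top plane's own alpha channel */
	ALPHA_GLOBAL,			/* one alpha value for the whole plane */
	ALPHA_PER_PIXEL_X_GLOBAL,	/* both, multiplied together (assumption) */
};

struct rgba {
	double r, g, b, a;
};

/* Non-premultiplied "over" operator for one channel. */
static double over(double top, double bottom, double alpha)
{
	return top * alpha + bottom * (1.0 - alpha);
}

/* Blend @top over an opaque @bottom plane. */
static struct rgba mpc_blend(struct rgba top, struct rgba bottom,
			     enum blend_alpha_mode mode, double global_alpha)
{
	double alpha;
	struct rgba out;

	assert(global_alpha >= 0.0 && global_alpha <= 1.0);

	switch (mode) {
	case ALPHA_GLOBAL:
		alpha = global_alpha;
		break;
	case ALPHA_PER_PIXEL_X_GLOBAL:
		alpha = top.a * global_alpha;
		break;
	case ALPHA_PER_PIXEL:
	default:
		alpha = top.a;
		break;
	}

	out.r = over(top.r, bottom.r, alpha);
	out.g = over(top.g, bottom.g, alpha);
	out.b = over(top.b, bottom.b, alpha);
	out.a = 1.0;	/* the result is opaque because @bottom is opaque */
	return out;
}
```

With per-pixel alpha, a half-transparent red pixel over blue yields equal parts red and blue; with a global alpha of 0.25, only a quarter of the red contributes.
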
The above diagram is an architecture generalization of DCN, which means that
every ASIC has variations around this base model. Notice that the display
pipeline is connected to the Scalable Data Port (SDP) via DCHUB; you can think
of the SDP as the element of our Data Fabric that feeds the display pipe.

Always approach the DCN architecture as something flexible that can be
configured and reconfigured in multiple ways; in other words, each block can be
set up or left unused according to userspace demands. For example, if we want
to drive an 8K@60Hz display with DSC enabled, our DCN may require 4 DPPs and 2
OPPs. It is DC's responsibility to choose the best configuration for each
specific scenario. Orchestrating all of these components together requires a
sophisticated communication interface, which is highlighted in the diagram by
the edges that connect the blocks. Each connection between these blocks
represents one of the following:

1. Pixel data interface (red): Represents the pixel data flow;
2. Global sync signals (green): A set of synchronization signals composed of
   VStartup, VUpdate, and VReady;
3. Config interface: Responsible for configuring blocks;
4. Sideband signals: All other signals that do not fit the previous categories.

These signals are essential and play an important role in DCN. Nevertheless,
the Global Sync deserves an extra level of detail, which is provided in the
Global Sync section below.

All of these components are represented by a data structure named dc_state.
From DCHUB to MPC, we have a representation called dc_plane; from MPC to OPTC,
we have dc_stream; and the output (DIO) is handled by dc_link. Keep in mind
that HUBP reads a surface from memory in a specific format, and our dc_plane
should work to convert all pixels in the plane to something that can be sent
to the display via dc_stream and dc_link.

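To make the relationship between dc_state, dc_plane, dc_stream, and dc_link more concrete, here is a deliberately simplified C model. The toy_ structures, their fields, and the helper are hypothetical and far smaller than the real DC structures; the sketch only captures the ownership described above: planes are composed into streams, streams are sent over links, and one state object groups them all.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_STREAMS	4	/* hypothetical limits */
#define TOY_MAX_PLANES	6

/* Output (DIO): one physical connection to a display. */
struct toy_dc_link {
	unsigned int link_index;
};

/* MPC to OPTC: the timing of one display and the link it is sent over. */
struct toy_dc_stream {
	const struct toy_dc_link *link;
	unsigned int h_active, v_active;
};

/* DCHUB to MPC: one surface read from memory and blended into a stream. */
struct toy_dc_plane {
	const struct toy_dc_stream *stream;
	unsigned int src_width, src_height;
	unsigned int format;	/* opaque memory format code */
};

/* One complete display configuration. */
struct toy_dc_state {
	struct toy_dc_stream streams[TOY_MAX_STREAMS];
	size_t stream_count;
	struct toy_dc_plane planes[TOY_MAX_PLANES];
	size_t plane_count;
};

/* Count the planes that are composed into @stream. */
static size_t planes_on_stream(const struct toy_dc_state *state,
			       const struct toy_dc_stream *stream)
{
	size_t n = 0;

	assert(state != NULL && stream != NULL);
	for (size_t i = 0; i < state->plane_count; i++)
		if (state->planes[i].stream == stream)
			n++;
	return n;
}
```
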
Front End and Back End
----------------------

The display pipeline can be broken down into two components that are usually
referred to as **Front End (FE)** and **Back End (BE)**, where FE consists of:

* DCHUB (mainly referring to a subcomponent named HUBP)
* DPP
* MPC

On the other hand, BE consists of:

* OPP
* OPTC
* DIO (DP/HDMI stream encoder and link encoder)

OPP and OPTC are the two blocks that join FE and BE. On a side note, there is a
one-to-one mapping of link encoder to PHY, but we can configure DCN to choose
which link encoder connects to which PHY. FE's main responsibility is to
change, blend, and compose pixel data, while BE's job is to frame a generic
pixel stream into a specific display's pixel stream.

Data Flow
---------

Initially, data is passed in from VRAM through the Data Fabric (DF) in native
pixel formats. The data stays in this format until it reaches HUBP in DCHUB,
where HUBP unpacks the different pixel formats and outputs them to DPP as
uniform streams through 4 channels (1 for alpha + 3 for colors).

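HUBP's unpacking step can be illustrated for two common 32-bit formats. The bit layouts below are those of the DRM fourcc codes DRM_FORMAT_ARGB8888 and DRM_FORMAT_ARGB2101010; widening every channel to a common 16 bits by bit replication is an assumption used here to make the four output channels "uniform", not a description of HUBP's internal precision.

```c
#include <assert.h>
#include <stdint.h>

/* The four uniform channels: 1 for alpha + 3 for color. */
struct pixel4 {
	uint16_t a, r, g, b;
};

/* Widen a @bits-wide value to 16 bits by bit replication (max maps to max). */
static uint16_t expand_to16(uint32_t v, int bits)
{
	uint32_t out = 0;
	int shift = 16;

	assert(bits >= 1 && bits <= 16);
	while (shift > 0) {
		shift -= bits;
		out |= shift >= 0 ? v << shift : v >> -shift;
	}
	return (uint16_t)out;
}

/* DRM_FORMAT_ARGB8888 layout: A[31:24] R[23:16] G[15:8] B[7:0]. */
static struct pixel4 unpack_argb8888(uint32_t p)
{
	struct pixel4 px = {
		.a = expand_to16((p >> 24) & 0xff, 8),
		.r = expand_to16((p >> 16) & 0xff, 8),
		.g = expand_to16((p >> 8) & 0xff, 8),
		.b = expand_to16(p & 0xff, 8),
	};
	return px;
}

/* DRM_FORMAT_ARGB2101010 layout: A[31:30] R[29:20] G[19:10] B[9:0]. */
static struct pixel4 unpack_argb2101010(uint32_t p)
{
	struct pixel4 px = {
		.a = expand_to16((p >> 30) & 0x3, 2),
		.r = expand_to16((p >> 20) & 0x3ff, 10),
		.g = expand_to16((p >> 10) & 0x3ff, 10),
		.b = expand_to16(p & 0x3ff, 10),
	};
	return px;
}
```

For example, the ARGB8888 pixel 0xff804020 unpacks to alpha 0xffff, red 0x8080, green 0x4040, and blue 0x2020, so both formats end up in the same four-channel shape.
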
The Converter and Cursor (CNVC) block in DPP then normalizes the data
representation and converts it to a DCN-specific floating-point format (i.e.,
different from the IEEE floating-point format). In the process, CNVC also
applies a degamma function to transform the data from non-linear to linear
space to ease the floating-point calculations that follow. Data stays in this
floating-point format from DPP to OPP.

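The normalization and degamma steps can be sketched in a few lines of C. Here double stands in for DCN's own floating-point format, which this document does not describe, and the transfer function is a deliberately simplified power law with exponent 2.0 (chosen so the sketch needs no math library); real curves such as sRGB or PQ have different shapes. The example also shows one common motivation for linearizing first: mixing pixel values gives a different result in linear space than in encoded space.

```c
#include <assert.h>
#include <stdint.h>

/* Normalize an unsigned @bits-wide code to [0.0, 1.0]. */
static double normalize(uint32_t code, int bits)
{
	assert(bits >= 1 && bits <= 16);
	return (double)code / (double)((1u << bits) - 1);
}

/*
 * Hypothetical degamma: a pure power law with exponent 2.0, chosen only so
 * the sketch needs no math library. Real transfer functions differ.
 */
static double degamma(double encoded)
{
	return encoded * encoded;
}

/* 50/50 mix computed on linear values (after degamma). */
static double mix_linear(double a, double b)
{
	return 0.5 * degamma(a) + 0.5 * degamma(b);
}

/* 50/50 mix computed on encoded values, then linearized for comparison. */
static double mix_encoded(double a, double b)
{
	return degamma(0.5 * a + 0.5 * b);
}
```

Mixing black and white half and half gives 0.5 in linear light when done after degamma, but only 0.25 (in linear terms) when the encoded values are mixed first.
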
Starting at OPP, color transformation and blending have been completed (i.e.,
alpha can be dropped), and the end sinks do not require the precision and
dynamic range that floating point provides (i.e., all displays use an integer
depth format), so bit-depth reduction/dithering kicks in. In OPP, we also apply
a regamma function to reintroduce the gamma removed earlier. Eventually, data
is output in integer format at DIO.

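The difference between plain bit-depth reduction and dithering can be sketched with a classic 2x2 ordered (Bayer) dither. This is a generic technique chosen for illustration; OPP's actual dithering modes and thresholds are not described in this document, and the input is assumed to be already regamma-encoded and normalized to [0.0, 1.0].

```c
#include <assert.h>
#include <stdint.h>

/* 2x2 Bayer thresholds, scaled to [0, 1). */
static const double bayer2[2][2] = {
	{ 0.00, 0.50 },
	{ 0.75, 0.25 },
};

/* Largest code representable with @bits bits, as a double. */
static double code_max(int bits)
{
	assert(bits >= 1 && bits <= 16);
	return (double)((1u << bits) - 1);
}

/* Bit-depth reduction by rounding to the nearest @bits-wide code. */
static uint16_t reduce_round(double v, int bits)
{
	double max = code_max(bits);

	if (v <= 0.0)
		return 0;
	if (v >= 1.0)
		return (uint16_t)max;
	return (uint16_t)(v * max + 0.5);
}

/*
 * Bit-depth reduction with 2x2 ordered dithering: a position-dependent
 * threshold is added before truncation, so a flat area alternates between
 * neighbouring codes and its average stays close to @v.
 */
static uint16_t reduce_dither(double v, int bits, unsigned int x, unsigned int y)
{
	double max = code_max(bits);
	double scaled;

	if (v <= 0.0)
		return 0;
	if (v >= 1.0)
		return (uint16_t)max;
	scaled = v * max + bayer2[y & 1][x & 1];
	return (uint16_t)(scaled > max ? max : scaled);
}
```

At 2 bits, a flat area with value 0.5 maps to code 2 everywhere when rounded, whereas the dithered 2x2 tile contains codes 1, 2, 2, and 1, whose average of 1.5 matches 0.5 times 3.
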
AMD Hardware Pipeline
---------------------

When discussing graphics on Linux, the term **pipeline** can sometimes be
overloaded with multiple meanings, so it is important to define what we mean
when we say **pipeline**. In the DCN driver, we use the term **hardware
pipeline**, **pipeline**, or just **pipe** as an abstraction to indicate a
sequence of DCN blocks instantiated to address some specific configuration. DC
core treats DCN blocks as individual resources, meaning we can build a pipeline
by taking resources for all individual hardware blocks to compose one pipeline.
In actuality, we can't connect an arbitrary block from one pipe to a block from
another pipe; they are routed linearly, except for DSC, which can be
arbitrarily assigned as needed. We use this pipeline concept to optimize
bandwidth utilization.

.. kernel-figure:: pipeline_4k_no_split.svg

Additionally, let's take a look at parts of the DTN log (see
'Documentation/gpu/amdgpu/display/dc-debug.rst' for more information), since
this log can help us see part of this pipeline behavior in real time::

 HUBP:  format  addr_hi  width  height ...
 [ 0]:      8h      81h   3840    2160
 [ 1]:      0h       0h      0       0
 [ 2]:      0h       0h      0       0
 [ 3]:      0h       0h      0       0
 [ 4]:      0h       0h      0       0
 ...
 MPCC:  OPP  DPP ...
 [ 0]:   0h   0h ...

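To make the DTN excerpt easier to reason about, here is a small helper that extracts the HUBP rows from an excerpt like the one above and counts the HUBPs with a non-zero size. The column layout is taken only from this excerpt (the real DTN output has more columns, elided here by "..."), and treating a non-zero width and height as "in use" is a heuristic assumption.

```c
#include <assert.h>
#include <stdio.h>

/* One HUBP row of the excerpt: instance, format, addr_hi, width, height. */
struct hubp_row {
	int inst;
	unsigned int format, addr_hi, width, height;
};

/* Returns 1 if @line looks like a HUBP row of the excerpt, 0 otherwise. */
static int parse_hubp_row(const char *line, struct hubp_row *row)
{
	assert(line != NULL && row != NULL);
	return sscanf(line, " [%d]: %xh %xh %u %u", &row->inst, &row->format,
		      &row->addr_hi, &row->width, &row->height) == 5;
}

/* Heuristic: a HUBP with a non-zero width and height is considered in use. */
static int count_active_hubps(const char *const lines[], int n)
{
	struct hubp_row row;
	int active = 0;

	for (int i = 0; i < n; i++)
		if (parse_hubp_row(lines[i], &row) && row.width && row.height)
			active++;
	return active;
}
```

Applied to the excerpt above, the helper finds one active HUBP; the header and the MPCC rows do not match the HUBP row pattern and are skipped.
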
The first thing to notice from the diagram and the DTN log is that the
different parts of DCN run in different clock domains. In this example, we
have just a single **pipeline** where the data flows from DCHUB to DIO, as we
intuitively expect. Nonetheless, DCN is flexible, as mentioned before, and we
can split this single pipe differently, as shown in the diagram below:

.. kernel-figure:: pipeline_4k_split.svg

Now, if we inspect the DTN log again, we can see some interesting changes::

 HUBP:  format  addr_hi  width  height ...
 [ 0]:      8h      81h   1920    2160 ...
 ...
 [ 4]:      0h       0h      0       0 ...
 [ 5]:      8h      81h   1920    2160 ...
 ...
 MPCC:  OPP  DPP ...
 [ 0]:   0h   0h ...
 [ 5]:   0h   5h ...

From the above example, we now split the display pipeline into two vertical
parts of 1920x2160 (i.e., 3840x2160), and as a result, we can reduce the clock
frequency in the DPP part. This is useful not only for saving power but also
for better handling the required throughput. The idea to keep in mind here is
that the pipe configuration can vary a lot according to the display
configuration, and it is the DML's responsibility to set up all required
configuration parameters for multiple scenarios supported by our hardware.

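The effect of the split can be estimated with first-order arithmetic. The sketch assumes a 3840x2160@60Hz timing with a 4400x2250 total (the common CTA-861 timing, 594 MHz pixel clock) and that each pipe's DPP processes roughly its share of the pixel rate; it is an illustration under those assumptions, not how DML computes DPP clocks.

```c
#include <assert.h>

/* Pixel clock in kHz from the total (active + blanking) timing. */
static unsigned long long pixel_clock_khz(unsigned int h_total,
					  unsigned int v_total,
					  unsigned int refresh_hz)
{
	return (unsigned long long)h_total * v_total * refresh_hz / 1000;
}

/* Width handled by each pipe when the active width is split evenly. */
static unsigned int split_width(unsigned int h_active, unsigned int pipes)
{
	assert(pipes > 0 && h_active % pipes == 0);
	return h_active / pipes;
}

/*
 * First-order per-pipe pixel rate, rounded up. Ignores scaling, per-pipe
 * overheads, and every margin DML applies.
 */
static unsigned long long per_pipe_rate_khz(unsigned long long pclk_khz,
					    unsigned int pipes)
{
	assert(pipes > 0);
	return (pclk_khz + pipes - 1) / pipes;
}
```

Under that assumed timing, each of the two pipes handles 1920 of the 3840 columns and roughly 297 MHz of the 594 MHz pixel rate, which is why the DPP clock can be lowered.
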
Global Sync
-----------

Many DCN registers are double buffered, most importantly the surface address.
This allows us to update DCN hardware atomically for page flips, as well as
for most other updates that don't require enabling or disabling of new pipes.

(Note: There are many scenarios when DC will decide to reserve extra pipes
in order to support outputs that need a very high pixel clock, or for
power saving purposes.)

These atomic register updates are driven by global sync signals in DCN. In
order to understand how atomic updates interact with DCN hardware, and how DCN
signals page flip and vblank events, it is helpful to understand how global
sync is programmed.

Global sync consists of three signals: VSTARTUP, VUPDATE, and VREADY. These are
calculated by the Display Mode Library - DML (drivers/gpu/drm/amd/display/dc/dml)
based on a large number of parameters and ensure our hardware is able to feed
the DCN pipeline without underflows or hangs in any given system configuration.
The global sync signals always happen during VBlank, are independent of the
VSync signal, and do not overlap each other.

VUPDATE is the only signal that is of interest to the rest of the driver stack
or userspace clients, as it signals the point at which the hardware latches the
atomically programmed (i.e., double-buffered) registers. Even though it is
independent of the VSync signal, we use VUPDATE to signal the VSync event, as
it provides the best indication of how atomic commits and hardware interact.

Since DCN hardware is double-buffered, the DC driver is able to program the
hardware at any point during the frame.

The picture below illustrates the global sync signals:

.. kernel-figure:: global_sync_vblank.svg

These signals affect core DCN behavior. Programming them incorrectly will lead
to a number of negative consequences, most of them quite catastrophic.

The following picture shows how global sync allows for a mailbox style of
updates, i.e., it allows for multiple reconfigurations between VUpdate events,
where only the last configuration programmed before the VUpdate signal becomes
effective.

.. kernel-figure:: config_example.svg
