# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
* Use is subject to license terms.

SOLARIS USB BANDWIDTH ANALYSIS

This document discusses the USB bandwidth allocation scheme and the protocol
overheads used by both the full speed and the high speed host controller
drivers. This information is derived from the USB 2.0 specification, the
"Bandwidth Analysis Whitepaper" posted on www.usb.org, and other resources.

The target audience for this document is USB software and hardware designers
and engineers, and other interested readers. The reader should be familiar
with the Universal Serial Bus Specification version 2.0, the OpenHCI
Specification 1.0a and the Enhanced HCI Specification 1.0.

FULL SPEED BUS

The following overheads, formulas and allocation scheme apply both to full
speed host controllers and to the Transaction Translators (TT) of high speed
hubs, which perform full/low speed transactions.

o Timing and data rate calculations

- Timing calculations

    1 sec    = 1000 ms or 1,000,000,000 ns

- Data rate calculations

    1 ms     = 1500 bytes or 12000 bits (per frame)
    668 ns   = 1 byte or 8 bits

    1 full speed bit time = 83.54 ns

o Protocol Overheads and Bandwidth numbers

(Refer to section 5.11.3 of the USB 2.0 specification and page 2 of the USB
Bandwidth Analysis Whitepaper.)

    Non Isochronous        9107 ns                   14 bytes
    Isochronous Output     6265 ns                   10 bytes
    Isochronous Input      7268 ns                   11 bytes
    Low-speed overhead     64060 ns                  97 bytes
    Hub LS overhead*       668 ns                    1 byte

    Host Delay*            Specific to hardware      18 bytes
    Low-Speed clock*       Slower than full speed    8 times
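As a cross-check of the byte column, the sketch below converts a full speed time into full speed byte times, assuming each time is divided by the byte time of 8 x 83.54 = 668.32 ns and rounded up. The helper name fs_ns_to_bytes is hypothetical; times are scaled to hundredths of a nanosecond so that the rounding needs only integer arithmetic.

```c
/*
 * Hypothetical helper: convert a full speed time in nanoseconds into
 * whole full speed byte times, rounding up. Times are scaled by 100 so
 * the 83.54 ns bit time can be handled with integers only.
 */
#define	FS_BIT_TIME_CNS		8354L			/* 83.54 ns in 1/100 ns */
#define	FS_BYTE_TIME_CNS	(8L * FS_BIT_TIME_CNS)	/* 668.32 ns */

static long
fs_ns_to_bytes(long ns)
{
	long cns = ns * 100L;	/* nanoseconds -> hundredths of a ns */

	/* Integer ceiling of cns / FS_BYTE_TIME_CNS. */
	return ((cns + FS_BYTE_TIME_CNS - 1) / FS_BYTE_TIME_CNS);
}
```

Under this rule 9107, 6265, 7268 and 668 ns give 14, 10, 11 and 1 bytes, matching the table. 64060 ns gives 96 rather than the 97 bytes listed for the low-speed overhead, so that entry does not follow this simple rounding rule.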

(Refer to section 7.3.5 of the OHCI specification 1.0a and page 2 of the USB
Bandwidth Analysis Whitepaper.)

    Maximum bandwidth available       1500 bytes/frame
    Maximum Non Periodic bandwidth    197 bytes/frame
    Maximum Periodic bandwidth        1293 bytes/frame

1. Hub specific low speed overhead

   This is the time the host controller provides for hubs to enable their
   low speed ports; it is a minimum of 4 full speed bit times.

       overhead = 2 x Hub_LS_Setup
                = 2 x (4 x 83.54) = 668.32 ns = 1 byte

2. Host delay is specific to the particular hardware. The following host
   delay is for the RIO USB OHCI host controller (provided by Ken Ward of
   RIO USB hardware). It is only an example of how to calculate the "host
   delay" for a given USB host controller implementation.

   Ex: Assuming EDs (Endpoint Descriptors) and TDs (Transfer Descriptors)
   are not streaming in Schizo (the PCI bridge) and there are no cache hits
   for an ED or TD:

   To read an ED or TD or data:

       PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_RETRY
       PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_TRDY +
       ...
       PCI_ARB_DELAY = 2000ns
       ...
       DATA = 240ns (Always read 64 bytes ...)
       Core Overhead = 240 + 30 * (MPS/4) + 83.54 * (MPS/4) + 4 * 83.54
       ...

   Now multiply by 3 for ED+TD+DATA = 10200ns = ~128 bits or 16 bytes.

   This is probably on the optimistic side, only using 2us for the
   ...

   If there is a USB cache hit, the time it takes for an ED or TD is:

       CORE SYNC DELAY + CACHE_HIT CHECK + 30 * (MPS/4) + CORE OVERHEAD

       240 + 30 + 120 + 1000ns ~ 1400ns, or ~ 2 bytes

   The total host delay is therefore 16 + 2 = 18 bytes.

3. The low-speed clock is eight times slower than full speed, i.e. it runs
   at 1/8th of the full speed rate (1.5 Mb/s versus 12 Mb/s).

4. For non-periodic transfers, reserve bandwidth for at least one low-speed
   device transaction per frame. According to the USB Bandwidth Analysis
   whitepaper and section 7.3.5 (page 123) of the OHCI Specification 1.0a,
   one low-speed transaction takes 0x628 (1576) full speed bit times, or
   197 bytes, which is about 13% of the USB frame time (197 / 1500).

5. Maximum Periodic bandwidth is calculated using the following formula:

       Maximum Periodic bandwidth = Maximum bandwidth available
                                    - SOF - EOF - Maximum Non Periodic bandwidth
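Notes 4 and 5 reduce to simple arithmetic, sketched below with hypothetical names. The text does not give SOF and EOF separately; the combined value of 10 bytes used here is only what the table implies (1500 - 197 - 1293) and should be treated as an assumption.

```c
#define	FS_MAX_BW	1500	/* bytes per 1ms frame */
#define	FS_LS_XACT_BITS	0x628	/* one low-speed transaction, FS bit times */
#define	FS_SOF_EOF	10	/* ASSUMPTION: implied by the table, not stated */

/* Note 4: reserve one low-speed transaction for non-periodic traffic. */
static int
fs_max_nonperiodic_bw(void)
{
	return (FS_LS_XACT_BITS / 8);	/* 1576 bits / 8 = 197 bytes */
}

/* Share of the frame taken by that reserve, in whole percent. */
static int
fs_nonperiodic_percent(void)
{
	return (fs_max_nonperiodic_bw() * 100 / FS_MAX_BW);
}

/* Note 5: whatever is left over goes to periodic transfers. */
static int
fs_max_periodic_bw(void)
{
	return (FS_MAX_BW - FS_SOF_EOF - fs_max_nonperiodic_bw());
}
```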
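
o Bus Transaction Formulas

(Refer to section 5.11.3 of the USB 2.0 specification.)

The factor (MaxPacketSize * 7) / 6 accounts for worst-case bit stuffing,
one extra bit for every six data bits.

- Full-Speed:

    Protocol overhead + ((MaxPacketSize * 7) / 6) + Host_Delay

- Low-Speed:

    Protocol overhead + Hub LS overhead +
    (Low-Speed clock * ((MaxPacketSize * 7) / 6)) + Host_Delay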
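A minimal sketch of the two formulas, using the byte values from the tables above and the RIO host delay example. The function and macro names are hypothetical, and the integer division truncates, which may differ from how a real driver rounds.

```c
/* Values in full speed byte times, from the tables above. */
#define	FS_NON_ISOC_OVERHEAD	14
#define	FS_ISOC_OUT_OVERHEAD	10
#define	FS_ISOC_IN_OVERHEAD	11
#define	LS_OVERHEAD		97
#define	HUB_LS_OVERHEAD		1
#define	LS_CLOCK_FACTOR		8
#define	HOST_DELAY		18	/* RIO example; hardware specific */

/* Worst-case bit stuffing: one extra bit for every six data bits. */
static int
bit_stuffed(int max_packet_size)
{
	return ((max_packet_size * 7) / 6);
}

/* Full-Speed: overhead is one of the FS_*_OVERHEAD values. */
static int
fs_xfer_bw(int overhead, int max_packet_size)
{
	return (overhead + bit_stuffed(max_packet_size) + HOST_DELAY);
}

/* Low-Speed: payload costs eight full speed byte times per byte. */
static int
ls_xfer_bw(int max_packet_size)
{
	return (LS_OVERHEAD + HUB_LS_OVERHEAD +
	    LS_CLOCK_FACTOR * bit_stuffed(max_packet_size) + HOST_DELAY);
}
```

For example, a full speed interrupt endpoint with a 64 byte MaxPacketSize needs 14 + 74 + 18 = 106 bytes per frame, and a low speed endpoint with an 8 byte MaxPacketSize needs 97 + 1 + 8 x 9 + 18 = 188 bytes.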

Figure 5.5 in the OHCI specification 1.0a shows the periodic schedule, the
supported polling intervals and other details for the OHCI host controller.

- The host controller processes one interrupt endpoint descriptor list every
  frame. The lower five bits of the current frame number are used as an
  index into an array of 32 interrupt endpoint descriptor lists, or periodic
  frame lists, found in the HCCA (Host Controller Communication Area). This
  means each list is revisited once every 32ms. The host controller driver
  sets up the interrupt lists to visit any given endpoint descriptor in as
  many lists as necessary to provide the interrupt granularity required for
  that endpoint. See figure 5.5 in the OHCI specification 1.0a.

- Isochronous endpoint descriptors are added at the end of the 1ms interrupt
  endpoint descriptor list.

- The host controller driver maintains an array of 32 frame bandwidth lists
  to record the bandwidth allocated in each USB frame.

Please refer to section 5.2.7.2 (page 61) of the OHCI specification 1.0a for
more details.
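The list selection described above amounts to masking the frame number. A small sketch with hypothetical helper names; the second helper assumes a polling interval that is a power of two between 1 and 32 ms, as in figure 5.5.

```c
#define	OHCI_NUM_INTR_LISTS	32

/* The lower five bits of the frame number select one of the 32 lists. */
static unsigned int
ohci_intr_list_index(unsigned int frame_number)
{
	return (frame_number & (OHCI_NUM_INTR_LISTS - 1));	/* & 0x1f */
}

/* Number of the 32 lists an endpoint polled every interval_ms appears in. */
static unsigned int
ohci_lists_per_endpoint(unsigned int interval_ms)
{
	return (OHCI_NUM_INTR_LISTS / interval_ms);
}
```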

o Bandwidth Allocation Scheme

The OHCI host controller driver goes through the following steps to allocate
the bandwidth needed for an interrupt or isochronous endpoint:

- Calculate the bandwidth required for the given endpoint using the bus
  transaction formulas and protocol overhead calculations mentioned in the
  previous sections.

- Compare the bandwidth available in the least allocated of the 32 frame
  bandwidth lists against the bandwidth required by this endpoint. If the
  required bandwidth exceeds what is available, return an error.

- Find the static node to which the given endpoint needs to be linked so
  that it will be polled at the required polling interval. This value
  varies based on the polling interval and the current bandwidth load on
  the schedule. See figure 5.5 in the OHCI specification 1.0a.

  Ex: If the polling interval is 4ms, the endpoint will be linked to one of
  the four static nodes (range 3-6) in the 4ms column of figure 5.5 in the
  OHCI specification 1.0a.

- Depending on the polling interval, add the bandwidth calculated above to
  one or more frame bandwidth lists. Before adding, double check that the
  bandwidth is available in each of those lists; if it is not, return an
  error. Otherwise, add this bandwidth to all the required frame bandwidth
  lists.

  Ex: Assume a polling interval of 4ms and a static node value of 3. In
  this case, the required bandwidth is added to frame bandwidth lists
  0, 4, 8, 12, 16, 20, 24 and 28.
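The allocation steps can be sketched as follows. This is an illustration, not the driver's code: ohci_frame_bw and ohci_allocate_bw are hypothetical names, and first_frame stands in for the frame list selected through the chosen static node, since the node-to-frame mapping of figure 5.5 is not reproduced here.

```c
#define	OHCI_NUM_FRAME_LISTS	32
#define	OHCI_MAX_PERIODIC_BW	1293	/* bytes/frame, from the table above */

/* Hypothetical bookkeeping: bytes already allocated in each frame list. */
static int ohci_frame_bw[OHCI_NUM_FRAME_LISTS];

/*
 * interval_ms must be 1, 2, 4, 8, 16 or 32 and first_frame < interval_ms.
 * Returns 0 on success and -1 if the bandwidth is not available.
 */
static int
ohci_allocate_bw(int required, int interval_ms, int first_frame)
{
	int i, least = ohci_frame_bw[0];

	/* Check against the least allocated frame list. */
	for (i = 1; i < OHCI_NUM_FRAME_LISTS; i++) {
		if (ohci_frame_bw[i] < least)
			least = ohci_frame_bw[i];
	}
	if (least + required > OHCI_MAX_PERIODIC_BW)
		return (-1);

	/* Double check every frame list the endpoint will be polled in. */
	for (i = first_frame; i < OHCI_NUM_FRAME_LISTS; i += interval_ms) {
		if (ohci_frame_bw[i] + required > OHCI_MAX_PERIODIC_BW)
			return (-1);
	}

	/* Everything fits: add the bandwidth to all of those lists. */
	for (i = first_frame; i < OHCI_NUM_FRAME_LISTS; i += interval_ms)
		ohci_frame_bw[i] += required;

	return (0);
}
```

With interval_ms = 4 and first_frame = 0, the loops visit frame lists 0, 4, 8, ..., 28, as in the example above. Checking every list before adding anything keeps a failed request from leaving a partial allocation behind.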

HIGH SPEED BUS

o Timing and data rate calculations

- Timing calculations

    1 ms     = 1 frame or 8 uframes (micro-frames)

- Data rate calculations

    125 us   = 7500 bytes (per uframe)
    16.66 ns = 1 byte or 8 bits

    1 high speed bit time = 2.083 ns

o Protocol Overheads and Bandwidth numbers

(Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0 specification.)

    Non Isochronous           917 ns                 55 bytes
    Isochronous               634 ns                 38 bytes

    Start split overhead      67 ns                  4 bytes
    Complete split overhead   67 ns                  4 bytes

    Host Delay*               Specific to hardware   18 bytes

(Refer to section 5.5.4 of the USB 2.0 specification.)

    Maximum bandwidth available        7500 bytes/uframe
    Maximum Non Periodic bandwidth*    1500 bytes/uframe
    Maximum Periodic bandwidth*        5918 bytes/uframe

1. Host delay is specific to the particular hardware.

2. As per section 5.5.4 of the USB 2.0 specification, 20% of the bus time is
   reserved for non-periodic high-speed transfers, whereas periodic
   high-speed transfers get 80% of the bus time. In one micro-frame (125us)
   we can transfer 7500 bytes, or 60,000 bits, so 20% of 7500 is 1500 bytes.

3. Maximum Periodic bandwidth is calculated using the following formula:

       Maximum Periodic bandwidth = Maximum bandwidth available
                                    - SOF - EOF - Maximum Non Periodic bandwidth
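The same arithmetic for the high speed numbers, again with hypothetical names. The 82 byte SOF + EOF figure is not stated in the text; it is only what the table implies (7500 - 1500 - 5918), so treat it as an assumption.

```c
#define	HS_MAX_BW		7500	/* bytes per 125us micro-frame */
#define	HS_NONPERIODIC_PCT	20	/* USB 2.0 section 5.5.4 */
#define	HS_SOF_EOF		82	/* ASSUMPTION: implied by the table */

/* Note 2: 20% of each micro-frame is kept for non-periodic transfers. */
static int
hs_max_nonperiodic_bw(void)
{
	return (HS_MAX_BW * HS_NONPERIODIC_PCT / 100);
}

/* Note 3: whatever is left over goes to periodic transfers. */
static int
hs_max_periodic_bw(void)
{
	return (HS_MAX_BW - HS_SOF_EOF - hs_max_nonperiodic_bw());
}
```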

o Bus Transaction Formulas

(Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0 specification.)

- High-Speed (Non-Split transactions):

    (Protocol overhead + ((MaxPacketSize * 7) / 6) +
     Host_Delay) x Number of transactions per micro-frame

- High-Speed (Split transaction - Device to Host):

    Start Split transaction:

        Protocol overhead + Host_Delay + Start split overhead

    Complete Split transaction:

        Protocol overhead + ((MaxPacketSize * 7) / 6) +
        Host_Delay + Complete split overhead

- High-Speed (Split transaction - Host to Device):

    Start Split transaction:

        Protocol overhead + ((MaxPacketSize * 7) / 6) +
        Host_Delay + Start split overhead

    Complete Split transaction:

        Protocol overhead + Host_Delay + Complete split overhead
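A sketch of the high speed formulas with hypothetical names. The split helper reflects the one difference between the two directions: the data payload is carried by the complete split for device to host transfers and by the start split for host to device transfers. Start and complete split overheads share one constant because both are 4 bytes.

```c
/* Values in high speed byte times, from the tables above. */
#define	HS_NON_ISOC_OVERHEAD	55
#define	HS_ISOC_OVERHEAD	38
#define	HS_SPLIT_OVERHEAD	4	/* start and complete split, 67 ns each */
#define	HS_HOST_DELAY		18	/* hardware specific */

static int
hs_bit_stuffed(int max_packet_size)
{
	return ((max_packet_size * 7) / 6);
}

/* Non-split: per-transaction cost times transactions per micro-frame. */
static int
hs_nonsplit_bw(int overhead, int max_packet_size, int xacts_per_uframe)
{
	return ((overhead + hs_bit_stuffed(max_packet_size) +
	    HS_HOST_DELAY) * xacts_per_uframe);
}

/* Split: is_in is nonzero for device to host, zero for host to device. */
static void
hs_split_bw(int overhead, int max_packet_size, int is_in,
    int *ss_bw, int *cs_bw)
{
	int base = overhead + HS_HOST_DELAY + HS_SPLIT_OVERHEAD;
	int data = hs_bit_stuffed(max_packet_size);

	*ss_bw = is_in ? base : base + data;
	*cs_bw = is_in ? base + data : base;
}
```

For a full speed interrupt IN endpoint with a 64 byte MaxPacketSize behind a high speed hub, this gives 55 + 18 + 4 = 77 bytes for the start split and 55 + 74 + 18 + 4 = 151 bytes for the complete split.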

o Interrupt schedule or Start and Complete split masks

(Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification.)

- Interrupt schedule or Start split mask

This field is used for high, full and low speed USB device interrupt and
isochronous endpoints. It tells the host controller in which micro-frame of
a given USB frame to initiate a high speed interrupt or isochronous
transaction. For full/low speed devices, it tells the host controller when
to initiate a "start split" transaction.

ehci_start_split_mask[15] =	/* One byte field */
	/*
	 * For all low/full speed devices, and for high speed devices with
	 * a polling interval of 8 micro-frames (1ms) or more.
	 */
	{0x01,		/* 00000001 */
	...
	/* For high speed devices with a polling interval of 4 micro-frames. */
	...
	/* For high speed devices with a polling interval of 2 micro-frames. */
	...
	/* For high speed devices with a polling interval of 1 micro-frame. */
	0xff };		/* 11111111 */
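Each mask bit n means "initiate the transaction in micro-frame n". The hypothetical helper below builds such a mask for a polling interval of 1, 2, 4 or 8 micro-frames starting at a given micro-frame. It illustrates the encoding only and is not taken from the driver; note that 8 + 4 + 2 + 1 possible offsets give 15 distinct masks, the size of ehci_start_split_mask.

```c
/*
 * Hypothetical helper: interval is 1, 2, 4 or 8 micro-frames (8 also
 * covers slower intervals, which poll once per frame); offset is the
 * first micro-frame used and must be smaller than interval.
 */
static unsigned char
ehci_uframe_mask(unsigned int interval, unsigned int offset)
{
	unsigned char mask = 0;
	unsigned int uframe;

	/* Set one bit every `interval` micro-frames within the frame. */
	for (uframe = offset; uframe < 8; uframe += interval)
		mask |= (unsigned char)(1U << uframe);

	return (mask);
}
```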

- Complete split mask

This field is used only for full/low speed USB device interrupt and
isochronous endpoints. It tells the host controller in which micro-frames
to initiate a "complete split" transaction. A complete split transaction
can be retried up to 3 times, so bandwidth for the complete split
transaction is reserved in 3 consecutive micro-frames.

ehci_complete_split_mask[8] =	/* One byte field */
	/* Only full/low speed devices */
	{0x0e,		/* 00001110 */
	...
	Reserved,	/* Need FSTN feature */
	Reserved,	/* Need FSTN feature */
	Reserved };	/* Need FSTN feature */
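Likewise, a hypothetical helper that derives a complete split mask from a single-bit start split mask by setting the next three micro-frames. Under this rule, start splits in micro-frames 5, 6 and 7 would need micro-frames of the following frame, which is consistent with the three "Reserved" entries above; the helper returns 0 for them.

```c
/*
 * Hypothetical helper: ss_mask has exactly one bit set. Returns the
 * complete split mask, or 0 when the three complete split micro-frames
 * would run past micro-frame 7 of the frame.
 */
static unsigned char
ehci_cs_mask_from_ss(unsigned char ss_mask)
{
	unsigned int cs = ((unsigned int)ss_mask << 1) |
	    ((unsigned int)ss_mask << 2) | ((unsigned int)ss_mask << 3);

	if (cs > 0xff)		/* would spill into the next frame */
		return (0);

	return ((unsigned char)cs);
}
```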

Figure 4.8 in the EHCI specification shows the periodic schedule, the
supported polling intervals and other details for the EHCI host controller.

- The high speed host controller can support 256, 512 or 1024 periodic frame
  lists; by default, all host controllers support 1024 frame lists. Our
  implementation supports 1024 frame lists: it first constructs 32 periodic
  frame lists and then duplicates them 32 times. See figure 4.8 in the EHCI
  specification.

- The host controller traverses the periodic schedule by constructing an
  array offset reference from the PERIODICLISTBASE and FRINDEX registers.
  It fetches the element and begins traversing the graph of linked schedule
  data structures. See figure 4.8 in the EHCI specification.

- The host controller processes one interrupt endpoint descriptor list every
  micro-frame (125us). This means the same list is revisited 8 times in a
  frame.

- The host controller driver sets up the interrupt lists to visit any given
  endpoint descriptor in as many lists as necessary to provide the interrupt
  granularity required for that endpoint.

- For isochronous transfers, only transfer descriptors are used; unlike
  OHCI, there are no isochronous endpoint descriptors. The transfer
  descriptors are added at the beginning of the periodic schedule.

- For EHCI, the bandwidth requirement depends on the USB device speed.

  For a high speed USB device, only high speed bandwidth is needed. For a
  full/low speed device connected through a high speed hub, both high speed
  bandwidth and TT (transaction translator) bandwidth are needed.

  High speed bandwidth information is saved in an EHCI data structure and TT
  bandwidth is saved in the high speed hub's USB device data structure. Each
  TT acts as a full speed host controller, and its bandwidth allocation
  scheme, overhead calculations and other details are similar to those of a
  full speed host controller. Refer to the "Full speed bus" section for more
  details.

- The EHCI host controller driver maintains an array of 32 frame lists to
  store the high speed bandwidth allocated in each frame; each frame list
  also has eight micro-frame lists, which store the bandwidth allocated in
  each micro-frame of that frame.
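A sketch of how the FRINDEX register selects a list, assuming the default 1024 entry frame list: per the EHCI specification, FRINDEX bits 2:0 give the micro-frame and bits 12:3 index the frame list. The last helper assumes the 32 driver lists are duplicated back to back, so entry i maps to list i % 32. All names are hypothetical.

```c
#include <stdint.h>

#define	EHCI_NUM_PERIODIC_LISTS	1024	/* default frame list size */
#define	EHCI_NUM_INTR_LISTS	32	/* lists the driver builds */

/* Micro-frame within the current frame: FRINDEX bits 2:0. */
static unsigned int
ehci_uframe(uint32_t frindex)
{
	return (frindex & 0x7);
}

/* Index into the 1024 entry periodic frame list: FRINDEX bits 12:3. */
static unsigned int
ehci_periodic_index(uint32_t frindex)
{
	return ((frindex >> 3) & (EHCI_NUM_PERIODIC_LISTS - 1));
}

/* Driver list reached through that entry, assuming back-to-back copies. */
static unsigned int
ehci_intr_list(uint32_t frindex)
{
	return (ehci_periodic_index(frindex) % EHCI_NUM_INTR_LISTS);
}
```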

o Bandwidth Allocation Scheme

(Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification.)

High speed Non Split Transaction (for high speed devices only):

For a given high speed interrupt or isochronous endpoint, the EHCI host
controller driver goes through the following steps to allocate the
bandwidth needed for this endpoint:

- Calculate the bandwidth required for the given endpoint using the formula
  and overhead calculations mentioned in the previous section.

- Compare the bandwidth available in the least allocated of the 32 frame
  lists against the bandwidth required by this endpoint. If the required
  bandwidth exceeds what is available, return an error.

- Map the high speed endpoint's polling interval, expressed in micro-frames
  (units of 125us), to an interrupt list path based on a millisecond value.
  For example, an endpoint with a polling interval of 16 micro-frames maps
  to an interrupt list path of 2ms.

- Find the static node to which the given endpoint needs to be linked so
  that it will be polled at its required polling interval. This varies
  based on the polling interval and the current bandwidth load on the
  schedule.

  Ex: If the polling interval is 32 micro-frames, the corresponding frame
  polling interval is 4ms, and the endpoint will be linked to one of the
  four static nodes (range 3-6) in the 4ms column of figure 4.8 in the EHCI
  specification.

- Depending on the polling interval, add the bandwidth calculated above to
  one or more frame bandwidth lists, and also to one or more micro-frame
  bandwidth lists of each of those frame bandwidth lists. Before adding,
  double check that the bandwidth is available in those lists. If the
  needed bandwidth is not available, return an error. Otherwise, add this
  bandwidth to all the required frame and micro-frame lists.

  Ex: Assume the endpoint's polling interval is 32 micro-frames and the
  static node value is 3. In this case, the required bandwidth is added to
  frame bandwidth lists 0, 4, 8, 12, 16, 20, 24 and 28, and the micro-frame
  bandwidth information is saved using the ehci_start_split_mask matrix.
  For this example, any one of the 15 entries can be used to save the
  micro-frame bandwidth.
455 to use any one of the 15 entries to save micro frame bandwidth.
457 High speed split transactions (for full and low speed devices only):
459 For a given full/low speed interrupt or isochronous endpoint, we need both
460 high speed and TT bandwidths. The TT bandwidth allocation is same as full
461 speed bus bandwidth allocation. Please refer to the "full speed bus"
462 bandwidth allocation section for more details.
464 The EHCI driver will go through the following steps to allocate high speed
465 bandwidth needed for this full/low speed endpoint.
467 - Calculate the bandwidth required for a given endpoint using the formula
468 and overhead calculations mentioned in previous section. In this case,
469 we need to calculate bandwidth needed both for Start and Complete start
470 transactions separately.
472 - Compare the bandwidth available in the least allocated frame list out of
473 32 frame lists against the bandwidth required by this endpoint. If this
474 exceeds the limit, then, return an error.
476 - Find out the static node to which the given endpoint needs to be linked
477 so that it will be polled as per the required polling interval. This
478 value varies based on polling interval and current bandwidth load on
481 Ex: If a polling interval is 4ms, then the endpoint will be linked to
482 one of the four static nodes (range 3-6) in the 4ms column of figure
483 4.8 in EHCI specification.
485 - Depending on the polling interval, we need to add the above calculated
486 Start and Complete split transactions bandwidth to one or more frame
487 bandwidth lists and also to one or more micro frame bandwidth lists for
488 that particular frame bandwidth list. In this case, the Start split
489 transaction needs bandwidth in one micro frame, where as the Complete
490 split transaction needs bandwidth in next three subsequent micro frames
491 of that particular frame or next frame. Before adding, we need to double
492 check the availability of bandwidth in those respective lists. If needed
493 bandwidth is not available, then, return an error. Otherwise add this
494 bandwidth to all the required lists.
496 Ex: Assume give polling interval is 4ms and static node value is 3. In
497 this case, we need to add required Start and Complete split
498 bandwidth to the 0,4,8,12,16,20,24,28 frame bandwidth lists. The
499 micro frame bandwidth lists is stored using ehci_start_split_mask &
500 ehci_complete_split_mask matrices. In this case, we need to use any
501 of the first 8 entries to save micro frame bandwidth.
503 Assume we found that the following micro frame bandwidth lists of
504 0,4,8,12,16,20,24,28 frame lists can be used for this endpoint.
505 It means, we need to initiate "start split transaction" in first
506 micro frame of 0,4,8,12,16,20,24,28 frames.
508 Start split mask = 0x01, /* 00000001 */
510 For this "start split mask", the "complete split mask" should be
512 Complete split mask = 0x0e, /* 00001110 */
514 It means try "complete split transactions" in second, third or
515 fourth micro frames of 0,4,8,12,16,20,24,28 frames.
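Finally, a sketch of the split case, again with hypothetical names: the start split cost goes into one micro-frame and the complete split cost into each of the next three. To stay short, it only handles complete splits within the same frame (start split in micro-frames 0 to 4) and rejects the rest instead of modelling the next-frame case.

```c
#define	EHCI_NUM_FRAME_LISTS	32
#define	EHCI_NUM_UFRAMES	8
#define	EHCI_MAX_PERIODIC_BW	5918	/* bytes/uframe, from the table above */

/* Hypothetical bookkeeping: bytes allocated per micro-frame of each frame. */
static int ehci_uframe_bw[EHCI_NUM_FRAME_LISTS][EHCI_NUM_UFRAMES];

/*
 * Reserve ss_bw in micro-frame ss_uframe and cs_bw in each of the next
 * three micro-frames, in frame lists first_frame, first_frame +
 * interval_ms, ... Returns 0 on success and -1 otherwise.
 */
static int
ehci_allocate_split_bw(int ss_bw, int cs_bw, int ss_uframe,
    int interval_ms, int first_frame)
{
	int f, u, pass;

	/* Complete splits must fit in micro-frames 1 to 7 of this frame. */
	if (ss_uframe < 0 || ss_uframe + 3 >= EHCI_NUM_UFRAMES)
		return (-1);

	/* Pass 0 only checks; pass 1 commits once everything fits. */
	for (pass = 0; pass < 2; pass++) {
		for (f = first_frame; f < EHCI_NUM_FRAME_LISTS;
		    f += interval_ms) {
			for (u = ss_uframe; u <= ss_uframe + 3; u++) {
				int need = (u == ss_uframe) ? ss_bw : cs_bw;

				if (pass == 0 && ehci_uframe_bw[f][u] + need >
				    EHCI_MAX_PERIODIC_BW)
					return (-1);
				if (pass == 1)
					ehci_uframe_bw[f][u] += need;
			}
		}
	}

	return (0);
}
```

Calling it with ss_bw = 77, cs_bw = 151, ss_uframe = 0, interval_ms = 4 and first_frame = 0 reserves micro-frame 0 for the start split and micro-frames 1 to 3 for the complete split in frame lists 0, 4, ..., 28, which corresponds to the 0x01 / 0x0e masks in the example.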

- USB 2.0, OHCI and EHCI Specifications

  http://www.usb.org/developers/docs

- USB bandwidth analysis from Intel

  http://www.usb.org/developers/whitepapers