User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig option and the
x86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".

To use the feature, mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

Mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names. Each subdirectory contains the
following files:

"num_closids":  The number of CLOSIDs which are valid for this
                resource. The kernel uses the smallest number of
                CLOSIDs of all enabled resources as the limit.

"cbm_mask":     The bitmask which is valid for this resource. This
                mask is equivalent to 100%.

"min_cbm_bits": The minimum number of consecutive bits which must be
                set when writing a mask.
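
The CLOSID limit described above can be sketched in shell. This is only an
illustration: the function name min_closids and its optional root argument
are made up here (the argument exists so the sketch can be pointed at a
scratch directory instead of a mounted resctrl).

```shell
#!/bin/sh
# min_closids [root]: print the smallest num_closids found under
# <root>/info/*/ (root defaults to /sys/fs/resctrl).
# Hypothetical helper, not part of any tool.
min_closids() {
    root=${1:-/sys/fs/resctrl}
    min=
    for f in "$root"/info/*/num_closids; do
        [ -r "$f" ] || continue          # skip unmatched glob / unreadable file
        n=$(cat "$f")
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then
            min=$n
        fi
    done
    # Print nothing (and fail) when no resource was found.
    [ -n "$min" ] && echo "$min"
}
```

Running "min_closids" with no argument on a system with resctrl mounted
should print the number of groups the kernel will actually allow.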

Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

There are three files associated with each group:

"tasks": A list of tasks that belong to this group. Tasks can be
        added to a group by writing the task ID to the "tasks" file
        (which will automatically remove them from the previous
        group to which they belonged). New tasks created by fork(2)
        and clone(2) are added to the same group as their parent.
        A task that is not in any other group is in the default
        (root) group.

"cpus": A bitmask of logical CPUs assigned to this group. Writing
        a new mask can add/remove CPUs from this group. Added CPUs
        are removed from their previous group. Removed ones are
        given to the default (root) group. You cannot remove CPUs
        from the default group.

"schemata": A list of all the resources available to this group.
        Each resource has its own line and format - see below for
        details.

When a task is running the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
for that group is used.

2) Else if the task belongs to the default group, but is running on a
CPU that is assigned to some specific group, then the schemata for
the CPU's group is used.

3) Otherwise the schemata for the default group is used.
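
The three rules can be written down as a small decision function. This is a
sketch only: effective_group is a made-up name, and representing the default
(root) group as "." is an assumption chosen for the illustration, not
something the kernel exposes.

```shell
#!/bin/sh
# effective_group <task_group> <cpu_group>
# Print the group whose schemata applies to a running task.
# "." stands for the default (root) group (illustrative convention).
effective_group() {
    task_group=$1
    cpu_group=$2
    if [ "$task_group" != "." ]; then
        echo "$task_group"      # rule 1: the task's own group wins
    elif [ "$cpu_group" != "." ]; then
        echo "$cpu_group"       # rule 2: fall back to the CPU's group
    else
        echo "."                # rule 3: default group
    fi
}
```

For example, "effective_group . p1" prints p1: a default-group task running
on a CPU owned by p1 uses the schemata of p1.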

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. A socket could have multiple separate
L3 caches, and multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical CPUs sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence; there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
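
As a rough sketch, the cache IDs at one cache level can be listed by reading
the "id" file next to the "level" file in each cache index directory. The
function name cache_ids and its optional second argument (an alternative
root, useful for trying the loop on a scratch tree) are assumptions made for
this example.

```shell
#!/bin/sh
# cache_ids <level> [cpu_root]
# Print "<cpuN> <cache id>" for every cache of the given level.
# cpu_root defaults to /sys/devices/system/cpu.
cache_ids() {
    level=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for idx in "$cpu_root"/cpu[0-9]*/cache/index*; do
        # Skip unmatched globs and index directories without an id file.
        [ -r "$idx/id" ] && [ -r "$idx/level" ] || continue
        [ "$(cat "$idx/level")" = "$level" ] || continue
        cpu=${idx#"$cpu_root"/}     # cpuN/cache/indexM
        cpu=${cpu%%/*}              # cpuN
        printf '%s %s\n' "$cpu" "$(cat "$idx/id")"
    done
}
```

"cache_ids 3" lists the L3 cache ID seen by each logical CPU. A level with
separate instruction and data caches (typically level 1) yields two lines
per CPU.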

For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
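
Both points above, the contiguity rule and the four-way split, can be checked
with shell arithmetic. The helper names is_contiguous and partition_masks are
invented for this sketch, and partition_masks assumes the number of bits
divides evenly by the number of parts.

```shell
#!/bin/sh
# is_contiguous <hex_mask>: succeed if all '1' bits form one block.
is_contiguous() {
    m=$((0x$1))
    [ "$m" -ne 0 ] || return 1
    # Drop trailing zero bits; what is left must look like 0b0..011..1,
    # i.e. adding one must not share any bit with it.
    while [ $((m & 1)) -eq 0 ]; do
        m=$((m >> 1))
    done
    [ $(( m & (m + 1) )) -eq 0 ]
}

# partition_masks <total_bits> <parts>: print one hex mask per part,
# lowest bits first. Assumes total_bits is a multiple of parts.
partition_masks() {
    bits=$1
    parts=$2
    width=$((bits / parts))
    i=0
    while [ "$i" -lt "$parts" ]; do
        printf '%x\n' $(( ((1 << width) - 1) << (i * width) ))
        i=$((i + 1))
    done
}
```

"partition_masks 20 4" prints 1f, 3e0, 7c00 and f8000, one per line, and
"is_contiguous 5" fails while "is_contiguous 6" succeeds.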

L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

        L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

        L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
        L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 cache does not support code and data prioritization, so the
schemata format is always:

        L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
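
Since every schemata line has the same "<resource>:<id>=<cbm>;..." shape, a
script can pick out the mask for one cache ID with plain shell string
handling. The function name schemata_mask is an assumption for this sketch;
it works on a line passed as an argument and does not touch resctrl itself.

```shell
#!/bin/sh
# schemata_mask <schemata_line> <cache_id>
# Print the CBM assigned to <cache_id>, or fail if the ID is absent.
# The body runs in a subshell so the IFS change does not leak out.
schemata_mask() (
    rest=${1#*:}            # drop the "<resource>:" prefix
    IFS=';'
    for pair in $rest; do   # split on ';' into "<id>=<cbm>" pairs
        if [ "${pair%%=*}" = "$2" ]; then
            echo "${pair#*=}"
            exit 0
        fi
    done
    exit 1
)
```

For instance, schemata_mask "L3:0=3;1=c" 1 prints c.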

On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks:

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so tasks in it have access to
all parts of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.
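
The "50%" figures follow directly from the number of bits set in each mask.
A minimal sketch of that arithmetic, using a made-up helper name mask_share
that rounds down to a whole percent:

```shell
#!/bin/sh
# mask_share <hex_mask> <total_bits>
# Print the share of the cache covered by the mask, as an integer percent.
mask_share() {
    m=$((0x$1))
    n=0
    # Count the set bits one at a time.
    while [ "$m" -ne 0 ]; do
        n=$((n + (m & 1)))
        m=$((m >> 1))
    done
    echo $((n * 100 / $2))
}
```

With the four-bit masks of this example, "mask_share 3 4" and
"mask_share c 4" both print 50, and "mask_share f 4" prints 100.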

Again two sockets, but this time with a more realistic 20-bit mask.

Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running
on processor 1, both on socket 0 of a two-socket, dual-core machine. To
avoid noisy neighbors, each of the two real-time tasks exclusively occupies
one quarter of the L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff;1=fffff" > schemata

Next we make a resource group for our first real-time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real-time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 0 1234

Ditto for the second real-time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 1 5678
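
The isolation in this example depends on the three socket 0 masks (3ff,
f8000 and 7c00) not sharing any bit. A quick way to sanity-check a set of
masks before writing them is sketched below; masks_disjoint is a
hypothetical helper name.

```shell
#!/bin/sh
# masks_disjoint <hex_mask>...
# Succeed if no bit is set in more than one of the given masks.
masks_disjoint() {
    seen=0
    for m in "$@"; do
        # Any bit already seen means two masks overlap.
        [ $(( seen & 0x$m )) -eq 0 ] || return 1
        seen=$(( seen | 0x$m ))
    done
}
```

"masks_disjoint 3ff f8000 7c00" succeeds, and together the three masks cover
the full 20-bit mask fffff.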

A single socket system which has real-time tasks running on cores 4-7 and
a non real-time workload assigned to cores 0-3. The real-time tasks share
text and data, so a per-task association is not required and, due to
interaction with the kernel, it is desired that the kernel on these cores
shares L3 with the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff" > schemata

Next we make a resource group for our real-time cores and give
it access to the "top" 50% of the cache on socket 0.

# mkdir p0
# echo "L3:0=ffc00;" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache.

# echo f0 > p0/cpus
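
The "cpus" file takes a bitmask of logical CPUs, so a range such as 4-7 has
to be turned into a hex mask first. A small sketch of that conversion
follows; cpu_range_mask is a made-up name, and the shell's integer width
limits it to low CPU numbers (it is not meant for large systems).

```shell
#!/bin/sh
# cpu_range_mask <first_cpu> <last_cpu>
# Print the hex bitmask with bits first..last set (inclusive).
# Only valid while last_cpu fits in the shell's integer width.
cpu_range_mask() {
    first=$1
    last=$2
    printf '%x\n' $(( ((1 << (last - first + 1)) - 1) << first ))
}
```

"cpu_range_mask 4 7" prints f0, the mask for cores 4-7, and
"cpu_range_mask 0 3" prints f.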

4) Locking between applications

Certain operations on the resctrl filesystem, composed of reads from and
writes to multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the CBMs from the "schemata" file of each directory
  2. Find a contiguous set of bits in the global CBM bitmask that is clear
     in all of the directory CBMs
  3. Create a new directory
  4. Write the bits found in step 2 to the new directory's "schemata" file
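
Step 2 of the list above is the only part that needs real computation. One
possible way to do it, sketched with shell arithmetic: slide a block of the
wanted width across the global mask and take the first position that is
free. The name find_free_mask and its argument order are assumptions for
this illustration, and the search strategy (lowest free block first) is just
one choice.

```shell
#!/bin/sh
# find_free_mask <global_hex_mask> <width> <used_hex_mask>...
# Print the lowest contiguous block of <width> bits that lies inside
# the global mask and is clear in every used mask; fail if none exists.
find_free_mask() {
    global=$((0x$1))
    width=$2
    shift 2
    used=0
    for m in "$@"; do
        used=$((used | 0x$m))       # union of all masks already in use
    done
    want=$(( (1 << width) - 1 ))
    shift_by=0
    while [ $(( want << shift_by )) -le "$global" ]; do
        cand=$(( want << shift_by ))
        if [ $(( cand & global )) -eq "$cand" ] &&
           [ $(( cand & used )) -eq 0 ]; then
            printf '%x\n' "$cand"
            return 0
        fi
        shift_by=$((shift_by + 1))
    done
    return 1
}
```

"find_free_mask fffff 5 3ff 7c00" prints f8000. The whole read, compute,
mkdir and write sequence still has to run under a lock to be safe against
other applications doing the same thing.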

If two applications attempt to allocate space concurrently then they can
end up allocating the same bits, so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrl filesystem and to avoid the
problem above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a shell
script command.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) On success, read the directory structure.
 C) Release the lock with flock(LOCK_UN)

# Atomically read directory structure
$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

# Read directory contents and create new subdirectory

$ cat create-dir.sh
find /sys/fs/resctrl/ > output.txt
# "function-of" is a placeholder for whatever computes the new schemata
mask=$(function-of output.txt)
mkdir /sys/fs/resctrl/newres/
echo "$mask" > /sys/fs/resctrl/newres/schemata

$ flock /sys/fs/resctrl/ ./create-dir.sh

/*
 * Example code to take advisory locks
 * before accessing resctrl filesystem
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);
	return 0;
}