SCHED_EXT EXAMPLE SCHEDULERS
============================

This directory contains a number of example sched_ext schedulers. These
schedulers are meant to provide examples of different types of schedulers
that can be built using sched_ext, and illustrate how various features of
sched_ext can be used.

Some of the examples are performant, production-ready schedulers. That is, for
the correct workload and with the correct tuning, they may be deployed in a
production environment with acceptable or possibly even improved performance.
Others are just examples that, in practice, would not provide acceptable
performance (though they could be improved to get there).

This README describes these example schedulers, including the types of
workloads or scenarios they're designed to accommodate, and whether or not
they're production ready. For more details on any of these schedulers, please
see the header comment in their .bpf.c file.

# Compiling the examples

There are a few toolchain dependencies for compiling the example schedulers.

## Toolchain dependencies

The schedulers are BPF programs, and therefore must be compiled with clang.
GCC is actively working on adding a BPF backend as well, but it is still
missing some required features, such as BTF type tags.

You may need pahole in order to generate BTF from DWARF.

Rust schedulers use features present in the Rust toolchain >= 1.70.0. You
should be able to use the stable build from rustup, but if that doesn't
work, try using the rustup nightly build.

There are other requirements as well, such as make, but these are the main
ones.

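The minimum versions mentioned above can be checked with a small helper. The following is a minimal sketch and not part of the build system: `version_ge` is a hypothetical function name, and the sketch assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# version_ge HAVE WANT: succeed if dotted version HAVE is >= WANT.
# Hypothetical helper; assumes `sort -V` (version sort) is available.
version_ge() {
    have=$1
    want=$2
    # The smaller of the two versions sorts first, so HAVE >= WANT
    # exactly when the smallest of the pair is WANT.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

For example, `version_ge "$(rustc --version | cut -d' ' -f2)" 1.70.0 || echo "rustc is too old"` compares the installed Rust compiler against the minimum stated above.
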
## Compiling the kernel

In order to run a sched_ext scheduler, you'll have to run a kernel compiled
with the patches in this repository, and with a minimum set of necessary
Kconfig options:

    CONFIG_SCHED_CLASS_EXT=y
    CONFIG_DEBUG_INFO_BTF=y

It's also recommended that you include the following Kconfig options:

    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y
    CONFIG_PAHOLE_HAS_SPLIT_BTF=y
    CONFIG_PAHOLE_HAS_BTF_TAG=y

There is a `Kconfig` file in this directory whose contents you can append to
your local `.config` file, as long as there are no conflicts with any existing
options in the file.

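After editing `.config`, one way to confirm that the options are actually enabled is to grep for them. This is a minimal sketch: `missing_kconfig` is a hypothetical helper name, and the option names passed to it are whichever ones you care about from the lists above.

```shell
# missing_kconfig CONFIG_FILE OPTION...: print each OPTION that is not
# set to "y" in CONFIG_FILE. Prints nothing when all are enabled.
# Hypothetical helper, not part of the kernel build system.
missing_kconfig() {
    config_file=$1
    shift
    for opt in "$@"; do
        # -x matches the whole line, so CONFIG_FOO=y does not match
        # CONFIG_FOO_BAR=y or "# CONFIG_FOO is not set".
        grep -qx "${opt}=y" "$config_file" || printf '%s\n' "$opt"
    done
}
```

For example, `missing_kconfig .config CONFIG_SCHED_CLASS_EXT CONFIG_DEBUG_INFO_BTF` prints the names of any of those two options that still need to be enabled.
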
## Getting a vmlinux.h file

You may notice that most of the example schedulers include a "vmlinux.h" file.
This is a large, auto-generated header file that contains all of the types
defined in some vmlinux binary that was compiled with
[BTF](https://docs.kernel.org/bpf/btf.html) (i.e. with the BTF-related Kconfig
options specified above).

The header file is created using `bpftool`, by passing it a vmlinux binary
compiled with BTF as follows:

    $ bpftool btf dump file /path/to/vmlinux format c > vmlinux.h

`bpftool` analyzes all of the BTF encodings in the binary, and produces a
header file that can be included by BPF programs to access those types. For
example, using vmlinux.h allows a scheduler to access fields defined directly
in vmlinux as follows:

    #include "vmlinux.h"
    // vmlinux.h is also implicitly included by scx_common.bpf.h.
    #include "scx_common.bpf.h"

    /*
     * vmlinux.h provides definitions for struct task_struct and
     * struct scx_enable_args.
     */
    void BPF_STRUCT_OPS(example_enable, struct task_struct *p,
                        struct scx_enable_args *args)
    {
            bpf_printk("Task %s enabled in example scheduler", p->comm);
    }

    // vmlinux.h provides the definition for struct sched_ext_ops.
    SEC(".struct_ops.link")
    struct sched_ext_ops example_ops = {
            .enable = (void *)example_enable,
    };

The scheduler build system will generate this vmlinux.h file as part of the
scheduler build pipeline. It looks for a vmlinux file in the following
order:

1. If the O= environment variable is defined, at `$O/vmlinux`
2. If the KBUILD_OUTPUT= environment variable is defined, at
   `$KBUILD_OUTPUT/vmlinux`
3. At `../../vmlinux` (i.e. at the root of the kernel tree where you're
   compiling the schedulers)
4. `/sys/kernel/btf/vmlinux`
5. `/boot/vmlinux-$(uname -r)`

In other words, if you have compiled a kernel in your local repo, its vmlinux
file will be used to generate vmlinux.h. Otherwise, it will be the vmlinux of
the kernel you're currently running on. This means that if you're running on a
kernel with sched_ext support, you may not need to compile a local kernel at
all.

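To see which vmlinux would be picked up on your machine, the lookup order listed above can be sketched as a shell function. This is an illustration of the order only, not the build system's actual logic: `find_vmlinux` is a hypothetical name, and its optional argument (a stand-in for the `../..` tree root) exists only to make the sketch easy to try out.

```shell
# find_vmlinux [TREE_ROOT]: print the first vmlinux candidate that
# exists, following the lookup order described above. Hypothetical
# helper; TREE_ROOT defaults to "../.." (the kernel tree root).
find_vmlinux() {
    tree_root=${1:-../..}
    for candidate in \
        "${O:+$O/vmlinux}" \
        "${KBUILD_OUTPUT:+$KBUILD_OUTPUT/vmlinux}" \
        "$tree_root/vmlinux" \
        /sys/kernel/btf/vmlinux \
        "/boot/vmlinux-$(uname -r)"
    do
        # An unset O or KBUILD_OUTPUT expands to an empty string; skip it.
        [ -n "$candidate" ] || continue
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Running `find_vmlinux` from the scheduler directory prints the path that comes first in the order, or returns a non-zero status if none of the candidates exist.
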
One of the cooler features of BPF is that it supports
[CO-RE](https://nakryiko.com/posts/bpf-core-reference-guide/) (Compile Once Run
Everywhere). This feature allows you to reference fields inside of structs with
types defined internal to the kernel, and not have to recompile if you load the
BPF program on a different kernel with the field at a different offset. In our
example above, we print out a task name with `p->comm`. CO-RE would perform
relocations for that access when the program is loaded to ensure that it's
referencing the correct offset for the currently running kernel.

## Compiling the schedulers

Once you have your toolchain set up, and a vmlinux that can be used to generate
a full vmlinux.h file, you can compile the schedulers using `make`.

This directory contains the following example schedulers. These schedulers are
for testing and demonstrating different aspects of sched_ext. While some may be
useful in limited scenarios, they are not intended to be practical.

For more scheduler implementations, tools and documentation, visit
https://github.com/sched-ext/scx.

A simple scheduler that provides an example of a minimal sched_ext scheduler.
scx_simple can be run in either global weighted vtime mode, or FIFO mode.

Though very simple, in limited scenarios, this scheduler can perform reasonably
well on single-socket systems with a unified L3 cache.

Another simple, yet slightly more complex scheduler that provides an example of
a basic weighted FIFO queuing policy. It also provides examples of some common
useful BPF features, such as sleepable per-task storage allocation in the
`ops.prep_enable()` callback, and using the `BPF_MAP_TYPE_QUEUE` map type to
enqueue tasks. It also illustrates how core-sched support could be implemented.

A "central" scheduler where scheduling decisions are made from a single CPU.
This scheduler illustrates how scheduling decisions can be dispatched from a
single CPU, allowing other cores to run with infinite slices, without timer
ticks, and without having to incur the overhead of making scheduling decisions.

The approach demonstrated by this scheduler may be useful for any workload that
benefits from minimizing scheduling overhead and timer ticks. An example of
where this could be particularly useful is running VMs, where running with
infinite slices and no timer ticks allows the VM to avoid unnecessary expensive
VM exits.

A flattened cgroup hierarchy scheduler. This scheduler implements hierarchical
weight-based cgroup CPU control by flattening the cgroup hierarchy into a single
layer, by compounding the active weight share at each level. The effect of this
is a much more performant CPU controller, which does not need to descend down
cgroup trees in order to properly compute a cgroup's share.

Similar to scx_simple, in limited scenarios, this scheduler can perform
reasonably well on single-socket systems with a unified L3 cache and show
significantly lowered hierarchical scheduling overhead.

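The idea of compounding the weight share at each level can be illustrated with a little integer arithmetic. This is a worked sketch of the arithmetic only, not the scheduler's implementation: `compound_share` is a hypothetical helper, the weights are made up, and each sibling total is assumed to count only the cgroups that are active at that level.

```shell
# compound_share WEIGHT:SIBLING_TOTAL...: multiply the weight fractions
# along one path from the root down to a cgroup, and print the resulting
# share in parts per million. Integer division truncates at each step.
# Hypothetical helper for illustration only.
compound_share() {
    share=1000000
    for level in "$@"; do
        weight=${level%%:*}   # this cgroup's weight at this level
        total=${level##*:}    # sum of the weights of all siblings
        share=$((share * weight / total))
    done
    printf '%s\n' "$share"
}
```

Suppose the root has two children, A (weight 100) and B (weight 200), and A has two children, A1 (weight 100) and A2 (weight 300). Then `compound_share 100:300 100:400` prints `83333` for A1, i.e. roughly 1/3 of 1/4 of the machine. Once that single number is known, A1 can be compared directly against every other cgroup without walking the tree.
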
There are a number of common issues that you may run into when building the
schedulers. We'll go over some of the common ones here.

### Old version of clang

    error: static assertion failed due to requirement 'SCX_DSQ_FLAG_BUILTIN': bpftool generated vmlinux.h is missing high bits for 64bit enums, upgrade clang and pahole
    _Static_assert(SCX_DSQ_FLAG_BUILTIN,

This means you built the kernel or the schedulers with an older version of
clang than what's supported (i.e. older than 16.0.0). To remediate this:

1. Run `which clang` and `clang --version` to make sure you're using a
   sufficiently new version of clang.

2. Run `make fullclean` in the root path of the repository.

3. Rebuild the kernel, and then your example schedulers.

The schedulers are also cleaned if you invoke `make mrproper` in the root
directory of the tree.

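The version check in the first step can be scripted. The following is a minimal sketch under the assumption that `clang --version` prints a line containing `clang version X.Y.Z`; both function names are hypothetical.

```shell
# clang_major: read `clang --version` output on stdin and print the
# major version number, or nothing if no "clang version X.Y" is found.
clang_major() {
    sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# clang_new_enough: succeed if the major version read from stdin is at
# least 16, the minimum stated above.
clang_new_enough() {
    major=$(clang_major)
    [ -n "$major" ] && [ "$major" -ge 16 ]
}
```

For example, `clang --version | clang_new_enough || echo "clang is older than 16.0.0"` reports when the compiler found first in your `PATH` is too old.
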
### Stale kernel build / incomplete vmlinux.h file

As described above, you'll need a `vmlinux.h` file that was generated from a
vmlinux built with BTF, and with sched_ext support enabled. If you don't,
you'll see errors such as the following, which indicate that a type being
referenced in a scheduler is unknown:

    /path/to/sched_ext/tools/sched_ext/user_exit_info.h:25:23: note: forward declaration of 'struct scx_exit_info'
    const struct scx_exit_info *ei)

To resolve this, please follow the steps above in
[Getting a vmlinux.h file](#getting-a-vmlinuxh-file) to ensure your
schedulers are using a vmlinux.h file that includes the requisite types.

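A quick way to tell whether a given vmlinux.h contains a sched_ext type is to grep for its definition. This is a minimal sketch: `vmlinux_h_defines` is a hypothetical helper, and it assumes that the generated header writes full definitions as `struct NAME {` at the start of a line.

```shell
# vmlinux_h_defines HEADER TYPE: succeed if HEADER contains a full
# definition of struct TYPE, rather than only a forward declaration.
# Hypothetical helper; assumes definitions start with "struct TYPE {".
vmlinux_h_defines() {
    grep -q "^struct $2 {" "$1"
}
```

For example, `vmlinux_h_defines vmlinux.h scx_exit_info || echo "vmlinux.h has no sched_ext types"` flags a header that was generated from a kernel without sched_ext support.
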
You may see the following output when building the schedulers:

    Auto-detecting system features:
    ... clang-bpf-co-re: [ on ]
    ... llvm: [ OFF ]

Seeing `llvm: [ OFF ]` here is not an issue. You can safely ignore it.