# Using the Guidelines Support Library (GSL): A Tutorial and FAQ

by Herb Sutter

updated 2018-01-08


## Overview: "Is this document a tutorial or a FAQ?"

It aims to be both:

- a tutorial you can read in order, following a style similar to the introduction of [K&R](https://en.wikipedia.org/wiki/The_C_Programming_Language) by building up examples of increasing complexity; and

- a FAQ you can use as a reference, with each section answering a specific question.


## Motivation: "Why would I use GSL, and where can I get it?"

First look at the [C++ Core Guidelines](https://github.com/isocpp/CppCoreGuidelines); the GSL is a support library for that document. Select a set of guidelines you want to adopt, then bring in the GSL as directed by those guidelines.

You can try out the examples in this document on all major compilers and platforms using [this GSL reference implementation](https://github.com/microsoft/gsl).


# gsl::span: "What is gsl::span, and what is it for?"

`gsl::span` is a replacement for `(pointer, length)` pairs that refer to a sequence of contiguous objects. It can be thought of as a pointer to an array, but one that knows its bounds.

For example, a `span<int,7>` refers to a sequence of seven contiguous integers.

A `span` does not own the elements it points to. It is not a container like an `array` or a `vector`; it is a view into the contents of such a container.



## span parameters: "How should I choose between span and traditional (ptr, length) parameters?"

In new code, prefer the bounds-checkable `span<T>` over separate pointer and length parameters. In older code, adopt `span` where reasonable as you maintain the code.

A function that takes a pointer to an array and a separate length, such as:

~~~cpp
// Error-prone: Process n contiguous ints starting at *p
void dangerous_process_ints(const int* p, size_t n);
~~~

is error-prone and difficult to use correctly:

~~~cpp
int a[100];
dangerous_process_ints(a, 1000); // oops: buffer overflow

vector<int> v(200);
dangerous_process_ints(v.data(), 1000); // oops: buffer overflow

auto remainder = find(v.begin(), v.end(), some_value);
    // now call dangerous_process_ints() to process the rest of the container from *remainder to the end
dangerous_process_ints(&*remainder, v.end() - remainder); // convoluted, and undefined behavior if remainder == v.end()
~~~

Instead, using `span` encapsulates the pointer and the length:

~~~cpp
// BETTER: Read s.size() contiguous ints starting at s[0]
void process_ints(span<const int> s);
~~~

which makes `process_ints` easier to use correctly, because the length is deduced from common argument types:

~~~cpp
int a[100];
process_ints(a); // deduces correct length: 100 (constructs the span from a built-in array)

vector<int> v(200);
process_ints(v); // deduces correct length: 200 (constructs the span from a container)
~~~

and conveniently supports modern C++ argument initialization when the calling code has a distinct beginning and end rather than a whole container:

~~~cpp
auto remainder = find(v.begin(), v.end(), some_value);
    // now call process_ints() to process the rest of the container from *remainder to the end
process_ints({remainder, v.end()}); // correct and clear (constructs the span from an iterator pair)
~~~

> Things to remember
> - Prefer `span` over (pointer, length) pairs.
> - Pass a `span` like a pointer (i.e., by value for "in" parameters). Treat it like a pointer range.


## span and const: "What's the difference between `span<const T>` and `const span<T>`?"

`span<const T>` means that the `T` objects are read-only. Prefer this by default, especially as a parameter, if you don't need to modify the `T`s.

`const span<T>` means that the `span` itself can't be made to point at a different target; the `T` objects it refers to can still be modified.

`const span<const T>` means both.

> Things to remember
> - Prefer a `span<const T>` by default to denote that the contents are read-only, unless you do need read-write access.


## Iteration: "How do I iterate over a span?"

A `span` is an encapsulated range, and so can be visited using a range-based `for` loop.

Consider the implementation of a function like the `process_ints` that we saw in an earlier example. Visiting every object using a (pointer, length) pair requires an explicit index:

~~~cpp
void dangerous_process_ints(int* p, size_t n) {
    for (size_t i = 0; i < n; ++i) { // size_t index: avoids a signed/unsigned comparison with n
        p[i] = next_character();
    }
}
~~~

A `span` supports range-`for`. This is zero-overhead and does not need to perform any range check, because the range-`for` loop is known by construction not to exceed the range's bounds:

~~~cpp
void process_ints(span<int> s) {
    for (auto& c : s) {
        c = next_character();
    }
}
~~~

A `span` also supports normal iteration using `.begin()` and `.end()`.

Note that you cannot compare iterators from different spans, even if they refer to the same array.

An iterator is valid as long as the `span` that it is iterating over exists.


## Element access: "How do I access a single element in a span?"

Use `myspan[offset]` to subscript, or equivalently use `*(iter + offset)` where `iter` is a `span<T>::iterator`. Both are range-checked.


## Sub-spans: "What if I need a subrange of a span?"

To refer to a sub-span, use `first`, `last`, or `subspan`.

~~~cpp
void process_widgets(span<widget> s) {
    if (s.size() > 10) {
        read_header(s.first(10));   // first 10 entries
        read_rest(s.subspan(10));   // remaining entries
        // ...
    }
}
~~~

In rarer cases, when you know the number of elements at compile time and want to enable `constexpr` use of `span`, you can pass the count as a template argument (for `subspan`, the first template argument is the offset):

~~~cpp
constexpr int process_widgets(span<widget> s) {
    if (s.size() > 10) {
        read_header(s.first<10>());   // first 10 entries
        read_rest(s.subspan<10>());   // remaining entries (offset 10)
        // ...
    }
    return s.size();
}
~~~


## span and STL: "How do I pass a span to an STL-style [begin,end) function?"

Use `span::iterator`s. A `span` is iterable like any STL range.

To call an STL `[begin,end)`-style interface, use `begin` and `end` by default, or other valid iterators if you don't want to pass the whole range:

~~~cpp
void f(span<widget> s) {
    // ...
    auto found = find_if(s.begin(), s.end(), some_predicate); // find_if takes a predicate, not a value
    // ...
}
~~~

If you are using a range-based algorithm such as those from [Range-V3](https://github.com/ericniebler/range-v3), you can use a `span` as a range directly:

~~~cpp
void f(span<widget> s) {
    // ...
    auto found = ranges::find_if(s, some_predicate);
    // ...
}
~~~


## Comparison: "When I compare `span<T>`s, do I compare the `T` values or the underlying pointers?"

Comparing two `span<T>`s compares the `T` values. To compare two spans for identity, that is, to see whether they point to the same memory, use `.data()` (together with `.size()` if the lengths may differ).

~~~cpp
int a[] = { 1, 2, 3 };
span<int> sa{a};

vector<int> v = { 1, 2, 3 };
span<int> sv{v};

assert(sa == sv); // sa and sv both point to contiguous ints with values 1, 2, 3
assert(sa.data() != sv.data()); // but sa and sv point to different memory areas
~~~

> Things to remember
> - Comparing spans compares their contents, not whether they point to the same location.


## Empty vs null: "Do I have to explicitly check whether a span is null?"

Usually not, because what you usually want to check is that the `span` is not empty, which means its size is not zero. It's safe to test the size of a span even if it's null.

Remember that the following all have identical meaning for a `span s`:

- `!s.empty()`
- `s.size() != 0`
- `s.data() != nullptr && s.size() != 0` (the first condition is actually redundant)

The following is also functionally equivalent, as it just tests whether there is at least one element:

- `s != nullptr` (compares `s` against a null-constructed empty `span`)

For example:

~~~cpp
void f(span<const int> s) {
    if (s != nullptr && s.size() > 0) { // bad: redundant, overkill
        // ...
    }

    if (s.size() > 0) { // good: not redundant
        // ...
    }

    if (!s.empty()) { // good: same as "s.size() > 0"
        // ...
    }
}
~~~

> Things to remember
> - Usually you shouldn't check for a null `span`. For a `span s`, if you're comparing `s != nullptr` or `s.data() != nullptr`, check to make sure you shouldn't just be asking `!s.empty()`.


## as_bytes: "Why would I convert a span to `span<const byte>`?"

Because it's a type-safe way to get a read-only view of the objects' bytes.

Without `span`, viewing the bytes of an object requires writing a brittle cast:

~~~cpp
void serialize(char* p, int length); // bad: forgot const

void f(widget* p, int length) {
    // serialize one object's bytes (incl. padding)
    serialize(reinterpret_cast<char*>(p), 1); // bad: copies just the first byte, forgot sizeof(widget)
}
~~~

With `span` the code is safer and cleaner:

~~~cpp
void serialize(span<const byte>); // can't forget const: with span<byte>, the first call passing as_bytes(...) wouldn't compile

void f(span<widget> s) {
    // ...
    // serialize the objects' bytes (incl. padding)
    serialize(as_bytes(s)); // ok: covers s.size_bytes() bytes
}
~~~

Also, `span<T>` lets you distinguish between `.size()` (the number of elements) and `.size_bytes()` (the number of bytes); make use of that distinction instead of multiplying by `sizeof(T)`.

> Things to remember
> - Prefer `span<T>`'s `.size_bytes()` over `.size() * sizeof(T)`.


## And a few `span`-related hints

These are not directly related to `span` but often come up while using `span`.

* Use `byte` everywhere you are handling memory (as opposed to characters or integers). That is, when accessing a chunk of raw memory, use `gsl::span<std::byte>`.

* Use `narrow()` when you cannot afford to be surprised by a value change during conversion to a smaller range. This includes going between a signed `span` size or index and the unsigned `.size()` of today's STL containers, though the `span` constructors from containers nicely encapsulate many of these conversions.

* Similarly, use `narrow_cast()` when you are *sure* you won't be surprised by a value change during conversion to a smaller range.
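
GSL's `narrow()` itself is not reproduced here. Instead, a hypothetical `checked_narrow` sketches the idea behind it under these assumptions: convert with `static_cast`, then reject the result if converting back changes the value or if the sign flipped. The GSL reference implementation throws `gsl::narrowing_error` on failure; this sketch throws `std::range_error` so that it needs only the standard library (C++17).

~~~cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical checked conversion modeled on the idea behind gsl::narrow.
template <class To, class From>
To checked_narrow(From from) {
    static_assert(std::is_arithmetic_v<To> && std::is_arithmetic_v<From>);
    To to = static_cast<To>(from);
    // Round trip: did the conversion lose information?
    bool value_changed = static_cast<From>(to) != from;
    // Signed/unsigned mix: e.g. -1 -> unsigned round-trips but flips sign.
    bool sign_changed = std::is_signed_v<To> != std::is_signed_v<From> &&
                        ((to < To{}) != (from < From{}));
    if (value_changed || sign_changed) {
        throw std::range_error("checked_narrow: value changed");
    }
    return to;
}
~~~

By contrast, `narrow_cast` performs no check: it is essentially a `static_cast` whose name documents that the narrowing is intentional.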