==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]:

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.

On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64-bit ``time_t``).
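
Requirement (B) can be checked at compile time. The sketch below assumes a
64-bit ``time_t`` (modeled with ``std::int64_t``) and a compiler that provides
``__int128_t``, as GCC and Clang do on most 64-bit targets; ``max_timespec_in_ns``
is a hypothetical helper, not part of any library.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <limits>

  // Hypothetical helper: the largest timespec value, assuming a 64-bit
  // time_t, expressed as a single nanosecond count.
  constexpr __int128_t max_timespec_in_ns() {
    constexpr __int128_t max_sec = std::numeric_limits<std::int64_t>::max();
    constexpr __int128_t max_nsec = 999'999'999; // largest valid tv_nsec
    return max_sec * 1'000'000'000 + max_nsec;
  }

  // The count does not fit in a signed 64-bit integer...
  static_assert(max_timespec_in_ns() > std::numeric_limits<std::int64_t>::max());

  // ...but 2^63 * 10^9 is just under 2^93, so the count needs fewer than
  // 93 value bits and a signed 128-bit integer holds it comfortably.
  static_assert(max_timespec_in_ns() < (static_cast<__int128_t>(1) << 93));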

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock which specifies the period and representation of
the time points and durations it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;

To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?

Problems To Consider
====================

Before considering solutions, let's consider the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply reduce the resolution of
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did; it's also what
``std::chrono::system_clock`` does. As a result, it can represent time points about
292 thousand years on either side of the epoch, as opposed to only 292 years
at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To get the same range with a 64-bit representation, we would need to drop our
resolution all the way down to seconds.
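
Those figures follow from dividing the largest signed 64-bit value by the tick
rate and by the length of a year. A compile-time sketch, assuming an average
Gregorian year of 365.2425 days (31,556,952 seconds, the length C++20 uses for
``std::chrono::years``); ``max_years`` is a hypothetical helper:

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <limits>

  // Average Gregorian year in seconds (365.2425 days).
  constexpr std::int64_t seconds_per_year = 31'556'952;

  // Whole years on each side of the epoch that a signed 64-bit tick count
  // can reach, given the number of ticks per second.
  constexpr std::int64_t max_years(std::int64_t ticks_per_second) {
    return std::numeric_limits<std::int64_t>::max() / ticks_per_second /
           seconds_per_year;
  }

  static_assert(max_years(1'000'000'000) == 292);     // nanoseconds
  static_assert(max_years(1'000'000) == 292'277);     // microseconds
  static_assert(max_years(1) / 1'000'000'000 == 292); // seconds: ~292 billion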

This raises the question: is the range problem "really a problem"? Sane uses
of file timestamps shouldn't exceed +/- 300 years, so should we bother supporting more?

I believe the answer is yes. We're not designing the filesystem time API, we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed, as in
the following example:

.. code-block:: cpp

  #include <cassert>
  #include <ctime>
  #include <filesystem>
  #include <limits>
  #include <fcntl.h>    // AT_FDCWD
  #include <sys/stat.h> // utimensat
  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts; // last access time
    both_times[1] = ts; // last modification time
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300 year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main(int, char**) {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    file_time_type tp = last_write_time(p); // BAD! Throws filesystem_error (value_too_large).
    return 0;
  }
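
An implementation with a 64-bit nanosecond representation has to detect such
values before converting them. The following is a minimal sketch of that check,
not libc++'s actual code. It assumes ``tv_nsec`` is already normalized to
``[0, 999'999'999]``, the helper ``timespec_to_ns64`` is hypothetical, and it
reports failure through ``std::error_code`` in the style of the non-throwing
``last_write_time(p, ec)`` overload.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <limits>
  #include <system_error>
  #include <time.h>

  // Hypothetical helper: converts a timespec to a signed 64-bit nanosecond
  // count, reporting errc::value_too_large instead of overflowing.
  // Assumes 0 <= ts.tv_nsec < 1'000'000'000.
  std::int64_t timespec_to_ns64(const timespec& ts, std::error_code& ec) {
    constexpr std::int64_t ns_per_sec = 1'000'000'000;
    constexpr std::int64_t lo = std::numeric_limits<std::int64_t>::min();
    constexpr std::int64_t hi = std::numeric_limits<std::int64_t>::max();
    std::int64_t sec = ts.tv_sec;
    std::int64_t nsec = ts.tv_nsec;
    bool in_range;
    if (sec >= 0) {
      // Need sec * ns_per_sec + nsec <= hi.
      in_range = sec <= (hi - nsec) / ns_per_sec;
    } else {
      // Borrow a second so both parts are <= 0, then need the sum >= lo.
      // Division truncates toward zero, which rounds this negative quotient up.
      if (nsec > 0) {
        sec += 1;
        nsec -= ns_per_sec;
      }
      in_range = sec >= (lo - nsec) / ns_per_sec;
    }
    if (!in_range) {
      ec = std::make_error_code(std::errc::value_too_large);
      return 0;
    }
    ec.clear();
    return sec * ns_per_sec + nsec;
  }

With ``tv_sec`` set to ``numeric_limits<time_t>::max()``, as in
``some_bad_persons_code``, a 64-bit ``time_t`` makes this report
``value_too_large``.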

Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64-bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Is second precision not
good enough? I limit my consideration of the point to this: Why was it not good
enough for the underlying system interfaces? If it wasn't good enough for them,
then it isn't good enough for us. Our job is to match the filesystem's range and
representation, not design it.


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``. At least in
this case ``last_write_time`` can be used to get and set all possible values
supported by the underlying filesystem, meaning ``last_write_time(p)`` will
never throw an overflow error when retrieving a value.

However, this introduces a new problem: users can attempt to set a file's
write time to a time point beyond what the filesystem can represent. Two
particular values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws.
    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
  }

Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
a valid time point, adding a couple hundred billion years in error,
and then trying to update a file's write time to that value very often.
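
The larger range bites in the opposite direction, when a time point is handed
back to the system interface. The sketch below assumes ``__int128_t`` is
available, and ``ns128_to_timespec`` is a hypothetical helper: it splits a
128-bit nanosecond count into a ``timespec`` and fails when the seconds part
does not fit in ``time_t``, which is what happens for ``file_time_type::max()``
and ``min()`` under a 128-bit representation.

.. code-block:: cpp

  #include <cassert>
  #include <limits>
  #include <time.h>

  // Hypothetical helper: splits a 128-bit nanosecond count into a timespec.
  // Returns false when the seconds part does not fit in time_t.
  bool ns128_to_timespec(__int128_t ns, timespec& out) {
    constexpr __int128_t ns_per_sec = 1'000'000'000;
    __int128_t sec = ns / ns_per_sec;
    __int128_t nsec = ns % ns_per_sec;
    // Division truncates toward zero; normalize so 0 <= tv_nsec < 10^9.
    if (nsec < 0) {
      nsec += ns_per_sec;
      sec -= 1;
    }
    if (sec > std::numeric_limits<time_t>::max() ||
        sec < std::numeric_limits<time_t>::min())
      return false; // the point where last_write_time would have to fail
    out.tv_sec = static_cast<time_t>(sec);
    out.tv_nsec = static_cast<long>(nsec);
    return true;
  }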

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to fall back to system interfaces to avoid limitations in the C++ standard library.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution as the underlying filesystem. The
first two problems are much more important to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128-bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type. All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Whoops! chrono::seconds and chrono::nanoseconds use a 64-bit
    // representation, so this may overflow before it's converted to a
    // file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }

Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, hopefully only in such
a way that the code is still portable across implementations.
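
As one example of such an adjustment, code that prints a ``__int128_t`` count
has to format it by hand, since the standard streams and ``std::to_string``
provide no overloads for it. A minimal sketch, assuming ``__int128_t`` and
``__uint128_t`` are available; ``to_string_i128`` is a hypothetical helper:

.. code-block:: cpp

  #include <algorithm>
  #include <cassert>
  #include <string>

  // Hypothetical helper: formats a 128-bit count in decimal.
  std::string to_string_i128(__int128_t v) {
    // Take the magnitude as unsigned so the most negative value is handled.
    __uint128_t mag = v < 0 ? -static_cast<__uint128_t>(v)
                            : static_cast<__uint128_t>(v);
    std::string digits;
    do {
      digits.push_back(static_cast<char>('0' + static_cast<int>(mag % 10)));
      mag /= 10;
    } while (mag != 0);
    if (v < 0)
      digits.push_back('-');
    std::reverse(digits.begin(), digits.end()); // digits were produced last-first
    return digits;
  }

With it, ``print_time`` could stream
``to_string_i128(last_write_time(p).time_since_epoch().count())``, but that line
only compiles where ``rep`` converts to ``__int128_t``, so it would still break
with a custom arithmetic type.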

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.


Chrono and ``timespec`` Emulation
---------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It only seems natural, seeing as that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

For ease of consideration, let's consider what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64-bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``. The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<SecsT>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }
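
The gigasecond confusion can be reproduced without any custom type. The sketch
below models only the field split of ``fs_timespec_rep`` with a hypothetical
``split`` function over a plain 64-bit tick count, and shows the same fields
changing meaning once a duration is converted to seconds:

.. code-block:: cpp

  #include <cassert>
  #include <chrono>
  #include <cstdint>

  // Hypothetical model of the fs_timespec_rep layout: a tick count split as
  // (count / 10^9, count % 10^9), whatever the tick period happens to be.
  struct split_rep {
    std::int64_t tv_sec;
    std::int64_t tv_nsec;
  };

  constexpr split_rep split(std::int64_t ticks) {
    return {ticks / 1'000'000'000, ticks % 1'000'000'000};
  }

  // 1.5 billion seconds after the epoch (mid-2017), counted in nanoseconds:
  // the field names match their contents.
  constexpr std::chrono::nanoseconds ns = std::chrono::seconds(1'500'000'000);
  static_assert(split(ns.count()).tv_sec == 1'500'000'000);
  static_assert(split(ns.count()).tv_nsec == 0);

  // The same duration counted in seconds: "tv_sec" now holds gigaseconds and
  // "tv_nsec" holds seconds, so the fields read like 1.5 seconds.
  constexpr auto secs = std::chrono::duration_cast<std::chrono::seconds>(ns);
  static_assert(split(secs.count()).tv_sec == 1);
  static_assert(split(secs.count()).tv_nsec == 500'000'000);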

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val(0);
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Truncates
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.

Interactions with 32 bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but what
about 32-bit systems/builds where ``time_t`` is 32 bits? (This is the common case
for 32-bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
involved. Nor should we, as it would suffer from the numerous complications
described in this document.
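
A quick compile-time check of that claim, modeling a 32-bit ``time_t`` with
``std::int32_t`` so it can be evaluated on any host (``max_ns_32`` and
``min_ns_32`` are hypothetical names):

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <limits>

  // Extremes of a timespec with a 32-bit tv_sec, as nanosecond counts.
  constexpr std::int64_t max_ns_32 =
      static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max()) *
          1'000'000'000 +
      999'999'999;
  constexpr std::int64_t min_ns_32 =
      static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::min()) *
      1'000'000'000;

  // Both stay within the signed 64-bit range (about +/- 9.2e18), so a
  // long long nanosecond count covers every such timespec exactly.
  static_assert(max_ns_32 == 2'147'483'647'999'999'999);
  static_assert(min_ns_32 == -2'147'483'648'000'000'000);
  static_assert(std::numeric_limits<long long>::max() >= max_ns_32);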

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type, ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of +/- 300 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

``__int128_t``
~~~~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* Acts exactly like other arithmetic types.
* Can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64-bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* It can be implicitly converted to a builtin integer type by the user,
  silently truncating its value (for example, to a 64-bit integer).
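
The truncation is easy to hit because no cast is required. A minimal sketch,
assuming ``__int128_t`` is available; ``store_count`` stands in for a
hypothetical interface taking a 64-bit value. Such code typically compiles
without any diagnostic unless a warning such as ``-Wconversion`` is enabled:

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>

  // Hypothetical interface that accepts a 64-bit count.
  std::int64_t store_count(std::int64_t count) { return count; }

  // Passes a 128-bit rep value straight through without a cast. Only the
  // low 64 bits survive (modular since C++20, and implementation-defined
  // before that, with the same result on GCC and Clang).
  std::int64_t pass_rep(__int128_t rep) { return store_count(rep); }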

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has exactly the same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* It doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128-bit integer,
  but complicates a lot.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Classes emulating arithmetic types come with their own host of problems regarding
  overload resolution (each operator needs three SFINAE-constrained versions of
  it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It differs the most from implementations using an actual integer type,
  which gives it a high chance of breaking source compatibility.
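
To illustrate the overload-resolution point, here is a minimal sketch of one
operator for a hypothetical ``emulated_int`` class that, like a builtin integer,
converts implicitly to ``std::int64_t``. Constraining the mixed-type overloads
to builtin integer operands (one common approach, assumed here) gives ``x + 1``
and ``1 + x`` an exact match; without them, those expressions would fall back
to the builtin ``+`` through the conversion and yield a plain ``std::int64_t``.
Every other operator would need the same three versions.

.. code-block:: cpp

  #include <cassert>
  #include <cstdint>
  #include <type_traits>

  // Hypothetical emulated arithmetic type, reduced to a single operator.
  struct emulated_int {
    std::int64_t value; // a real emulation would store 128 bits
    // Builtin integers convert implicitly, so an emulation usually does too.
    constexpr operator std::int64_t() const { return value; }
  };

  // Enables an overload only when T is a builtin integer type.
  template <class T>
  using enable_if_integral_t =
      std::enable_if_t<std::is_integral_v<T>, emulated_int>;

  // Version 1: emulated + emulated.
  constexpr emulated_int operator+(emulated_int a, emulated_int b) {
    return {a.value + b.value};
  }
  // Version 2: emulated + builtin integer.
  template <class T>
  constexpr enable_if_integral_t<T> operator+(emulated_int a, T b) {
    return {a.value + static_cast<std::int64_t>(b)};
  }
  // Version 3: builtin integer + emulated.
  template <class T>
  constexpr enable_if_integral_t<T> operator+(T a, emulated_int b) {
    return {static_cast<std::int64_t>(a) + b.value};
  }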

Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of +/- 300
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
during 32-bit builds.
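
The selection described above can be sketched as follows. This is an
illustration, not libc++'s actual source: ``fs_clock_sketch`` is a hypothetical
name, ``__SIZEOF_INT128__`` is the macro GCC and Clang define when ``__int128``
is supported, and ``now()`` assumes POSIX ``clock_gettime``.

.. code-block:: cpp

  #include <cassert>
  #include <chrono>
  #include <ratio>
  #include <time.h>
  #include <type_traits>

  // Use a 128-bit rep when the compiler provides one, else fall back to
  // long long (nanosecond resolution either way).
  #if defined(__SIZEOF_INT128__)
  using fs_rep = __int128_t;
  #else
  using fs_rep = long long;
  #endif

  // Hypothetical clock mirroring the _FilesystemClock outline.
  struct fs_clock_sketch {
    using rep = fs_rep;
    using period = std::nano;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<fs_clock_sketch>;
    static constexpr bool is_steady = false;

    static time_point now() noexcept {
      timespec ts{};
      ::clock_gettime(CLOCK_REALTIME, &ts); // errors ignored in this sketch
      // Widen before multiplying so the product is computed in rep
      // (with a 128-bit rep this cannot overflow).
      return time_point(duration(static_cast<rep>(ts.tv_sec) * 1'000'000'000 +
                                 ts.tv_nsec));
    }
  };

The ``long long`` branch keeps nanosecond resolution but, with a 64-bit
``time_t``, only the +/- 292 year range discussed earlier.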

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.