docs/leapsmear.adoc

= Leap Second Smearing with NTP
include::include-html.ad[]

By Martin Burnicki
with some edits by Harlan Stenn

The NTP protocol and its reference implementation, ntpd, were
originally designed to distribute UTC time over a network as accurately as
possible.

Unfortunately, leap seconds are scheduled to be inserted into or deleted
from the UTC time scale at irregular intervals to keep the UTC time scale
synchronized with the Earth's rotation.  No deletion has happened yet, but
insertions have happened 27 times between 1972 and 2016.

The problem is that POSIX requires 86400 seconds in a day, and there is no
prescribed way to handle leap seconds in POSIX.

Whenever a leap second is to be handled, ntpd either:

* passes the leap second announcement down to the OS kernel (if the OS
supports this) and the kernel handles the leap second automatically, or

* applies the leap second correction itself.

NTP servers also pass a leap second warning flag down to their clients via
the normal NTP packet exchange, so clients also become aware of an
approaching leap second, and can handle the leap second appropriately.


== The Problem on Unix-like Systems

If a leap second is to be inserted, then in most Unix-like systems the OS
kernel just steps the time back by 1 second at the beginning of the leap
second, so the last second of the UTC day is repeated and thus duplicate
timestamps can occur.
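
To make the duplicate-timestamp effect concrete, here is a minimal Python
sketch.  It assumes the kernel behaves exactly as described above (a 1-second
step back at the start of the inserted leap second).  The function name
+posix_readings_across_leap+ and the choice of the leap second at the end of
30 June 2015 are only illustrative.

```python
import calendar


def posix_readings_across_leap(midnight_utc: int) -> list[int]:
    """Whole-second clock readings for 23:59:59, 23:59:60 and 00:00:00 UTC.

    `midnight_utc` is the POSIX timestamp of the 00:00:00 that follows the
    inserted leap second.
    """
    readings = []
    for true_elapsed in range(3):  # three real seconds pass
        # Assumed kernel behavior: step back 1 s once the leap second
        # (index 1) has started.
        step_back = 1 if true_elapsed >= 1 else 0
        readings.append(midnight_utc - 1 + true_elapsed - step_back)
    return readings


midnight = calendar.timegm((2015, 7, 1, 0, 0, 0))
readings = posix_readings_across_leap(midnight)
print(readings)
```

The first two readings are identical, which is exactly the repeated second
that confuses applications expecting strictly increasing timestamps.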

Unfortunately, many applications get confused if the system time is
stepped back, e.g. due to a leap second insertion.  Thus, many users have
been looking for ways to avoid this, and have tried to introduce
workarounds which may or may not work properly.

So even though these Unix kernels normally can handle leap seconds, the way
they do this is not always optimal for applications.

One good way to handle the leap second is to use ntp_gettime() instead of
the usual calls, because ntp_gettime() includes a "clock state" variable
that will actually tell you if the time you are receiving is OK or not, and,
if it is OK, whether the current second is an in-progress leap second.  But
even though this mechanism has been available for decades, almost nobody
uses it.
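
As a rough illustration of how an application might interpret that clock
state, the Python sketch below mirrors the return codes that Linux declares
in +<sys/timex.h>+.  The numeric values are taken from that header and may
differ on other systems.  The call to ntp_gettime() itself is not shown, and
the helper names +time_is_usable+ and +in_leap_second+ are hypothetical.

```python
from enum import IntEnum


class ClockState(IntEnum):
    """Clock states returned by ntp_gettime() (values as on Linux)."""
    TIME_OK = 0     # clock synchronized, no leap second pending
    TIME_INS = 1    # a leap second will be inserted at the end of the day
    TIME_DEL = 2    # a leap second will be deleted at the end of the day
    TIME_OOP = 3    # the inserted leap second is in progress
    TIME_WAIT = 4   # the leap second has occurred
    TIME_ERROR = 5  # clock is not synchronized


def time_is_usable(state: int) -> bool:
    """True unless the kernel reports an unsynchronized clock."""
    return ClockState(state) != ClockState.TIME_ERROR


def in_leap_second(state: int) -> bool:
    """True while the current second is the inserted leap second."""
    return ClockState(state) == ClockState.TIME_OOP
```

An application that checks +in_leap_second+ can tell the repeated second
apart from the second before it, even though both carry the same timestamp.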


== The Leap Smear Approach

For the reasons mentioned above, some support for leap smearing has
been implemented in ntpd.  This means that to insert a leap second
an NTP server adds a certain increasing "smear" offset to the real UTC time
sent to its clients, so that after some predefined interval the leap second
offset is compensated.  The smear interval should be long enough,
e.g. several hours, so that NTP clients can easily follow the clock drift
caused by the smeared time.
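
The idea can be sketched in a few lines of Python.  This sketch assumes a
simple linear ramp, which is only one possible smear algorithm; the curve
ntpd actually applies may differ.  The function name +smear_offset+ and its
parameters are hypothetical.

```python
def smear_offset(elapsed_s: float, interval_s: float, leap_s: float = 1.0) -> float:
    """Offset in seconds added to UTC, `elapsed_s` seconds into the smear.

    Assumes a linear ramp from 0 at the start of the smear interval to
    -leap_s at its end (an inserted leap second makes the served time lag).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Clamp so the offset stays 0 before the interval and -leap_s after it.
    fraction = min(max(elapsed_s / interval_s, 0.0), 1.0)
    return -leap_s * fraction


# Halfway through a 24-hour smear the served time lags UTC by half a second.
print(smear_offset(43200, 86400))
```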

During the period while the leap smear is being performed, ntpd will include
a specially-formatted 'refid' in time packets that contain "smeared" time.
This refid is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the
smear value.

With this approach the time an NTP server sends to its clients still matches
UTC before the leap second, up to the beginning of the smear interval, and
again corresponds to UTC after the insertion of the leap second has
finished, at the end of the smear interval.  By examining the first byte of
the refid, one can also determine if the server is offering smeared time or
not.
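
A monitoring script could apply that first-byte check roughly as follows.
This is a minimal sketch that assumes the refid is available as the text
shown by tools such as ntpq (either a dotted quad or a short name); the
function name +is_smeared_refid+ is hypothetical.

```python
def is_smeared_refid(refid: str) -> bool:
    """True if `refid` is a dotted quad whose first byte is 254."""
    parts = refid.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return False  # e.g. a clock name such as "MRS"
    if any(int(p) > 255 for p in parts):
        return False  # not a valid 4-byte refid
    return int(parts[0]) == 254
```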

Of course, clients that receive the "smeared" time from an NTP server don't
have to (and must not) care about the leap second anymore.  Smearing is
transparent to the clients, and the clients don't even notice there's a
leap second.


== Pros and Cons of the Smearing Approach

The disadvantages of this approach are:

* During the smear interval the time provided by smearing NTP servers
differs significantly from UTC, and thus from the time provided by normal,
non-smearing NTP servers.  The difference can be up to 1 second, depending
on the smear algorithm.

* Since smeared time differs from true UTC, and many applications require
correct legal time (UTC), there may be legal consequences to using smeared
time.  Make sure you check to see if this requirement affects you.

However, for applications where it's only important that all computers have
the same time and a temporary offset of up to 1 s from UTC is acceptable, a
better approach may be to slew the time in a well-defined way, over a
certain interval, thus "smearing" the leap second.


== The Motivation to Implement Leap Smearing

Here is some historical background for ntpd, related to smearing/slewing
time.

Up to ntpd 4.2.4, if kernel support for leap seconds was either not
available or was not enabled, ntpd didn't care about the leap second at all.
So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
sudden 1 s offset after the leap second and normally would have stepped the
time by -1 s a few minutes later.  However, 'ntpd -x' does not step the time
but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
to complete.  This could be considered a bug, but it was certainly only
accidental behavior.
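
The 33 minutes and 20 seconds can be reproduced with a short calculation.
The sketch below assumes a maximum slew rate of 500 ppm, a figure that is not
stated in this document; the function name +slew_duration_s+ is hypothetical.

```python
def slew_duration_s(offset_s: float, max_slew_ppm: float = 500.0) -> float:
    """Seconds needed to slew away `offset_s` at `max_slew_ppm` (assumed rate)."""
    if max_slew_ppm <= 0:
        raise ValueError("max_slew_ppm must be positive")
    return abs(offset_s) * 1_000_000 / max_slew_ppm


minutes, seconds = divmod(slew_duration_s(1.0), 60)
print(f"{int(minutes)} min {int(seconds)} s")  # prints: 33 min 20 s
```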

However, as we learned in the discussion in https://bugs.ntp.org/2745, this
behavior was very much appreciated since the time was never stepped
back, even though the start of the slewing was not strictly defined and
depended on the poll interval.  The system time was off by 1 second for
several minutes before slewing even started.

In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
midnight to insert a leap second, if kernel support was not used.
Unfortunately this also happened if ntpd was started with -x, so the folks
who expected that the time was never stepped when ntpd was run with -x found
this wasn't true anymore.  Again from the discussion in NTP bug 2745 we
learn that some folks even patched ntpd to get the 4.2.4 behavior back.

In 4.2.8 the leap second code was rewritten and some enhancements were
introduced, but the resulting code still showed the behavior of 4.2.6,
i.e. ntpd with -x would still step the time.  This was later fixed in the
ntpd stable code, but the fix is only available with a certain patch level
of ntpd 4.2.8.

So a possible solution for users who were looking for a way to bridge the
leap second without the time being stepped could have been to check the
version of ntpd installed on each of their systems.  If it was still 4.2.4,
starting the client ntpd with -x was enough.  If it was 4.2.6 or 4.2.8, -x
would not work unless a patched ntpd was installed instead of the original
version, so an upgrade to the current -stable code was needed to run ntpd
with -x and get the desired result.  Either way, every single machine in the
network that runs ntpd had to be checked, updated, or configured.

Google's leap smear approach is a very efficient solution for this, for
sites that do not require correct timestamps for legal purposes.  You just
have to take care that your NTP servers support leap smearing and configure
those few servers accordingly.  If the smear interval is long enough so that
NTP clients can follow the smeared time, it doesn't matter at all which
version of ntpd is installed on a client machine.  It just works, and it even
works around kernel bugs due to the leap second.

Since all clients follow the same smeared time, the time difference between
the clients during the smear interval is as small as possible, compared to
the -x approach.  The current leap second code in ntpd determines the point
in system time when the leap second is to be inserted.  Given a
particular smear interval, it's easy to determine the start point of the
smearing, and the smearing is finished when the leap second ends, i.e. the
next UTC day begins.
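
The start of the smear window can be worked out as in the following sketch.
It uses plain calendar arithmetic, which ignores the inserted second itself,
so the result is accurate only to about a second.  The function name
+smear_window+ is hypothetical, and the date is the 2015 leap second used
elsewhere in this document.

```python
from datetime import datetime, timedelta, timezone


def smear_window(next_day_utc: datetime, interval_s: int) -> tuple[datetime, datetime]:
    """Return (start, end) of the smear; it ends when the next UTC day begins."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    start = next_day_utc - timedelta(seconds=interval_s)
    return start, next_day_utc


start, end = smear_window(datetime(2015, 7, 1, tzinfo=timezone.utc), 86400)
print(start.isoformat(), "->", end.isoformat())
```

With a 24-hour interval the smear for that leap second would begin at
00:00:00 UTC on 30 June 2015.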

The maximum error doesn't exceed what you'd get with the old smearing caused
by -x in ntpd 4.2.4, so users who could accept the old behavior should
also be able to accept the smearing at the server side.

In order to affect the local timekeeping as little as possible, the leap
smear support currently implemented in ntpd does not affect the internal
system time at all.  Only the timestamps and refid in outgoing reply packets
*to clients* are modified by the smear offset, so this makes sure the basic
functionality of ntpd is not accidentally broken.  Also, peer packets
exchanged with other NTP servers are based on the real UTC system time and
the normal refid, as usual.

The leap smear implementation is optionally available in ntp-4.2.8p3 and
later, and the changes can be tracked via https://bugs.ntp.org/2855.

Please note that the above is historical; NTPsec forked from Classic
after this point.

== Using NTP's Leap Second Smearing

* Leap Second Smearing MUST NOT be used for public servers, e.g. servers
provided by metrology institutes, or servers participating in the NTP pool
project.  There would be a high risk that NTP clients get the time from a
mixture of smearing and non-smearing NTP servers which could result in
undefined client behavior.  Instead, leap second smearing should only be
configured on time servers providing dedicated clients with time, if all
those clients can accept smeared time.

* Leap Second Smearing is NOT configured by default.  The only way to get
this behavior is to invoke the +./waf configure+ script from the NTP source code
package with the +--enable-leap-smear+ parameter before the executables are
built.

* Even if ntpd has been compiled to enable leap smearing support, leap
smearing is only done if explicitly configured.

* The leap smear interval should be at least several hours long, and up to
1 day (86400 s).  If the interval is too short then the smear offset
is applied too quickly for clients to follow.  86400 s (1 day) is a good
choice.

* If several NTP servers are set up for leap smearing then the *same* smear
interval should be configured on each server.

* Smearing NTP servers DO NOT send a leap second warning flag to client time
requests.  Since the leap second is applied gradually the clients don't even
notice that there's a leap second being inserted, and thus there will be no log
messages or similar related to the leap second visible on the clients.

* Since clients don't (and must not) become aware of the leap second at all,
clients getting the time from a smearing NTP server MUST NOT be configured
to use a leap second file.  If they have a leap second file they will apply
the leap second twice: the smeared one from the server, plus another one
inserted by themselves due to the leap second file.  As a result, the
additional correction would soon be detected and corrected/adjusted.

* Clients MUST NOT be configured to poll both smearing and non-smearing NTP
servers at the same time.  During the smear interval they would get
different times from different servers and wouldn't know which server(s) to
accept.
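
The leap second file rule above lends itself to an automated check.  The
sketch below scans the text of a client's ntp.conf for +leapfile+ directives,
assuming the usual one-directive-per-line layout with +#+ comments.  The
function name +find_leapfile_directives+, the server name and the file paths
are hypothetical.

```python
def find_leapfile_directives(conf_text: str) -> list[str]:
    """Return the active `leapfile` lines found in an ntp.conf text."""
    hits = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if line.split()[:1] == ["leapfile"]:
            hits.append(line)
    return hits


sample = """\
server smear1.example.net iburst
# leapfile /etc/ntp/leap-seconds.list
leapfile /var/lib/ntp/leap-seconds.list
"""
for directive in find_leapfile_directives(sample):
    print("remove on clients of a smearing server:", directive)
```

A check like this cannot detect a mixture of smearing and non-smearing
servers, because the configuration file does not say which servers smear.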

== Setting Up A Smearing NTP Server

If an NTP server should perform leap smearing then the leap smear interval
(in seconds) needs to be specified in the NTP configuration file ntp.conf,
e.g.:

--------------------------------
leapsmearinterval 86400
--------------------------------

Please keep in mind the leap smear interval should be between several hours
and 24 hours long.  With shorter values clients may not be able to follow the
drift caused by the smeared time, and with longer values the discrepancy
between system time and UTC will cause more problems when reconciling
timestamp differences.
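
To get a feel for the drift a client has to follow, the sketch below
computes the average frequency offset that a 1-second smear implies for a
given interval.  It assumes the offset is spread evenly over the interval;
a non-linear smear would have a higher peak rate.  The function name
+mean_smear_rate_ppm+ is hypothetical.

```python
def mean_smear_rate_ppm(interval_s: float, leap_s: float = 1.0) -> float:
    """Average rate, in parts per million, of smearing leap_s over interval_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return leap_s / interval_s * 1_000_000


for interval in (2000, 3600, 86400):
    print(f"{interval:>6} s -> {mean_smear_rate_ppm(interval):.2f} ppm")
```

A 24-hour interval works out to roughly 11.6 ppm, while the 33 minute 20
second slew described earlier corresponds to 500 ppm.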

A value of 86400 is what is used in the smears described by
https://developers.google.com/time/smear["Leap Smear"] and
https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/["Look Before You Leap"].


When ntpd starts and a smear interval has been specified, a log message
is generated, e.g.:

----------------------------------------------------------------
ntpd[31120]: config: leap smear interval 86400 s
----------------------------------------------------------------

While ntpd is running with a leap smear interval specified, the command:

--------------------------------
ntpq -c rv
--------------------------------

reports the smear status, e.g.:

--------------------------------
# ntpq -c rv
associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087
--------------------------------

In the example above, 'leapsmearinterval' reports the configured leap smear
interval all the time, while the 'leapsmearoffset' value is 0 outside the
interval and moves from 0 to -1000 ms over the interval.  So this can be
used to monitor if and how the time sent to clients is smeared.  With a
leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
packets would be 254.196.88.176.
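
The relation between the offset and the refid in this example can be
reproduced with the sketch below.  The bit layout it uses (the offset in
seconds as a signed fixed-point number with 2 integer bits and 22 fraction
bits, rounded to nearest) is an assumption chosen because it matches the
worked example; it should be checked against the ntpd source before being
relied on.  Both function names are hypothetical.

```python
import math


def smear_refid(offset_s: float) -> str:
    """Encode a smear offset (seconds) as a 254.x.y.z refid (assumed layout)."""
    # Scale to 22 fraction bits, round to nearest, keep the low 24 bits
    # (two's complement, so negative offsets wrap into the upper range).
    raw = math.floor(offset_s * (1 << 22) + 0.5) & 0xFFFFFF
    return "254.{}.{}.{}".format(raw >> 16, (raw >> 8) & 0xFF, raw & 0xFF)


def smear_offset_from_refid(refid: str) -> float:
    """Decode a 254.x.y.z refid back into a smear offset in seconds."""
    first, x, y, z = (int(part) for part in refid.split("."))
    if first != 254:
        raise ValueError("not a smear refid")
    raw = (x << 16) | (y << 8) | z
    if raw >= 1 << 23:  # sign bit set: undo the two's complement wrap
        raw -= 1 << 24
    return raw / (1 << 22)


print(smear_refid(-0.932087))
print(round(smear_offset_from_refid("254.196.88.176"), 6))
```

Under this assumption the 24 bits resolve the offset to about a quarter of a
microsecond.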

'''''

include::includes/footer.adoc[]