vcl/README.scheduler

   1 = Introduction =
   2
   3 The VCL scheduler handles LOs primary event queue. It is simple by design,
   4 currently just a single-linked list, processed in list-order by priority
   5 using round-robin for reoccurring tasks.
   6
   7 The scheduler has the following behaviour:
   8
   9 B.1. Tasks are scheduled just priority based
  10 B.2. Implicitly cooperative AKA non-preemptive
  11 B.3. It's not "fair" in any way (a consequence of B.2)
  12 B.4. Tasks are handled round-robin (per priority)
  13 B.5. Higher priorities have lower values
  14 B.6. A small set of priorities instead of an flexible value AKA int
  15
  16 There are some consequences due to this design.
  17
  18 C.1. Higher priority tasks starve lower priority tasks
  19      As long as a higher task is available, lower tasks are never run!
  20      See Anti-pattern.
  21
  22 C.2. Tasks should be split into sensible blocks
  23      If this can't really be done, process pending tasks by calling
  24      Application::Reschedule(). Or use a thread.
  25
  26 C.3. This is not an OS scheduler
  27      There is no real way to "fix" B.2. and B.3.
  28      If you need to do a preemptive task, use a thread!
  29      Otherwise make your task suspendable.
  30
  31
  32 = Driving the scheduler AKA the system timer =
  33
  34   1. There is just one system timer, which drives LO event loop
  35   2. The timer has to run in the main window thread
  36   3. The scheduler is run with the Solar mutex acquired
  37   4. The system timer is a single-shot timer
  38   5. The scheduler system event / message has a low system priority.
  39      All system events should have a higher priority.
  40
  41 Every time a task is started, the scheduler timer is adjusted. When the timer
  42 fires, it posts an event to the system message queue. If the next most
  43 important task is an Idle (AKA instant, 0ms timeout), the event is pushed to
  44 the back of the queue, so we don't starve system messages, otherwise to the
  45 front.
  46
  47 Every time the scheduler is invoked it searches for the next task to process,
  48 restarts the timer with the timeout for the next event and then invokes the
  49 task. After invoking the task and if the task is still active, it is pushed
  50 to the end of the queue and the timeout is eventually adjusted.
  51
  52
  53 = Locking =
  54
  55 The locking is quite primitive: all interaction with internal Scheduler
  56 structures are locked. This includes the ImplSchedulerContext and the
  57 Task::mpSchedulerData, which is actually a part of the scheduler.
  58 Before invoking the task, we have to release the lock, so others can
  59 Start new Tasks.
  60
  61 The Scheduler just processes its own Tasks in the main thread and needs
  62 the SolarMutex for it and for DeInit (tested by DBG_TESTSOLARMUTEX). All
  63 the other interaction just take the scheduler mutex or don't need locking
  64 at all.
  65
  66 There is a "workaround" for static Task objects, which would crash LO on
  67 destruction, because Task::~Task would try to de-register itself in the
  68 Scheduler, while the SchedulerLock would be long gone. OTOH this makes
  69 Task handling less error-prone, than doing "special" manual cleanup.
  70 There is also a "= TODOs and ideas =" to get rid if static Tasks.
  71
  72 Actually the scheduler mutex should never be locked when calling into
  73 non-scheduler code, so it was converted to a non-recursive
  74 std::mutex.
  75
  76
  77 = Idle processing =
  78
  79 Confusingly, there are 2 concepts that are called 'idle':
  80
  81 * Instant (zero timeout) tasks, represented e.g. by the Idle class. This is
  82 a misnomer, as these tasks are processed after returning to the main loop.
  83 This is not necessarily when LO is idle, in fact such tasks may be invoked
  84 while there is input in the OS event queue pending.
  85 (TODO: This case should be fixed by renaming.)
  86
  87 * Low priority tasks, represented by priorities TaskPriority::HIGH_IDLE and lower.
  88 In addition to being invoked only when there is no task with a higher priority,
  89 pending input in the OS event queue also takes precedence.
  90
  91
  92 = Lifecycle / thread-safety of Scheduler-based objects =
  93
  94 A scheduler object it thread-safe in the way, that it can be associated to
  95 any thread and any thread is free to call any functions on it. The owner must
  96 guarantee that the Invoke() function can be called, while the Scheduler object
  97 exists / is not disposed.
  98
  99
 100 = Anti-pattern: Dependencies via (fine grained) priorities =
 101
 102 "Idle 1" should run before "Idle 2", therefore give "Idle 1" a higher priority
 103 then "Idle 2". This just works correct for low frequency idles, but otherwise
 104 always breaks!
 105
 106 If you have some longer work - even if it can be split by into schedulable,
 107 smaller blocks - you normally don't want to schedule it with a non-default
 108 priority, as it starves all lower priority tasks. Even if a block was processed
 109 in "Idle 1", it is scheduled with the same (higher) priority again. Changing
 110 the "Idle" to a "Timer" also won't work, as this breaks the dependency.
 111
 112 What is needed is task based dependency handling, so if "Task 1" is done, it
 113 has to start "Task 2" and if "Task 1" is started again, it has to stop
 114 "Task 2". This currently has to be done by the implementor, but this feature
 115 can be added to the scheduler reasonably.
 116
 117
 118 = Implementation details =
 119
 120 == General: event priority for DoYield ==
 121
 122 There are three types of events, with different priority:
 123
 124 1. LO user events
 125 2. System events
 126 3. LO Scheduler event
 127
 128 They should be processed according to the following code:
 129
 130 bool DoYield( bool bWait, bool bAllCurrent )
 131 {
 132     bool bWasEvent = ProcessUserEvents( bAllCurrent );
 133     if ( !bAllCurrent && bWasEvent )
 134         return true;
 135     bWasEvent = ProcessSystemEvents( bAllCurrent, &bWasSchedulerEvent ) || bWasEvent;
 136     if ( !bWasSchedulerEvent && IsSchedulerEvent() )
 137     {
 138         ProcessSchedulerEvent()
 139         bWasEvent = true;
 140     }
 141     if ( !bWasEvent && bWait )
 142     {
 143         WaitForSystemEvents();
 144         bWasEvent = true;
 145     }
 146     return bWasEvent;
 147 }
 148
 149 == General: main thread deferral ==
 150
 151 In almost all VCL backends, we run main thread deferrals by disabling the
 152 SolarMutex using a boolean. In the case of the redirect, this makes
 153 tryToAcquire and doAcquire return true or 1, while a release is ignored.
 154 Also the IsCurrentThread() mutex check function will act accordingly, so all
 155 the DBG_TESTSOLARMUTEX won't fail.
 156
 157 Since we just disable the locks when we start running the deferred code in the
 158 main thread, we won't let the main thread run into stuff, where it would
 159 normally wait for the SolarMutex.
 160
 161 Eventually this will move into the SolarMutex. KDE / Qt also does main
 162 thread redirects using Qt::BlockingQueuedConnection.
 163
 164 == General: processing all current events for DoYield ==
 165
 166 This is easily implemented for all non-priority queue based implementations.
 167 Windows and macOS both have a timestamp attached to their events / messages,
 168 so simply get the current time and just process anything < timestamp.
 169 For the KDE backend this is already the default behaviour - single event
 170 processing isn't even supported. The headless backend accomplishes this by
 171 just processing a copy of the list of current events.
 172
 173 Problematic in this regard is the Gtk+ backend. g_main_context_iteration
 174 dispatches "only those highest priority event sources". There is no real way
 175 to tell, when these became ready. I've added a workaround idea to the TODO
 176 list. FWIW: Qt runs just a single timer source in the glib main context,
 177 basically the same we're doing with the LO scheduler as a system event.
 178
 179 The gen X11 backend has some levels of redirection, but needs quite some work
 180 to get this fixed.
 181
 182 == General: non-main thread yield ==
 183
 184 Yielding from a non-main thread must not wait in the main thread, as this
 185 may block the main thread until some events happen.
 186
 187 Currently we wait on an extra conditional, which is cleared by the main event
 188 loop.
 189
 190 == General: invalidation of elapsed timer event messages ==
 191
 192 Since the system timer to run the scheduler is single-shot, there should never
 193 be more than one elapsed timer event in system event queue. When stopping or
 194 restarting the timer, we eventually have to remove the now invalid event from
 195 the queue.
 196
 197 But for the Windows and macOS backends this may fail as they have delayed
 198 posting of events, so a consecutive remove after a post will actually yield no
 199 remove. On Windows we even get unwanted processing of events outside of the
 200 main event loop, which may call the Scheduler, as timer management is handled
 201 in critical scheduler code.
 202
 203 To prevent these problems, we don't even try to remove these events, but
 204 invalidate them by versioning the timer events. Timer events with invalid
 205 versions are processed but simply don't run the scheduler.
 206
 207 == General: track time of long running tasks ==
 208
 209 There is TaskStopwatch class. It'll track the time and report a timeout either
 210 when the tasks time slice is finished or some system event did occur.
 211
 212 Eventually it will be merged into the main scheduler, so each invoked task can
 213 easily track it's runtime and eventually this can be used to "blame" / find
 214 other long running tasks, so interactivity can be improved.
 215
 216 There were some questions coming up when implementing it:
 217
 218 === Why does the scheduler not detect that we only have idle tasks pending,
 219 and skip the instant timeout? ===
 220
 221 You never know how long a task will run. Currently the scheduler simply asks
 222 each task when it'll be ready to run, until two runnable tasks are found.
 223 Normally this is very quick, as LO has a lot of one-shot instant tasks / Idles
 224 and just a very few long term pending Timers.
 225
 226 Especially UNO calls add a lot of Idles to the task list, which just need to
 227 be processed in order.
 228
 229 === Why not use things like Linux timer wheels? ===
 230
 231 LO has relatively few timers and a lot one-shot Idles. 99% of time the search
 232 for the next task is quick, because there are just ~5 long term timers per
 233 document (cache invalidation, cursor blinking etc.).
 234
 235 This might become a problem, if you have a lot of open documents, so the long
 236 term timer list increases AKA for highly loaded LOOL instances.
 237
 238 But the Linux timer wheel mainly relies on the facts that the OS timers are
 239 expected to not expire, as they are use to catch "error" timeouts, which rarely
 240 happen, so this definitely not matches LO's usage.
 241
 242 === Not really usable to find misbehaving tasks ===
 243
 244 The TaskStopwatch class is just a little time keeper + detecting of input
 245 events. This is not about misbehaving Tasks, but long running tasks, which
 246 have to yield to the Scheduler, so other Tasks and System events can be
 247 processed.
 248
 249 There is the TODO to merge the functionality into the Scheduler itself, at
 250 which point we can think about profiling individual Tasks to improve
 251 interactivity.
 252
 253 == macOS implementation details ==
 254
 255 Generally the Scheduler is handled as expected, except on resize, which is
 256 handled with different runloop-modes in macOS. In case of a resize, the normal
 257 runloop is suspended in sendEvent, so we can't call the scheduler via posted
 258 main loop-events. Instead the scheduler uses the timer again.
 259
 260 Like the Windows backend, all Cocoa / GUI handling also has to be run in
 261 the main thread. We're emulating Windows out-of-order PeekMessage processing,
 262 via a YieldWakeupEvent and two conditionals. When in a RUNINMAIN call, all
 263 the DBG_TESTSOLARMUTEX calls are disabled, as we can't release the SolarMutex,
 264 but we can prevent running any other SolarMutex based code. Those wakeup
 265 events must be ignored to prevent busy-locks. For more info read the "General:
 266 main thread deferral" section.
 267
 268 We can neither rely on macOS dispatch_sync code block execution nor the
 269 message handling, as both can't be prioritized or filtered and the first
 270 does also not allow nested execution and is just processed in sequence.
 271
 272 There is also a workaround for a problem for pushing tasks to an empty queue,
 273 as [NSApp postEvent: ... atStart: NO] doesn't append the event, if the
 274 message queue is empty.
 275
 276 An additional problem is the filtering of events on Window close. This drops
 277 posted timer events, when a Window is closed resulting in a busy DoYield loop,
 278 so we have to re-post the event, after closing a window.
 279
 280 == Windows implementation details ==
 281
 282 Posted or sent event messages often trigger processing of WndProc in
 283 PeekMessage, GetMessage or DispatchMessage, independently from the message to
 284 fetch, remove or dispatch ("During this call, the system delivers pending,
 285 nonqueued messages..."). Additionally messages have an inherited priority
 286 based on the function used to generate them. Even if WM_TIMER messages should
 287 have the lowest priority, a manually posted WM_TIMER is processed with the
 288 priority of a PostMessage message.
 289
 290 So we're giving up on processing all our Scheduler events as a message in the
 291 system message loop. Instead we just indicate a 0ms timer message by setting
 292 the m_bDirectTimeout in the timer object. This timer is always processed, if
 293 the system message wasn't already our timer. As a result we can also skip the
 294 polling. All this is one more reason to drop the single message processing
 295 in favour of always processing all pending (system) events.
 296
 297 There is another special case, we have to handle: window updates during move
 298 and resize of windows. These system actions run in their own nested message
 299 loop. So we have to completely switch to timers, even for 0ms. But these
 300 posted events prevent any event processing, while we're busy. The only viable
 301 solution seems to be to switch to WM_TIMER based timers, as these generate
 302 messages with the lowest system priority (but they don't allow 0ms timeouts).
 303 So processing slows down during resize and move, but we gain working painting,
 304 even when busy.
 305
 306 An additional workaround is implemented for the delayed queuing of posted
 307 messages, where PeekMessage in WinSalTimer::Stop() won't be able remove the
 308 just posted timer callback message. See "General: invalidation of elapsed
 309 timer event messages" for the details.
 310
 311 To run the required GUI code in the main thread without unlocking the
 312 SolarMutex, we "disable" it. For more infos read the "General: main thread
 313 deferral" section.
 314
 315 == KDE implementation details ==
 316
 317 This implementation also works as intended. But there is a different Yield
 318 handling, because Qts QAbstractEventDispatcher::processEvents will always
 319 process all pending events.
 320
 321
 322 = TODOs and ideas =
 323
 324 == Task dependencies AKA children ==
 325
 326 Every task can have a list of children / a child.
 327
 328  * When a task is stopped, the children are started.
 329  * When a task is started, the children are stopped.
 330
 331 This should be easy to implement.
 332
 333 == Per priority time-sorted queues ==
 334
 335 This would result in O(1) scheduler. It was used in the Linux kernel for some
 336 time (search Ingo Molnar's O(1) scheduler). This can be a scheduling
 337 optimization, which would prevent walking longer event list. But probably the
 338 management overhead would be too large, as we have many one-shot events.
 339
 340 To find the next task the scheduler just walks the (constant) list of priority
 341 queues and schedules the first ready event of any queue.
 342
 343 The downside of this approach: Insert / Start / Reschedule(for "auto" tasks)
 344 now need O(log(n)) to find the position in the queue of the priority.
 345
 346 == Always process all (higher priority) pending events ==
 347
 348 Currently Application::Reschedule() processes a single event or "all" events,
 349 with "all" defined as "100 events" in most backends. This already is ignored
 350 by the KDE backend, as Qt defines its QAbstractEventDispatcher::processEvents
 351 processing all pending events (there are ways to skip event classes, but no
 352 easy way to process just a single event).
 353
 354 Since the Scheduler is always handled by the system message queue, there is
 355 really no more reasoning to stop after 100 events to prevent LO Scheduler
 356 starvation.
 357
 358 == Drop static inherited or composed Task objects ==
 359
 360 The sequence of destruction of static objects is not defined. So a static Task
 361 can not be guaranteed to happen before the Scheduler. When dynamic unloading
 362 is involved, this becomes an even worse problem. This way we could drop the
 363 mbStatic workaround from the Task class.
 364
 365 == Run the LO application in its own thread ==
 366
 367 This would probably get rid of most of the macOS and Windows implementation
 368 details / workarounds, but is quite probably a large amount of work.
 369
 370 Instead of LO running in the main process / thread, we run it in a 2nd thread
 371 and defer al GUI calls to the main thread. This way it'll hopefully not block
 372 and can process system events.
 373
 374 That's just a theory - it definitely needs more analysis before even attending
 375 an implementation.
 376
 377 == Re-evaluate the macOS ImplNSAppPostEvent ==
 378
 379 Probably a solution comparable to the Windows backends delayed PostMessage
 380 workaround using a validation timestamp is better then the current peek,
 381 remove, re-postEvent, which has to run in the main thread.
 382
 383 Originally I didn't evaluate, if the event is actually lost or just delayed.
 384
 385 == Drop nMaxEvents from Gtk+ based backends ==
 386
 387 gint last_priority = G_MAXINT;
 388 bool bWasEvent = false;
 389 do {
 390     gint max_priority;
 391     g_main_context_acquire( NULL );
 392     bool bHasPending = g_main_context_prepare( NULL, &max_priority );
 393     g_main_context_release( NULL );
 394     if ( bHasPending )
 395     {
 396         if ( last_priority > max_priority )
 397         {
 398             bHasPending = g_main_context_iteration( NULL, bWait );
 399             bWasEvent = bWasEvent || bHasPending;
 400         }
 401         else
 402             bHasPending = false;
 403     }
 404 }
 405 while ( bHasPending )
 406
 407 The idea is to use g_main_context_prepare and keep the max_priority as an
 408 indicator. We cannot prevent running newer lower events, but we can prevent
 409 running new higher events, which should be sufficient for most stuff.
 410
 411 This also touches user event processing, which currently runs as a high
 412 priority idle in the event loop.
 413
 414 == Drop nMaxEvents from gen (X11) backend ==
 415
 416 A few layers of indirection make this code hard to follow. The SalXLib::Yield
 417 and SalX11Display::Yield architecture makes it impossible to process just the
 418 current events. This really needs a refactoring and rearchitecture step, which
 419 will also affect the Gtk+ and KDE backend for the user event handling.
 420
 421 == Merge TaskStopwatch functionality into the Scheduler ==
 422
 423 This way it can be easier used to profile Tasks, eventually to improve LO's
 424 interactivity.