# JobQueue Architecture {#jobqueuearch}

Notes on the Job queuing system architecture.
## Data model

The data model consists of the following main components:
* The JobSpecification class represents a job type and associated parameters
that can be enqueued via a JobQueue or JobQueueGroup without needing to construct
the full PHP class associated with the given job type.
* The Job class represents a particular deferred task that happens in the
background. All jobs subclass the Job class and put the main logic in the
run() method.
* The JobQueue class represents a particular queue of jobs of a certain type.
For example, there may be a queue for email jobs and a queue for CDN purge jobs.
* The JobQueueGroup class represents all job queues for a given wiki
and provides helper methods to enqueue or dequeue jobs on that wiki
without the caller needing to be aware of type-specific queue configuration.
The `JobQueueGroup` service offers a convenience JobQueueGroup instance
for the common case of dealing with jobs in the context of the local wiki.
* The JobQueueGroupFactory class manages per-wiki JobQueueGroup objects.
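The data model above can be illustrated with a minimal Job subclass sketch. The job type name `exampleHello` and its `name` parameter are assumptions for illustration; only the subclass-and-run() pattern is taken from the description above.

```php
// A minimal sketch of a Job subclass. The job type 'exampleHello' and the
// 'name' parameter are hypothetical.
class ExampleHelloJob extends Job {
	public function __construct( array $params ) {
		// Associate this job instance with its job type.
		parent::__construct( 'exampleHello', $params );
	}

	/** The main logic of the deferred task goes in run(). */
	public function run() {
		wfDebugLog( 'ExampleHello', 'Hello, ' . $this->params['name'] );
		return true; // returning false would mark the job as failed
	}
}
```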
## Storage

Each job type has its own queue and is associated with a storage medium. One
queue might save its jobs in Redis, while another would use a database.
Each storage medium is implemented by a JobQueue subclass. Before a job type can be
used, $wgJobTypeConf must define a mapping from that job type to the given JobQueue subclass.
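As a hedged configuration sketch, the mapping might look as follows in `LocalSettings.php`. The `webVideoTranscode` job type, the Redis endpoint, and the option values are illustrative assumptions, not recommended settings.

```php
// Store jobs in the database by default ('default' applies to any job type
// without its own entry), but route one assumed job type to Redis.
$wgJobTypeConf['default'] = [
	'class' => JobQueueDB::class,
	'order' => 'random',
	'claimTTL' => 3600, // seconds before an unacknowledged job may be retried
];
$wgJobTypeConf['webVideoTranscode'] = [ // hypothetical job type
	'class' => JobQueueRedis::class,
	'redisServer' => 'localhost:6379', // assumed Redis endpoint
	'redisConfig' => [],
	'claimTTL' => 3600,
];
```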
The following core queue classes are available:
* JobQueueDB (stores jobs in the `job` table in a database)
* JobQueueRedis (stores jobs in a Redis server)
All queue classes support some basic operations (though some may be no-ops):
* enqueueing a batch of jobs
* dequeueing a single job
* acknowledging that a job is completed
* checking if the queue is empty
All queue implementations must offer at-least-once execution guarantees for enqueued jobs.
The execution order of enqueued jobs may however vary depending on the implementation,
so callers should not assume any particular execution order.
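The basic operations listed above can be sketched against a single queue. This is a hedged example, not a recommended pattern (jobs are normally run by the job runner, not popped by hand); it assumes the built-in `null` job type and current service wiring.

```php
use MediaWiki\MediaWikiServices;

// Obtain the JobQueue instance for the 'null' job type on the local wiki.
$queue = MediaWikiServices::getInstance()
	->getJobQueueGroup()
	->get( 'null' );

// Enqueue a batch of jobs.
$queue->push( [
	new JobSpecification( 'null', [ 'lives' => 1 ] ),
] );

// Dequeue a single job, run it, and acknowledge completion.
$job = $queue->pop();
if ( $job ) {
	$job->run();
	$queue->ack( $job );
}

// Check whether the queue is empty.
$isEmpty = $queue->isEmpty();
```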
## Job queue aggregator
Since each job type has its own queue, and wiki farms may have many wikis,
there might be a large number of queues to keep track of. To avoid wasting
large amounts of time polling empty queues, aggregators exist to keep track
of which queues are ready.
The following queue aggregator classes are available:
* JobQueueAggregatorRedis (uses a Redis server to track ready queues)
Some aggregators cache data for a few minutes, while others may be always up to date.
This can be an important factor for jobs that need a low pickup time (or latency).
## Job execution

The high-level job execution flow for a queue consists of the following steps:
1. Dequeue a job specification (type and corresponding parameters) from the corresponding storage medium.
2. Deduplicate the job according to deduplication rules (optional).
3. Marshal the job into the corresponding Job subclass and run it via Job::execute().
4. Run Job::tearDown().
5. If the job failed (as described below), attempt to retry it up to the configured retry limit.
An exception thrown by Job::run(), or Job::run() returning `false`, will cause
the job runner to retry the job up to the configured retry limit, unless Job::allowRetries() returns `false`.
As of MediaWiki 1.43, no job runner implementation distinguishes between transient errors
(which are retry-safe) and non-transient errors (which are not).
A Job implementation that is expected to have both transient and non-transient error states
should therefore catch and handle non-transient errors internally and return `true`
from Job::run() in such cases, to reduce the incidence of unwanted retries for such errors
while still benefiting from the automated retry logic for transient errors.
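The error-handling pattern described above might be sketched as follows. The job type, the `sendNotification()` helper, and the `UserDeletedException` class are all hypothetical; only the Job::run() return-value contract comes from the text above.

```php
// A hedged sketch of handling non-transient errors inside Job::run().
class ExampleNotifyJob extends Job {
	public function __construct( array $params ) {
		parent::__construct( 'exampleNotify', $params ); // assumed job type
	}

	public function run() {
		try {
			$this->sendNotification( $this->params['userId'] ); // hypothetical helper
		} catch ( Exception $e ) {
			$this->setLastError( $e->getMessage() );
			// Hypothetical classification: a deleted recipient is non-transient.
			if ( $e instanceof UserDeletedException ) {
				// Retrying cannot succeed; return true to suppress the
				// automated retry logic.
				return true;
			}
			// Transient (e.g. a timeout): return false so the runner retries.
			return false;
		}
		return true;
	}
}
```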
Note that in a distributed job runner implementation, the above steps
may be split between different infrastructure components, as is the case with
the changeprop-based system used by the Wikimedia Foundation. This may require
additional configuration beyond overriding Job::allowRetries() to ensure that
other job runner components do not attempt to retry a job that is not retry-safe (T358939).
Since job runner implementations may vary in reliability, job classes should be
idempotent, to maintain correctness even if the job happens to run more than once.
## Deduplication

A Job subclass may override Job::getDeduplicationInfo() and Job::ignoreDuplicates() to allow
jobs to be deduplicated, if the job runner in use supports deduplication.

If Job::ignoreDuplicates() returns `true`, the deduplication logic must consider a job to be
a duplicate if a job of the same type with identical deduplication info has been executed
later than the enqueue timestamp of that job.
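A job might opt into deduplication along these lines. The job type and the `requestId` parameter are assumptions for illustration; the override points are the two methods named above.

```php
// A hedged sketch of a deduplicatable job.
class ExamplePurgeJob extends Job {
	public function __construct( array $params ) {
		parent::__construct( 'examplePurge', $params ); // assumed job type
	}

	public function ignoreDuplicates() {
		return true; // allow duplicate jobs to be dropped
	}

	public function getDeduplicationInfo() {
		$info = parent::getDeduplicationInfo();
		// Ignore a (hypothetical) per-request ID so that two otherwise
		// identical jobs compare as duplicates.
		unset( $info['params']['requestId'] );
		return $info;
	}

	public function run() {
		// ... perform the purge ...
		return true;
	}
}
```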
Jobs that spawn many smaller jobs (so-called "root" and "leaf" jobs) may enable additional deduplication logic,
to make in-flight leaf jobs no-ops when a newer root job with identical parameters gets enqueued.
This is done by passing two special parameters, `rootJobTimestamp` and `rootJobSignature`,
which hold the MediaWiki timestamp at which the root job was enqueued and an SHA-1 checksum uniquely identifying the root job, respectively.
The Job::newRootJobParams() convenience method facilitates adding these parameters to a preexisting parameter set.
When deduplicating leaf jobs, the job runner must consider a leaf job to be a duplicate
if a root job with an identical signature has been executed by the runner later than the
`rootJobTimestamp` of the leaf job.
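The leaf-job rule above can be expressed as a small self-contained predicate. This is a sketch of the rule as stated, not MediaWiki's actual implementation; the function name and the shape of the executed-root-jobs map are assumptions.

```php
/**
 * A leaf job is a duplicate if a root job with the same signature has been
 * executed at or after the leaf job's rootJobTimestamp.
 *
 * @param array $leafParams Job parameters, possibly including
 *   'rootJobSignature' and 'rootJobTimestamp'.
 * @param array $executedRootJobs Map of root job signature =>
 *   MediaWiki timestamp of the last execution (hypothetical bookkeeping).
 */
function isDuplicateLeafJob( array $leafParams, array $executedRootJobs ): bool {
	$sig = $leafParams['rootJobSignature'] ?? null;
	$ts = $leafParams['rootJobTimestamp'] ?? null;
	if ( $sig === null || $ts === null ) {
		return false; // not part of a root/leaf job family
	}
	$lastRun = $executedRootJobs[$sig] ?? null;
	// MediaWiki timestamps (yyyymmddhhmmss) compare correctly as strings.
	return $lastRun !== null && $lastRun >= $ts;
}
```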
## Enqueueing jobs

For enqueueing jobs, JobQueue and JobQueueGroup offer the push() and
lazyPush() methods. The former synchronously enqueues the job and propagates
a JobQueueError exception to the caller in case of failure, while the latter, when running
in a web request context, defers enqueueing the job until after the response has been flushed to the client.
Callers should prefer `lazyPush` unless it is necessary to surface enqueue failures.
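A brief sketch of both enqueue styles via the local JobQueueGroup service follows; the `null` job type and its parameters are illustrative.

```php
use MediaWiki\MediaWikiServices;

$jobQueueGroup = MediaWikiServices::getInstance()->getJobQueueGroup();

// Preferred: defer the actual enqueue until after the response is sent.
$jobQueueGroup->lazyPush(
	new JobSpecification( 'null', [ 'lives' => 1 ] )
);

// Synchronous alternative, for when enqueue failures must be surfaced:
try {
	$jobQueueGroup->push( new JobSpecification( 'null', [ 'lives' => 1 ] ) );
} catch ( JobQueueError $e ) {
	// handle or log the enqueue failure
}
```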