1 # Copyright (C) 2001-2010, Parrot Foundation.
4 =head1 PDD 25: Concurrency
8 This document defines the requirements and implementation strategy for
9 Parrot's concurrency models.
19 =item - Parrot supports multiple concurrency models, including POSIX
20 threads, event-based programming, and asynchronous I/O.
22 =item - A concurrency scheduler manages all concurrent tasks
24 =item - Each interpreter has its own concurrency scheduler
26 =item - Concurrency schedulers for different interpreters
27 communicate and can share tasks
29 =item - A concurrency scheduler may link to other schedulers as a
30 parent, a child, or an equal
32 =item - A task is a concurrent unit of work
34 =item - All tasks support a standard interface used by the concurrency
35 scheduler, but otherwise have a great deal of flexibility in their
38 =item - Tasks can share PMC variables
44 Concurrency is a parallel execution of units of code (on multiprocessor
45 machines), or a flexible ordering of serial units of code (on single processor
46 machines). For certain problem spaces, concurrency offers significant speed
47 gains by parceling out processor-intensive activity or by ensuring that a wait
48 for input or system resources doesn't hold up the entire application.
50 A task is a unit of code that can be executed concurrently.
54 Rather than defining a single canonical threading model, Parrot defines
55 an infrastructure that supports multiple concurrency models and provides
56 for interaction between the various models. Parrot already uses multiple
57 concurrency models for events, threads, async I/O, and exceptions, a
58 trend that will only continue as we support multiple HLLs and external
59 threading libraries like Intel's Threading Building Blocks. Designing
60 for multiple concurrency models also gives Parrot more room to grow as
61 future models are researched and developed.
63 To avoid conflicts between concurrency models, Parrot provides a single
64 central concurrency scheduler for each interpreter instance. Each
65 concurrency model defines a Task PMC that supports a standard minimal
66 interface. The scheduler can interact with tasks from different models
67 without direct access to the details of each model.
69 On multiprocessor systems, the scheduler is responsible for allocating
70 tasks to processors, or for delegating that allocation to the underlying
73 For the most part, when we talk about concurrency, we mean concurrency
74 across an interpreter pool. An interpreter pool is a set of interpreter
75 instances that share common resources: the memory pools, arenas, and
76 global namespace--pretty much everything except what's in the
77 interpreter structure itself. They're essentially threads in the OS
80 Another form of concurrency is between completely independent
81 interpreter instances, each with their own memory pools, arenas,
82 namespaces, etc. Independent interpreters may run as separate processes
83 on the same machine, or even as separate processes on different machines
84 (in a clustering environment, for example). The concerns of
85 shared-interpreter concurrency and independent-interpreter concurrency
86 are similar, and in Parrot both use the same central concurrency
87 scheduler. This PDD doesn't directly address independent-interpreter
88 concurrency, but does include occasional notes on how it integrates with
89 shared-interpreter concurrency.
91 =head3 Supported Concurrency Models
93 The following are a few of the concurrency models Parrot intends to
94 support. The biggest differences between them are in how they handle
95 variables shared across concurrent tasks. But the design is such that
96 each of the different models can run simultaneously, coordinated through the
97 central concurrency scheduler.
99 =head4 Mutex/Lock Concurrency
101 In this model, before performing an operation on a shared variable, you
102 must first acquire a lock on it. Once a lock is acquired, any other
103 concurrent tasks that attempt to acquire a lock on that variable will
104 block until the existing lock is released.
106 A mutex is a thing that can be locked. They are not directly exposed to
107 users. They're non-recursive, non-read/write, exclusive things. When a
108 concurrent task gets a mutex, any other attempt to get that mutex will
109 block until the owning task releases the mutex. Mutexes are implemented
110 using the platform-native lock construct.
112 The first thing that any vtable function of a shared PMC must do is to
113 acquire the mutex of the PMCs in its parameter list (in ascending
114 address order). In this model only PMCs can be shared.
116 =head4 STM Concurrency
118 Parrot's preferred model of concurrency is based on Software
119 Transactional Memory. In this model, rather than locking a shared
120 variable while performing a series of operations on it, the changes are
121 bundled into a transaction that acts as an atomic unit.
123 Within the transaction, STM creates a "hypothetical" copy of the
124 variable, logs the changes made to that copy, and at the end of the
125 transaction performs some validation steps to decide whether to save the
126 hypothetical value back to the real variable (a commit) or discard the
127 hypothetical value (a roll back). One common validation step is to check
128 whether the value of the real variable was changed during the execution
129 of the transaction (possibly by another concurrent task).
131 STM tasks can read/write shared variables from mutex/lock tasks, as they
132 appear to the mutex/lock task as a single atomic operation. Mutex/lock
133 tasks can read shared variables from STM tasks, but they cannot write
134 them, as the STM tasks will not respect the lock and may commit a new
135 value in the middle of a complex operation that requires the lock. As a
136 safety mode, STM tasks may be configured to fail validation on any
137 transaction attempting to commit to a variable locked by a mutex/lock
140 =head4 POSIX Concurrency
142 This is the POSIX "share-everything" style of threading, such as is used
143 in Perl 5's "pthread" model, as well as the thread models for Ruby and
144 Python. [Recommended reading: "Programming with POSIX Threads" by Dave
147 =head4 Process-type Concurrency
149 This is the Perl 5 "iThreads" threading model. In this model no data is
150 shared implicitly, and all sharing must be done on purpose and
151 explicitly. It resembles the Unix
152 fork-process-with-shared-memory-segment model, not a surprise as it was
153 originally developed with emulation of Unix's fork system in mind.
155 =head4 Independent Concurrency
157 Independent tasks have no contact with the internal data of any other
158 task in the current process. These are implemented as STM concurrency
159 but only use transactions for the shared interpreter globals.
161 Note that independent tasks may still communicate back and forth by
162 passing either atomic things (ints, floats, and pointers) or static
163 buffers that can become the property of the destination thread.
165 =head4 Intel Threading Building Blocks
167 Threading Building Blocks (TBB) is a library of tools for data-parallel
168 programming, dividing large data sets into small pieces so that
169 operations on those data-sets can be parallelized across multiple
172 Parrot will provide two levels of integration with TBB: an interface for
173 TBB's scheduling to interact with the central concurrency scheduler, and
174 an interface for developers to access the TBB routines from within
177 Like Parrot, TBB is task-based. Since TBB performs its own scheduling,
178 TBB tasks in Parrot will be given a lightweight scheduler that only has
179 the responsibility of passing messages, events, etc, back and forth
180 between the TBB task and the central scheduler. TBB tasks will not share
181 variables with any other types of concurrent tasks in Parrot.
183 Note that since TBB is a C++ library, it is only available when Parrot
184 is compiled with a C++ compiler.
186 =head3 Concurrency Scheduler API
188 The concurrency scheduler has two parts, a Scheduler PMC, which has an
189 instance stored in the interpreter struct, and a set of core routines in
192 An instance of the Scheduler PMC has 5 internal attributes, which are:
198 An unique ID for the scheduler
202 The current highest assigned task ID
210 The task priority index
218 The unique ID of the scheduler is used by other schedulers to pass messages.
219 With a small set of identifying information (including process ID, interpreter
220 ID, scheduler ID, and possibly a URL/hostname) a scheduler can address other
221 schedulers, both local to the current interpreter and remote.
223 The task list is a simple unordered integer indexed data structure, currently
224 implemented as a hash. Each task in the list has an integer ID assigned when
225 it is first inserted into the list. A task retains the same ID throughout its
226 lifetime, and the ID is not reused once a task is finalized. (The data
227 structure is currently implemented as a hash so that it only takes up the
228 memory required for currently living tasks. A task list for a particular
229 program may use thousands of task IDs, but only need memory allocated for a
230 handful of elements at any given moment.)
232 The task rank index is calculated based on the type, priority rating, age of
233 the tasks in the task list. The index is a simple array, and in general the
234 top (zeroth) element in the array is the next one to receive attention. The
235 index is recalculated regularly as new tasks are inserted into the task list,
236 existing tasks are modified or completed, and as time progresses so the age of
237 some tasks pushes them to a higher priority. Because of the regular
238 recalculation, the rank index may cache some frequently-accessed and rarely
239 changing data from the tasks (though it is not required to do so). (As a later
240 optimization, some data structure other than an array may be used to speed up
241 rank recalculation. For example, with a hash of hashes of arrays keyed on task
242 attributes, the process of inserting new tasks at a relative priority to other
243 existing tasks could be performed without shifting the rank of all lower
246 The list of handlers is a simple stack of handler PMCs currently waiting for
247 an appropriate task (event, exception). See PDD 24 on Events for more details
252 PMC flags 0-7 are reserved for private use by a PMC. The scheduler uses flag 0
253 to indicate whether the priority index is currently valid or needs to be
254 recalculated before the next use.
256 =head4 Vtable Functions
262 Add an entry to the task list.
266 Pull the next entry (the highest ranked entry in the task priority index) off
267 the task list. If there are no tasks remaining in the task list, return null.
279 $P1.'add_handler'($P2)
283 Add an event or exception handler to the scheduler's list of handlers.
289 $P1 = $P2.'find_handler'($P3)
293 Search for an event or exception handler $P1, in scheduler $P2, for the task
294 $P3. Returns a null PMC if an appropriate handler is not found.
300 The interface of the Task PMC is also the minimum required interface for all
301 subclasses, extensions, and alternate implementations of a task.
303 An instance of the Task PMC has 7 internal attributes, which are:
317 The subtype of the task
321 The priority of the task
325 The birthtime stamp of the task
329 The status of the task
333 The code block of the task (optional)
337 An interpreter structure for the task (optional)
341 Types of tasks include 'event', 'exception', 'io', and 'code'. The subtype of
342 a task is used by events and exceptions to identify appropriate handlers.
343 Possible status values for tasks include 'created', 'invoked', 'inprocess',
344 and 'completed'. The final state of a task is 'destroyed', but is never
345 marked (the task PMC is removed from the task list and at some later point
346 destroyed by GC). The priority of a task is an integer value between 0 and
347 100, with 0 as the lowest priority.
349 The birthtime stamp is the point at which the task was inserted into the task
350 list, and is used for calculating the age of tasks.
352 The code block is optional and only for tasks that are associated with a
353 simple code block. The interpreter structure is also optional and only used
354 for thread-like tasks that maintain their own interpreter state.
356 =head4 Vtable Functions
362 Retrieve an attribute of the task.
366 Set an attribute of the task.
382 Creates a new task. (The Scheduler PMC is never instantiated directly, it is
383 only used by Parrot internals.)
395 Register a task with the concurrency scheduler. Details about the task are
396 stored within the task PMC.
400 =begin PIR_FRAGMENT_INVALID
403 # ... schedule the task, etc.
406 =end PIR_FRAGMENT_INVALID
408 Wait for a particular task to complete.
412 =begin PIR_FRAGMENT_INVALID
415 # ... schedule the task, etc.
418 =end PIR_FRAGMENT_INVALID
420 Kill a task without waiting for it to complete.
427 Dec 2003 - (Dan ponders threads based on POSIX and Perl 5 experience)
428 L<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/e64b22ab7de0a7a6/889b5d8c4cd267b7?lnk=gst&q=threads&rnum=3#889b5d8c4cd267b7>
430 Dec. 2003 - "threads and shared interpreter data structures"
431 L<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/e64ea4ff287e04fd/b71333e282d3d187?lnk=gst&q=threads&rnum=9#b71333e282d3d187>
433 Jan. 2004 - "Threads Design. A Win32 perspective."
434 L<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/3209629b23306029/52ba9d37425ba015?lnk=gst&q=threads&rnum=8#52ba9d37425ba015>
436 Jan. 2004 - "Start of threads proposal"
437 L<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/4c7de440da84d5c6/04cfb70b0d81dfba?tvc=1&q=threads#04cfb70b0d81dfba>
439 Sept. 2005 - "consider using OS threads"
440 L<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/40b50e3aa9255f8e/036a87b5d2b5ed2c?lnk=gst&q=threads&rnum=2#036a87b5d2b5ed2c>
442 Aug. 2007 - "multi-threading a work in progress"
443 L<http://perlmonks.org/?node_id=636466>
445 Concurrency as Futures -
446 L<http://www.cincomsmalltalk.com/userblogs/mls/blogView?showComments=true&entry=3336838959>
448 Io language - L<http://www.iolanguage.com/about/>
450 Java memory and concurrency - L<http://www.cs.umd.edu/~pugh/java/memoryModel/>
458 vim: expandtab shiftwidth=4: