RELEASE NOTES Ragel 4.X

To-State and From-State Action Embedding Operators Added (4.2)
==============================================================

Added operators for embedding actions into all transitions into a state and all
transitions out of a state. These embeddings stay with the state and are
independent of its current transitions and of any future transitions that may
be added into or out of the state.

In the following example, act is executed on the transitions for 't' and 'y',
even though it is only embedded in the context of the first alternative. This
is because after matching 'hi ', the machine has not yet distinguished between
the two threads. The machine is simultaneously in the state expecting 'there'
and the state expecting 'you'.

The to-state action embedding operators embed into transitions that go into:
    <~      states that are not the start
    @~      states that are not final
    <@~     states that are not the start AND not final

The from-state action embedding operators embed into transitions that leave:
    <*      states that are not the start
    @*      states that are not final
    <@*     states that are not the start AND not final

Changed Operators for Embedding Context/Actions Into States (4.2)
=================================================================

The operators used to embed context and actions into states have been modified.
The purpose of the modification is to make it easier to distribute actions
among the states in a chain of concatenations such that each state has only a
single action embedded. An example follows below.

1. The use of >@ for selecting the states to modify (as in >@/ to embed eof
actions, etc.) has been removed. This prefix meant start state OR not start AND

2. The use of @% for selecting states to modify (as in @%/ to embed eof
actions, etc.) has been removed. This prefix previously meant not start AND not

1. The prefix < which means not start.
2. The prefix @ which means not final.
3. The prefix <@ which means not start & not final.

The new matrix of operators used to embed into states is:

    >:   $:   %:   <:   @:   <@:   - context
    >~   $~   %~   <~   @~   <@~   - to state action
    >*   $*   %*   <*   @*   <@*   - from state action
    >/   $/   %/   </   @/   <@/   - eof action
    >!   $!   %!   <!   @!   <@!   - error action
    >^   $^   %^   <^   @^   <@^   - local error action
    |    |    |    |    |    *- not start & not final

This example shows one way to use the new operators to cover all the states
with a single action. The embedding of eof2 covers all the states in m2. The
embeddings of eof1 and eof3 avoid the boundaries that m1 and m3 both share with
m2.

    main := m1 @/eof1 . m2 $/eof2 . m3 </eof3;

Verbose Action, Priority and Context Embedding Added (4.2)
==========================================================

As an alternative to the symbol-based action, priority and context embedding
operators, a more verbose form of embedding has been added. The general form of
the verbose embedding is:

    machine <- location [modifier] embedding_type value

For embeddings into transitions, the possible locations are:
    enter           -- entering transitions
    all             -- all transitions
    finish          -- transitions into a final state
    leave           -- pending transitions out of the final states

For embeddings into states, the possible locations are:
    start           -- the start state
    final           -- final states
    !start          -- all states except the start
    !final          -- states that are not final
    !start !final   -- states that are not the start and not final

The embedding types are:
    exec            -- an action into transitions
    pri             -- a priority into transitions
    ctx             -- a named context into a state
    into            -- an action into all transitions into a state
    from            -- an action into all transitions out of a state
    err             -- an error action into a state
    lerr            -- a local error action into a state

The possible modifiers:
    on name         -- specify a name for priority and local error embedding

Character-Level Negation '^' Added (4.1)
========================================

A character-level negation operator ^ was added. This operator has the same
precedence level as !. It is used to match single characters that are not
matched by the machine it operates on. The expression ^m is equivalent to
(any-(m)). This operator makes sense only when applied to machines that match
single characters. Since subtraction is essentially a set difference, any
strings matched by m that are not of length 1 will be ignored by the
subtraction and have no effect.
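
The set-difference reading of ^m can be sketched in C. This is only a model and
not how Ragel implements the operator: the charset type and the charset_empty,
charset_add, charset_negate and charset_has helpers are hypothetical names, and
a machine is reduced to the set of single characters it matches, so any longer
strings matched by m never enter the set in the first place.

```c
#include <stdbool.h>

/* Hypothetical model: a single-character machine as a membership table
 * indexed by character value. */
typedef struct {
    bool member[256];
} charset;

/* The machine that matches nothing. */
static charset charset_empty(void)
{
    charset s = { { false } };
    return s;
}

/* Make the machine match character c as well. */
static void charset_add(charset *s, unsigned char c)
{
    s->member[c] = true;
}

/* Model of ^m, i.e. (any-(m)): start from every character and remove
 * the ones that m matches. */
static charset charset_negate(const charset *m)
{
    charset out;
    for (int c = 0; c < 256; c++)
        out.member[c] = !m->member[c];
    return out;
}

/* Does the machine match the single character c? */
static bool charset_has(const charset *s, unsigned char c)
{
    return s->member[c];
}
```

Negating the set built from 'a' and 'b' gives a set that rejects 'a' and 'b'
and accepts every other character, which is the behaviour described for ^m.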

Discontinued Plus Sign To Specify Positive Literal Numbers (4.1)
================================================================

The use of + to specify a literal number as positive has been removed. This
notation is redundant because all literals are positive by default. It was
unlikely to be used but was provided for consistency. This notation caused an
ambiguity with the '+' repetition operator. Due to this ambiguity, and the fact
that it is unlikely to be used and is completely unnecessary when it is, it has
been removed. This simplifies the design. It eliminates possible confusion and
removes the need to explain why the ambiguity exists and how it is resolved.

As a consequence of the removal, any expression (m +1) or (m+1) will now be
parsed as (m+ . 1) rather than (m . +1). This is because previously the scanner
handled positive literals and therefore they got precedence over the repetition
operator.

Precedence of Subtraction Operator vs Negative Literals Changed (4.1)
=====================================================================

Previously, the scanner located negative numbers and therefore gave a higher
priority to the use of - to specify a negative literal number. This has
changed: precedence is now given to the subtraction operator.
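
The effect of moving this decision out of the scanner can be illustrated with a
toy tokenizer in C. It is not Ragel's scanner: count_tokens and its
minus_is_literal flag are hypothetical, and the token classes are reduced to
identifiers, numbers and single-character operators.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy scanner. When minus_is_literal is nonzero, a '-' directly followed by a
 * digit is folded into the number token (the old behaviour); otherwise '-' is
 * always its own operator token (the new behaviour). Returns the number of
 * tokens found in src. */
size_t count_tokens(const char *src, int minus_is_literal)
{
    size_t n = 0;
    const char *p = src;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                          /* skip whitespace */
        } else if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p))
                p++;                      /* identifier such as "any" */
            n++;
        } else if (isdigit((unsigned char)*p) ||
                   (minus_is_literal && *p == '-' &&
                    isdigit((unsigned char)p[1]))) {
            p++;                          /* first digit, or the leading '-' */
            while (isdigit((unsigned char)*p))
                p++;
            n++;
        } else {
            p++;                          /* any other one-character operator */
            n++;
        }
    }
    return n;
}
```

With the flag set, any-0 scans as two tokens (any, -0), which a parser can only
combine by concatenation. With the flag clear it scans as three (any, -, 0), so
the parser sees the subtraction operator.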

This change is for two reasons: A) The subtraction operator is far more common
than negative literal numbers. I have quite often been fooled by writing
(any-0) and having it parsed as ( any . -0 ) rather than ( any - 0 ) as I
wanted. B) In the definition of concatenation I want to maintain that
concatenation is used only when there are no other binary operators separating
two machines. In the case of (any-0) there is an operator separating the
machines and parsing this as the concatenation of (any . -0) violates this

Duplicate Actions are Removed From Action Lists (4.1)
=====================================================

With previous versions of Ragel, effort was often expended towards ensuring
identical machines were not unioned together, causing duplicate actions to
appear in the same action list (transition or eof perhaps). Often this required
factoring out a machine or specializing a machine's purpose. For example,
consider the following machine:

    word = [a-z]+ >s $a %l;

This machine needed to be rewritten as the following to avoid duplicate
actions. This is essentially a refactoring of the machine.

    main := word ( ' ' | '\t' ) word;

An alternative was to specialize the machines:

    word1 = [a-z]+ >s $a %l;
        ( word1 ' ' word1 ) |
        ( word2 '\t' word1 );

Since duplicating an action on a transition is never (in my experience) desired
and must be manually avoided, sometimes to the point of obscuring the machine
specification, it is now done automatically by Ragel. This change should have
no effect on existing code that is properly written and will allow the
programmer more freedom when writing new code.
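
The idea of dropping duplicates when two action lists meet can be sketched in
C. These notes do not say how Ragel identifies or orders actions internally, so
this sketch assumes that an action is an integer id and that the first
occurrence keeps its position. The name merge_actions is hypothetical.

```c
#include <stddef.h>

/* Hypothetical representation: an action list is an array of action ids in
 * execution order. Appends every id from src to dst unless dst already holds
 * it, and returns the new length of dst. The caller must provide room in dst
 * for dst_len + src_len entries. */
size_t merge_actions(int *dst, size_t dst_len, const int *src, size_t src_len)
{
    for (size_t i = 0; i < src_len; i++) {
        int seen = 0;

        /* Look for src[i] among the actions already in the list. */
        for (size_t j = 0; j < dst_len; j++) {
            if (dst[j] == src[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            dst[dst_len++] = src[i];
    }
    return dst_len;
}
```

Merging the list {1, 2} with {2, 3, 2} leaves {1, 2, 3}: action 2 appears once
even though both operands of the union carried it.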

The syntax for embedding Ragel statements into the host language has changed.
The primary motivation is a better interaction with Objective-C. Under the
previous scheme Ragel generated the opening and closing of the structure and
the interface. The user could inject user-defined declarations into the struct
using the struct {}; statement; however, there was no way to inject interface
declarations. Under this scheme it was also awkward to give the machine a base
class. Rather than add another statement similar to struct for including
declarations in the interface, we take the reverse approach: the user now
writes the struct and interface, and Ragel statements are injected as needed.

Machine specifications now begin with %% and are followed by an optional name
and either a single Ragel statement or a sequence of statements enclosed in {}.
If a machine specification does not have a name then Ragel tries to find a name
for it by first checking if the specification is inside a struct or class or
interface. If it is not then it uses the name of the previous machine
specification. If still no name is found then an error is raised.
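
The lookup order just described can be written down as a small C function.
This is a sketch of the rule as stated, not Ragel's source: the function
resolve_machine_name and its three parameters are hypothetical, and each
parameter is NULL when that source of a name is not available.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the lookup order described above. Returns the
 * chosen name, or NULL to signal that no name was found and an error should
 * be raised. */
const char *resolve_machine_name(const char *explicit_name,
                                 const char *enclosing_type,
                                 const char *previous_spec)
{
    if (explicit_name != NULL)
        return explicit_name;     /* name written after %% */
    if (enclosing_type != NULL)
        return enclosing_type;    /* surrounding struct, class or interface */
    if (previous_spec != NULL)
        return previous_spec;     /* name of the previous specification */
    return NULL;                  /* caller reports the error */
}
```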

Since the user now specifies the fsm struct directly and since the current
state and stack variables are now of type integer in all code styles, it is
more appropriate for the user to manage the declarations of these variables.
Ragel no longer generates the current state and the stack data variables. This
also gives the user more freedom in deciding how the stack is to be allocated,
and also permits it to be grown as necessary, rather than allowing only a fixed

FSM specifications now persist in memory, so the second time a specification of
any particular name is seen the statements will be added to the previous
specification. Due to this it is no longer necessary to give the element or
alphabet type in the header portion and in the code portion. In addition there
is now an include statement that allows the inclusion of the header portion of
a machine if it resides in a different file, as well as allowing the inclusion
of a machine spec of a different name from any file at all.

Ragel is still able to generate the machine's function declarations. This may
not be required for C code; however, it is necessary for C++ and Objective-C
code. This is now accomplished with the interface statement.

Ragel now has different criteria for deciding what to generate. If the spec
contains the interface statement then the machine's interface is generated. If
the spec contains the definition of a main machine, then the code is generated.
It is now possible to put common machine definitions into a separate library
file and to include them in other machine specifications.

To port Ragel 3.x programs to 4.x, the FSM's structure must be explicitly coded
in the host language and it must include the declaration of the current state.
This should be called 'curs' and be of type int. If the machine uses the fcall
and fret directives, the structure must also include the stack variables. The
stack should be named 'stack' and be of type int*. The stack top should be
named 'top' and be of type int.
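
A host structure that meets these requirements might look like the following C
sketch. Only the member names and types curs, stack and top come from the text
above. The struct name fsm, the stack_cap member and the fsm_push helper are
illustrative assumptions, as is the reading of top as the index of the next
free slot. The helper shows one way a user-managed stack can be grown as
necessary; how generated fcall code would call it is not covered here.

```c
#include <stdlib.h>

/* Hypothetical host structure for a ported machine. */
struct fsm {
    int curs;         /* current state */
    int *stack;       /* state stack, needed when fcall and fret are used */
    int top;          /* index of the next free stack slot (assumption) */
    int stack_cap;    /* ints allocated for stack (assumption, not required) */
};

/* Push a state, growing the stack on demand. Returns 0 on success and -1 if
 * memory could not be allocated, in which case the stack is left unchanged. */
int fsm_push(struct fsm *f, int state)
{
    if (f->top == f->stack_cap) {
        int new_cap = f->stack_cap ? f->stack_cap * 2 : 8;
        int *grown = realloc(f->stack, (size_t)new_cap * sizeof *grown);

        if (grown == NULL)
            return -1;
        f->stack = grown;
        f->stack_cap = new_cap;
    }
    f->stack[f->top++] = state;
    return 0;
}
```

A structure initialised with a NULL stack and zero capacity can be pushed to
directly, since realloc on a NULL pointer behaves like malloc. The caller frees
the stack member when the machine is no longer needed.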

In Objective-C, both the interface and implementation directives must also be
explicitly coded by the user. Examples can be found in the section "New
Interface Examples".

Action and Priority Embedding Operators (4.0)
=============================================

In the interest of simplifying the language, operators now embed strictly
either on characters or on EOF, but never both. Operators should be doing one
well-defined thing, rather than have multiple effects. This also enables the
detection of FSM commands that do not make sense in EOF actions.

This change is summarized by:
 -'%' operator embeds only into leaving characters.
 -All global and local error operators only embed on error character
  transitions; their actions will not be triggered on EOF in non-final states.
 -Addition of EOF action embedding operators for all classes of states to make
  up for functionality removed from other operators. These are >/ $/ @/ %/.
 -Start transition operator '>' does not imply leaving transitions when start

This change results in a simpler and more direct relationship between the
operators and the physical state machine entities they operate on. It removes
the special cases within the operators that require you to stop and think as
you program in Ragel.

Previously, the pending out transition operator % simultaneously served two
purposes. First, to embed actions that are to be transferred to transitions
made going out of the machine. These transitions are created by the
concatenation and Kleene star operators. Second, to specify actions that get
executed on EOF should the final state in the machine to which the operator is
applied remain final.

To convert Ragel 3.x programs: Any place where there is an embedding of an
action into pending out transitions using the % operator and the final states
remain final in the end result machine, add an embedding of the same action
using the EOF operator %/action.

Also note that when generating dot file output of a specific component of a
machine that has leaving transitions embedded in the final states, these
transitions will no longer show up, since the leaving transition operator no
longer causes actions to be moved into the EOF event when the state they are
embedded into becomes a final state of the final machine.

Const Element Type (4.0)
========================

If the element type has not been defined, the previous behaviour was to default
to the alphabet type. The element type however is usually not specified as
const and in most cases the data pointer in the machine's execute function
should be a const pointer. Therefore Ragel now makes the element type default
to a constant version of the alphabet type. This can always be changed by using
the element statement. For example 'element char;' will result in a non-const

New Interface Examples (4.0)
============================

---------- C ----------

main := 'hello world';

--------- C++ ---------

%% main := 'hello world';

----- Objective-C -----

@interface Clang : Object

@implementation Clang

%% main := 'hello world';