4 Why3 can be used to verify programs written in languages other than
5 WhyML. Currently, Why3 supports tiny subsets of C and Python,
6 respectively coined micro-C and micro-Python below. These were
7 designed for teaching purposes. They come with their own specification
8 languages, written in special comments.
9 These input formats are described below.
11 Why3 also supports the format MLCFG, which is an extension of WhyML
12 allowing to define function bodies in an unstructured style, with goto
15 Why3 also supports the format Coma (which stands for Continuation Machine).
16 Coma is an Intermediate Verification Language implemented as a plugin in the
19 Any Why3 tool (:why3:tool:`why3 prove`, :why3:tool:`why3 ide`, etc.)
20 can be passed a file with a suffix :file:`.c`, :file:`.py`, :file:`.coma` or
21 :file:`.mlcfg`, which triggers the corresponding input format.
29 Micro-C is a valid subset of ANSI C. Hence, micro-C files can be
30 passed to a C compiler.
35 Logical annotations are inserted in special comments starting
36 with ``//@`` or ``/*@``. In the following grammar, we
37 only use the former kind, for simplicity, but both kinds are allowed.
39 .. productionlist:: micro-C
41 decl: `c_include` | `c_function` | `logic_declaration`
42 c_include: "#include" "<" file-name ">"
44 Directives ``#include`` are ignored during the translation to
45 Why3. They are allowed anyway, such that a C source code using
46 functions such as ``printf`` (see below) is accepted by a C compiler.
48 .. rubric:: Function definition
50 .. productionlist:: micro-C
51 c_function: `return_type` identifier "(" `params`? ")" `spec`* `block`
52 return_type: "void" | "int"
53 params: `param` ("," `param`)*
54 param: "int" identifier | "int" identifier "[]"
56 .. rubric:: Function specification
58 .. productionlist:: micro-C
59 spec: "//@" "requires" `term` ";"
60 : | "//@" "ensures" `term` ";"
61 : | "//@" "variant" `term` ("," `term`)* ";"
63 .. rubric:: C expression
65 .. productionlist:: micro-C
66 expr: integer-literal | string-literal
67 : | identifier | identifier ( "++" | "--" ) | ( "++" | "--" ) identifier
68 : | identifier "[" `expr` "]"
69 : | identifier "[" `expr` "]" ( "++" | "--")
70 : | ( "++" | "--") identifier "[" `expr` "]"
71 : | "-" `expr` | "!" `expr`
72 : | `expr` ( "+" | "-" | "*" | "/" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) `expr`
73 : | identifier "(" (`expr` ("," `expr`)*)? ")"
74 : | "scanf" "(" '"%d"' "," "&" identifier ")"
77 .. rubric:: C statement
79 .. productionlist:: micro-C
81 : | "return" `expr` ";"
82 : | "int" identifier ";"
83 : | "int" identifier "[" `expr` "]" ";"
85 : | "if" "(" `expr` ")" `stmt`
86 : | "if" "(" `expr` ")" `stmt` "else" `stmt`
87 : | "while" "(" `expr` ")" `loop_body`
88 : | "for" "(" `expr_stmt` ";" `expr` ";" `expr_stmt` ")" `loop_body`
91 : | "//@" "label" identifier ";"
92 : | "//@" ( "assert" | "assume" | "check" ) `term` ";"
93 block: "{" `stmt`* "}"
94 expr_stmt: "int" identifier "=" `expr`
95 : | identifier `assignop` `expr`
96 : | identifier "[" `expr` "]" `assignop` `expr`
98 assignop: "=" | "+=" | "-=" | "*=" | "/="
99 loop_body: `loop_annot`* `stmt`
100 : | "{" `loop_annot`* `stmt`* "}"
101 loop_annot: "//@" "invariant" `term` ";"
102 : | "//@" "variant" `term` ("," `term`)* ";"
104 Note that the syntax for loop bodies allows the loop annotations to be
105 placed either before the block or right at the beginning of the block.
107 .. rubric:: Logic declarations
109 .. productionlist:: micro-C
110 logic_declaration: "//@" "function" "int" identifier "(" `params` ")" ";"
111 : | "//@" "function" "int" identifier "(" `params` ")" "=" `term` ";"
112 : | "//@" "predicate" identifier "(" `params` ")" ";"
113 : | "//@" "predicate" identifier "(" `params` ")" "=" `term` ";"
114 : | "//@" "axiom" identifier ":" `term` ";"
115 : | "//@" "lemma" identifier ":" `term` ";"
116 : | "//@" "goal" identifier ":" `term` ";"
118 Logic functions are limited to a return type ``int``.
120 .. rubric:: Logical term
122 .. productionlist:: micro-C
128 : | `term` "[" `term` "]"
129 : | `term` "[" `term` "<-" `term` "]"
131 : | "old" "(" `term` ")"
132 : | "at" "(" `term` "," identifier ")"
134 : | `term` ( "->" | "<->" | "||" | "&&" ) `term`
135 : | `term` ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) `term`
136 : | `term` ( "+" | "-" | "*" | "/" | "% ) `term`
137 : | "if" `term` "then" `term` "else `term`
138 : | "let" identifier "=" `term` "in" `term`
139 : | ( "forall" | "exists" ) `binder` ("," `binder`)* "." `term`
140 : | identifier "(" (`term` ("," `term`)*)? ")"
144 Built-in functions and predicates
145 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149 * ``scanf`` is limited to the syntax ``scanf("%d", &x)``.
150 * ``printf`` is limited to ``printf(string-literal,
151 expr1, ..., exprn)``. The string literal should
152 contain exactly ``n`` occurrences of ``%d`` (not checked by Why3).
153 * ``rand()`` returns a pseudo-random integer in the range ``0`` to
154 ``RAND_MAX`` inclusive.
158 * ``int length(int a[])`` returns the length of array ``a``.
159 * ``int occurrence(int v, int a[])`` returns the number of occurrences of the
160 value ``v`` in array ``a``.
164 .. _format.micro-Python:
169 Micro-Python is a valid subset of Python 3. Hence, micro-Python files can be
170 passed to a Python interpreter.
172 Syntax of micro-Python
173 ~~~~~~~~~~~~~~~~~~~~~~
175 Notation: In the grammar of micro-Python given below,
176 special symbols ``NEWLINE``, ``INDENT``,
177 and ``DEDENT`` mark an end of line, the beginning of a new
178 indentation block, and its end, respectively.
180 Logical annotations are inserted in special comments starting with ``#@``.
182 .. productionlist:: microPython
184 decl: `py_import` | `py_constant` | `py_function` | `stmt` | `logic_declaration`
185 py_import: "from" identifier "import" identifier ("," identifier)* NEWLINE
187 Directives ``import`` are ignored during the translation to
188 Why3. They are allowed anyway, such that a Python source code using
189 functions such as ``randint`` is accepted by a Python
190 interpreter (see below).
192 .. rubric:: Constant definition
194 .. productionlist:: microPython
195 py_constant: "#@" "constant" NEWLINE identifier "=" `expr` NEWLINE
197 .. rubric:: Function definition
199 .. productionlist:: microPython
200 py_function: `logic_def`? "def" identifier "(" `params`? ")" `return_type`? ":" NEWLINE INDENT `spec`* `stmt`* DEDENT
201 params: `param` ("," `param`)*
202 param: identifier (":" `py_type`)?
203 return_type: "->" `py_type`
204 py_type: identifier ("[" `py_type` ("," `py_type`)* "]")?
207 .. rubric:: Function specification
209 .. productionlist:: microPython
210 logic_def: "#@" "function" NEWLINE
211 spec: "#@" "requires" `term` NEWLINE
212 : | "#@" "ensures" `term` NEWLINE
213 : | "#@" "variant" `term` ("," `term`)* NEWLINE
215 .. rubric:: Python expression
217 .. productionlist:: microPython
218 expr: "None" | "True" | "False" | integer-literal | string-literal
220 : | identifier "[" `expr` "]"
221 : | "-" `expr` | "not" `expr`
222 : | `expr` ( "+" | "-" | "*" | "//" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "and" | "or" ) `expr`
223 : | identifier "(" (`expr` ("," `expr`)*)? ")"
224 : | `expr` ("," `expr`)+
225 : | "[" (`expr` ("," `expr`)*)? "]"
226 : | `expr` "if" `expr` "else" `expr`
229 .. rubric:: Python statement
231 .. productionlist:: microPython
232 stmt: `simple_stmt` NEWLINE
233 : | "if" `expr` ":" `suite` `else_branch`
234 : | "while" `expr` ":" `loop_body`
235 : | "for" identifier "in" `expr` ":" `loop_body`
236 else_branch: /* nothing */
238 : | "elif" `expr` ":" `suite` `else_branch`
239 suite: `simple_stmt` NEWLINE
240 : | NEWLINE INDENT `stmt` `stmt`* DEDENT
243 : | identifier "=" `expr`
244 : | identifier "[" `expr` "]" "=" `expr`
246 : | "#@" "label" identifier
247 : | "#@" ( "assert" | "assume" | "check" ) `term`
248 loop_body: `simple_stmt` NEWLINE
249 : | NEWLINE INDENT `loop_annot`* `stmt` `stmt`* DEDENT
250 loop_annot: "#@" "invariant" `term` NEWLINE
251 : | "#@" "variant" `term` ("," `term`)* NEWLINE
253 .. rubric:: Logic declaration
255 .. productionlist:: microPython
256 logic_declaration: "#@" "function" identifier "(" `params` ")" `return_type`? ("variant" "{" `term` "}")? ("=" `term`)? NEWLINE
257 : | "#@" "predicate" identifier "(" `params` ")" ("=" `term`)? NEWLINE
258 : | "#@" "axiom" identifier ":" `term` NEWLINE
259 : | "#@" "lemma" identifier ":" `term` NEWLINE
261 Note that logic functions and predicates cannot be given definitions.
262 Yet, they can be axiomatized, using toplevel ``assume`` statements.
265 .. rubric:: Logical term
267 .. productionlist:: microPython
274 : | `term` "[" `term` "]"
275 : | `term` "[" `term` "<-" `term` "]"
277 : | "old" "(" `term` ")"
278 : | "at" "(" `term` "," identifier ")"
280 : | `term` ( "->" | "<->" | "or" | "and" | "by" | "so" ) `term`
281 : | `term` ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) `term`
282 : | `term` ( "+" | "-" | "*" | "//" | "%" ) `term`
283 : | "if" `term` "then" `term` "else `term`
284 : | "let" identifier "=" `term` "in" `term`
285 : | ( "forall" | "exists" ) param ("," param)* "." `term`
286 : | identifier "(" (`term` ("," `term`)*)? ")"
288 Built-in functions and predicates
289 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
291 .. rubric:: Python code
293 * built-in function ``pow`` over integers
294 * ``len(l)`` returns the length of list ``l``.
295 * ``int(input())`` reads an integer from standard input.
296 * ``range(l, u)`` returns the list of integers
297 from ``l`` inclusive to ``u`` exclusive.
298 In particular, ``for x in range(l, u):`` is supported.
299 * ``randint(l, u)`` returns a pseudo-random integer
300 in the range ``l`` to ``u`` inclusive.
304 * ``len(l)`` returns the length of list ``l``.
305 * ``occurrence(v, l)`` returns the number of occurrences of the value ``v`` in list ``l``.
310 Python lists are modeled as arrays, whose size cannot be modified.
320 The MLCFG language is an experimental extension of the regular WhyML
321 language, in which the body of program functions can be
322 coded using labelled blocks and goto statements. MLCFG can be used to
323 encode algorithms which are presented in an unstructured fashion. It
324 can also be used as a target for encoding unstructured programming
325 languages in Why3, for example assembly code.
328 Syntax of the MLCFG language
329 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
331 The MLCFG syntax is an extension of the regular WhyML syntax. Every
332 WhyML declaration is allowed, plus an additional declaration of
333 program function of the following form, introduced by keywords ``let cfg``:
337 let cfg *f* (*x*:sub:`1`: *t*:sub:`1`) ... (*x*:sub:`n`: *t*:sub:`n`): *t*
341 var *y*:sub:`1`: *u*:sub:`1`;
343 var *y*:sub:`k`: *u*:sub:`k`;
344 { *instructions*; *terminator* }
345 *L*:sub:`1` { *instructions*:sub:`1`; *terminator*:sub:`1` }
347 *L*:sub:`j` { *instructions*:sub:`j`; *terminator*:sub:`j` }
350 It defines a program function *f*, with the usual syntax for
351 its contract. The difference is the body, which is made of a sequence
352 of declarations of mutable variables with their types, an initial block,
353 composed of a zero or more instructions followed by a terminator, and a
354 sequence of other blocks, each denoted by a label (:math:`L_1 \ldots L_j` above).
355 The instructions are semi-colon separated sequences of regular
356 WhyML expressions of type ``unit``, excluding ``return`` or ``absurd``
357 expressions, or code invariants:
359 - a code invariant: :samp:`invariant {I} \\{ {t} }` where *I* is a
360 name and *t* a predicate. It is similar to an assert expression,
361 meaning that *t* must hold when execution reaches this statement.
362 Additionally, it acts as a cut in the generation of VC, similarly
363 to a loop invariant. See example below.
365 Each block is ended by one of the following terminators:
367 - a ``goto`` statement: :samp:`goto {L}` where *L* is one of the labels of the
368 other blocks. It instructs to continue execution at the
371 - a ``switch`` statement, of the form
376 | *pat*:sub:`1` -> *terminator*:sub:`1`
378 | *pat*:sub:`k` -> *terminator*:sub:`k`
381 - a ``return`` statement: :samp:`return {expr}`
382 - an ``absurd`` statement: indicating that this block should be unreachable.
384 The extension of syntax is described by the following rules.
386 .. productionlist:: CFG
388 module: "module" `:ident` `decl`* "end"
389 decl: "let" "cfg" `cfg_fundef`
390 : | "let" "rec" "cfg" `cfg_fundef` ("with" `cfg_fundef`)*
391 : | "scope" `:ident` `decl`* "end"
392 cfg_fundef: `:ident` `:binder`+ : `:type` `:spec` "=" `vardecl`* "{" `block` "}" `labelblock`*
393 vardecl: "var" `:ident`* ":" `:type` ";" | "ghost" "var" `:ident`* ":" `:type` ";"
394 block: (`instruction` ";")* `terminator`
395 labelblock: `:ident` "{" `block` "}"
397 : | "invariant" `:ident` "{" `:term` "}"
402 : | "switch" "(" `:expr` ")" `switch_case`* "end"
403 switch_case: "|" `:pattern` "->" `terminator`
410 The following example is directly inspired from the documentation of
411 the ANSI C Specification Language (See :cite:`baudin18acsl`, Section
412 2.4.2 Loop invariants, Example 2.27). It is itself inspired from the
413 first example of Knuth's MIX language, for which formal proofs were
414 first investigated by J.-C. Filliâtre in 2007
415 (:cite:`filliatre07mix`), and also revisited by T.-M.-T. Nguyen in her
416 PhD thesis in 2012 (:cite:`nguyen12phd`, Section 9.5 Translation from
417 a CFG to Why, page 115).
419 This example aims at computing the maximum value of
420 an array of integers. Its code in C is given below.
424 /*@ requires n >= 0 && \valid(a,0,n);
425 @ ensures \forall integer j ; 0 <= j < n ==> \result >= a[j]);
427 int max_array(int a[], int n) {
431 if (a[i] > m) { L: m = a[i]; }
433 @ 0 <= i < n && \forall integer j ; 0 <= j <= i ==> m >= a[j]);
441 The code can be viewed as a control-flow graph as shown in :numref:`fig.cfg.max_array`.
443 .. graphviz:: images/max_array.dot
444 :caption: Control-flow graph of the ``max_array`` function.
445 :name: fig.cfg.max_array
447 Below is a version of this code in the MLCFG language, where label
448 ``L`` corresponds to node ``L``, label ``L1`` to node ``invariant``,
449 label ``L2`` to node ``do``.
451 .. code-block:: whyml
453 let cfg max_array (a:array int) : (max: int, ghost ind:int)
454 requires { length a > 0 }
455 ensures { 0 <= ind < length a }
456 ensures { forall j. 0 <= j < length a -> a[ind] >= a[j] }
470 invariant i_bounds { 0 <= i < length a };
471 invariant ind_bounds { 0 <= ind < length a };
472 invariant m_and_ind { m = a[ind] };
473 invariant m_is_max { forall j. 0 <= j <= i -> m >= a[j] };
474 (* yes, j <= i, not j < i ! *)
476 switch (i < length a)
478 | False -> return (m, ind)
488 The consecutive invariants act as a single cut in the generation of VCs.
494 The translation from the MLCFG language to regular WhyML code may raise
495 the following errors.
497 - “cycle without invariant”: in order to perform the translation, any
498 cycle on the control-flow graph must contain at least one
499 ``invariant`` clause. It corresponds to the idea that any loop must
500 contain a loop invariant.
502 - “cycle without invariant (starting from `I`)”: same error as
503 above, except that the cycle was not reachable from the start of the
504 function body, but from the other ``invariant`` clause named
507 - “label `L` not found for goto”: there is a ``goto`` instruction
508 to a non-existent label.
510 - “unreachable code after goto”: any code occurring after a ``goto``
511 statement is unreachable and is not allowed.
513 - “unsupported: trailing code after switch”: see limitations below.
519 - There is no way to prove termination.
521 - New keywords ``cfg``, ``goto``, ``switch``, and ``var`` cannot be used as
522 regular identifiers anymore.
524 - Trailing code after ``switch`` is not supported. In principle, it
525 should be possible to have a ``switch`` with type ``unit`` and to transfer
526 the execution to the instructions after the ``switch`` for branches
527 not containing ``goto``. This is not
528 yet supported. A workaround is to place the trailing instructions in
529 another block and pose an extra ``goto`` to this block in all the
530 branches that do not end with a ``goto``.
532 - Conditional statements ``if e then i1 else i2`` are not yet
533 supported, but can be simulated with ``switch (e) | True -> i1 |
536 Alternative translation scheme: Stackify
537 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
539 An alternative translation scheme from MLCFG to regular WhyML
540 can be triggered by putting the attribute :why3:attribute:`[@cfg:stackify]` on
541 a function. This method attempts to recover a more
542 structured program body, reconstructing loops when possible.
548 Additional invariants on the generated WhyML code can be inferred by
549 putting the attribute :why3:attribute:`[@cfg:subregion_analysis]` on a
550 function. These invariants are derived by a static analysis of the
551 subregions that are never modified in a loop.
554 .. index:: S-expressions
555 .. _format.S-expressions:
560 This format is intended to be used only as an intermediate language
561 format. Why3 accepts input files in the format of a big S-expression,
562 mirroring the internal structure of parse trees, as they are defined
563 in the module ``Ptree`` of the OCaml API. The S-expression in question
564 is supposed to be generated by the API itself, that is the function
565 ``Ptree.sexp_of_mlw_file``. It is in fact exactly the format that is
566 also printed using :option:`why3 pp --output` with ``--output=sexp``.
568 Notice that this input format is available only when Why3 is compiled
569 with S-expressions support provided by the ``ppx_sexp_conv`` OCaml
572 Concerning the use of Why3 as an intermediate language, via the
573 ``Ptree`` API, the use of S-expressions instead of the regular WhyML
574 syntax is recommended since the printing and re-parsing of parse trees
575 is automatically generated, hence the result is guaranteed to
576 faithfully mirror the internal Ptree representation.
584 The Coma documentation is described in https://coma-ivl.codeberg.page/.