1 (This is a consolidation of documentation written by stig, sahlberg, and gram)
3 What is the display filter system?
4 ==================================
5 The display filter system allows the user to select packets by testing
6 for values in the proto_tree that Wireshark constructs for that packet.
Every proto_item in the proto_tree has an 'abbrev' field
and a 'type' field, which tell the display filter engine the name
of the field and its type (what values it can hold).
11 For example, this is the definition of the ip.proto field from packet-ip.c:
14 { "Protocol", "ip.proto", FT_UINT8, BASE_DEC | BASE_EXT_STRING,
15 &ipproto_val_ext, 0x0, NULL, HFILL }},
17 This definition says that "ip.proto" is the display-filter name for
18 this field, and that its field-type is FT_UINT8.
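As a rough illustration of how an 'abbrev' and a 'type' can be tied
together and looked up by name, here is a minimal sketch. The toy_field
struct, the toy_ftype enum and toy_find_field() are hypothetical
stand-ins invented for this example; they are not Wireshark's
header_field_info or its registration API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a few FT_* values. */
typedef enum { TOY_FT_NONE, TOY_FT_UINT8, TOY_FT_IPv4 } toy_ftype;

/* Hypothetical record holding what the filter engine needs to know. */
typedef struct {
    const char *name;    /* human-readable name, e.g. "Protocol" */
    const char *abbrev;  /* display-filter name, e.g. "ip.proto" */
    toy_ftype   type;    /* what kind of value the field holds */
} toy_field;

static const toy_field toy_fields[] = {
    { "Protocol", "ip.proto", TOY_FT_UINT8 },
    { "Source or Destination Address", "ip.addr", TOY_FT_IPv4 },
};

/* Return the field registered under 'abbrev', or NULL if unknown. */
static const toy_field *toy_find_field(const char *abbrev)
{
    size_t n = sizeof toy_fields / sizeof toy_fields[0];

    assert(abbrev != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_fields[i].abbrev, abbrev) == 0)
            return &toy_fields[i];
    }
    return NULL;
}
```

Calling toy_find_field("ip.proto") yields the record whose type is
TOY_FT_UINT8, while an unknown name yields NULL.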
20 The display filter system has 3 major parts to it:
22 1. A type system (field types, or "ftypes")
23 2. A parser, to convert a user's query to an internal representation
24 3. An engine that uses the internal representation to select packets.
28 epan/dfilter/* - the display filter engine, including
29 scanner, parser, syntax-tree semantics checker, DFVM bytecode
30 generator, and DFVM engine.
31 epan/ftypes/* - the definitions of the various FT_* field types.
32 epan/proto.c - proto_tree-related routines
37 The field type system is stored in epan/ftypes.
39 The proto_tree system #includes ftypes.h, which gives it the ftenum
40 definition, which is the enum of all possible ftypes:
44 FT_NONE, /* used for text labels with no value */
47 FT_CHAR, /* 1-octet character as 0-255 */
50 FT_UINT24, /* really a UINT32, but displayed as 6 hex-digits if FD_HEX*/
52 FT_UINT40, /* really a UINT64, but displayed as 10 hex-digits if FD_HEX*/
53 FT_UINT48, /* really a UINT64, but displayed as 12 hex-digits if FD_HEX*/
54 FT_UINT56, /* really a UINT64, but displayed as 14 hex-digits if FD_HEX*/
59 It also provides the definition of fvalue_t, the struct that holds the *value*
60 that corresponds to the type. Each proto_item (proto_node) holds an fvalue_t
61 due to having a field_info struct (defined in proto.h).
63 The fvalue_t is mostly just a gigantic union of possible C-language types
64 (as opposed to FT_* types):
66 typedef struct _fvalue_t {
69 /* Put a few basic types in here */
75 wmem_strbuf_t *strbuf;
77 ipv4_addr_and_mask ipv4;
78 ipv6_addr_and_prefix ipv6;
81 protocol_value_t protocol;
82 uint16_t sfloat_ieee_11073;
83 uint32_t float_ieee_11073;
90 The ftype system itself is designed to be modular, so that new field types
91 can be added when necessary.
93 Each field type must implement an ftype_t structure, defined in
94 ftypes-int.h. This is the way a field type is registered with the ftype engine.
96 If you take a look at ftype-integer.c, you will see that it provides
97 an ftype_register_integers() function, that fills in many such ftype_t
structs. It creates one for each integer type: FT_UINT8, FT_UINT16,
and so on.
101 The ftype_t struct defines the things needed for the ftype:
104 * a string representation of the FT name ("FT_UINT8")
105 * how much data it consumes in the packet
106 * how to store that value in an fvalue_t: new(), free(),
107 various value-related functions
108 * how to compare that value against another
109 * how to slice that value (strings and byte ranges can be sliced)
113 Once the value of a field is stored in an fvalue_t (stored in
114 each proto_item via field_info), it's easy to use those values,
115 thanks to the various fvalue_*() functions defined in ftypes.h.
117 Functions like fvalue_get(), fvalue_eq(), etc., are all generic
118 interfaces to get information about the field's value. They work
119 on any field type because of the ftype_t struct, which is the lookup
120 table that the field-type engine uses to work with any field type.
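The idea of a per-type lookup table behind generic value functions can
be sketched as follows. This is only an illustration under stated
assumptions: toy_fvalue, toy_ftype_ops and toy_fvalue_eq() are
hypothetical names, and the real ftype_t carries many more members than
the single comparison callback shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum { TOY_FT_UINT32, TOY_FT_STRING, TOY_NUM_TYPES } toy_ftenum;

/* Hypothetical value holder: a type tag plus a union of C types. */
typedef struct {
    toy_ftenum type;
    union {
        uint32_t    uinteger;
        const char *string;
    } value;
} toy_fvalue;

/* Hypothetical per-type operations table, in the spirit of ftype_t. */
typedef struct {
    const char *name;
    int (*cmp)(const toy_fvalue *a, const toy_fvalue *b);
} toy_ftype_ops;

static int cmp_uint32(const toy_fvalue *a, const toy_fvalue *b)
{
    return (a->value.uinteger > b->value.uinteger) -
           (a->value.uinteger < b->value.uinteger);
}

static int cmp_string(const toy_fvalue *a, const toy_fvalue *b)
{
    return strcmp(a->value.string, b->value.string);
}

/* The lookup table, indexed by the type tag. */
static const toy_ftype_ops toy_type_table[TOY_NUM_TYPES] = {
    [TOY_FT_UINT32] = { "TOY_FT_UINT32", cmp_uint32 },
    [TOY_FT_STRING] = { "TOY_FT_STRING", cmp_string },
};

/* Generic equality: works for any registered type via the table. */
static bool toy_fvalue_eq(const toy_fvalue *a, const toy_fvalue *b)
{
    assert(a != NULL && b != NULL);
    assert(a->type == b->type);  /* this sketch compares same-type values only */
    return toy_type_table[a->type].cmp(a, b) == 0;
}
```

The caller never switches on the type itself; adding a new type in this
sketch means adding one more row to the table.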
122 The display filter parser
123 =========================
124 The display filter parser (along with the comparison engine)
125 is stored in epan/dfilter.
127 The scanner/parser pair read the string representing the display filter
128 and convert it into a very simple syntax tree. The syntax tree is very
129 simple in that it is possible that many of the nodes contain unparsed
130 chunks of text from the display filter.
132 There are four phases to parsing a user's request:
134 1. Scanning the string for dfilter syntax
2. Parsing the keywords according to the dfilter grammar, into a
syntax tree
137 3. Doing a semantic check of the nodes in that syntax tree
138 4. Converting the syntax tree into a series of DFVM byte codes
The dfilter_compile() function, in epan/dfilter/dfilter.c,
runs these 4 phases, using a dfwork_t object (dfw) to hold the work in
progress. The end result is a dfilter_t object that can be passed to
dfilter_apply() to actually run the display filter against a set of
proto_trees.
146 Scanning the display filter string
147 ----------------------------------
148 epan/dfilter/scanner.l is the lex scanner for finding keywords
149 in the user's display filter string.
151 Its operation is simple. It finds the special function and comparison
152 operators ("==", "!=", "eq", "ne", etc.), it finds slice operations
153 ( "[0:1]" ), quoted strings, IP addresses, numbers, and any other "special"
154 keywords or string types.
156 Anything it doesn't know how to handle is passed to the grammar parser
157 as an unparsed string (TOKEN_UNPARSED). This includes field names. The
158 scanner does not interpret any protocol field names at all.
The scanner has to return a token type (TOKEN_*) and, in many cases,
a value. The value will be an stnode_t struct, which is a syntax
162 tree node object. Since the final storage of the parse will
163 be in a syntax tree, it is convenient for the scanner to fill in
164 syntax tree nodes with values when it can.
166 The stnode_t definition is in epan/dfilter/syntax-tree.h
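To make the scanner's job concrete, here is a deliberately tiny sketch
in plain C rather than lex. It assumes tokens are separated by
whitespace, which the real scanner does not require, and the toy_token
type and toy_scan() function are hypothetical names used only here.
Note how the field name falls through as an unparsed token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOY_TOKEN_EOF,
    TOY_TOKEN_TEST_EQ,   /* "==" or "eq" */
    TOY_TOKEN_TEST_NE,   /* "!=" or "ne" */
    TOY_TOKEN_UNPARSED   /* anything else, including field names */
} toy_token_type;

typedef struct {
    toy_token_type type;
    char text[64];       /* raw text of the token (truncated if longer) */
} toy_token;

/* Read one whitespace-delimited token at *pos and advance *pos past it. */
static toy_token toy_scan(const char **pos)
{
    toy_token tok = { TOY_TOKEN_EOF, "" };
    size_t len = 0;

    assert(pos != NULL && *pos != NULL);
    const char *p = *pos;

    while (isspace((unsigned char)*p))
        p++;
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (len < sizeof tok.text - 1)
            tok.text[len++] = *p;
        p++;
    }
    tok.text[len] = '\0';
    *pos = p;

    if (len == 0)
        tok.type = TOY_TOKEN_EOF;
    else if (strcmp(tok.text, "==") == 0 || strcmp(tok.text, "eq") == 0)
        tok.type = TOY_TOKEN_TEST_EQ;
    else if (strcmp(tok.text, "!=") == 0 || strcmp(tok.text, "ne") == 0)
        tok.type = TOY_TOKEN_TEST_NE;
    else
        tok.type = TOY_TOKEN_UNPARSED;
    return tok;
}
```

Scanning "ip.proto == 6" with repeated calls gives an unparsed token, an
equality-test token, another unparsed token, and then end of input.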
169 Parsing the keywords according to the dfilter grammar
170 -----------------------------------------------------
171 The grammar parser is implemented with the 'lemon' tool,
172 rather than the traditional yacc or bison grammar parser,
173 as lemon grammars were found to be easier to work with. The
174 lemon parser specification (epan/dfilter/grammar.lemon) is
175 much easier to read than its bison counterpart would be,
thanks to lemon's feature of being able to name fields, rather
than using numbers ($1, $2, etc.).
The lemon tool is located in tools/lemon in the Wireshark
source distribution.
182 An on-line introduction to lemon is available at:
184 http://www.sqlite.org/src/doc/trunk/doc/lemon.html
The grammar specifies which types of constructs are possible
within the dfilter language ("dfilter-lang").
189 An "expression" in dfilter-lang can be a relational test or a logical test.
A relational test compares one value against another, usually
a field (or a slice of a field) against some static value, like:
195 eth.dst != ff:ff:ff:ff:ff:ff
197 A logical test combines other expressions with "and", "or", and "not".
199 At the end of the grammatical parsing, the dfw object will
200 have a valid syntax tree, pointed at by dfw->st_root.
If there is an error in the syntax, the parser will call dfilter_fail()
with an appropriate error message, which the UI will need to report
to the user.
206 The syntax tree system
207 ----------------------
208 The syntax tree is created as a result of running the lemon-based
209 grammar parser on the scanned tokens. The syntax tree code
210 is in epan/dfilter/syntax-tree* and epan/dfilter/sttype-*. It too
211 uses a set of code modules that implement different syntax node types,
212 similar to how the field-type system registers a set of ftypes
213 with a central engine.
215 Each node (stnode_t) in the syntax tree has a type (sttype).
216 These sttypes are very much related to ftypes (field types), but there
217 is not a one-to-one correspondence. The syntax tree nodes are slightly
218 higher-level abstractions. The root node of the syntax tree is the main
219 test or comparison being done.
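As a rough sketch of what such a tree might look like in C, consider
the following. The toy_stnode layout and toy_count_unparsed() are
hypothetical and much simpler than the real stnode_t and sttype
modules; the point is only that tests form the inner nodes while
leaves may still hold raw, unparsed text after the grammar parser runs.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_STTYPE_UNPARSED,  /* leaf: raw text not yet interpreted */
    TOY_STTYPE_TEST       /* inner node: a relational or logical test */
} toy_sttype;

typedef enum {
    TOY_OP_NONE, TOY_OP_EQ, TOY_OP_AND, TOY_OP_OR, TOY_OP_NOT
} toy_test_op;

typedef struct toy_stnode {
    toy_sttype type;
    toy_test_op op;              /* meaningful for TEST nodes only */
    const char *text;            /* meaningful for UNPARSED leaves only */
    struct toy_stnode *left;     /* first operand of a test */
    struct toy_stnode *right;    /* second operand, NULL for "not" */
} toy_stnode;

/* Count the leaves that a later semantic check would still have to resolve. */
static size_t toy_count_unparsed(const toy_stnode *node)
{
    if (node == NULL)
        return 0;
    if (node->type == TOY_STTYPE_UNPARSED)
        return 1;
    assert(node->left != NULL);  /* every test has at least one operand */
    return toy_count_unparsed(node->left) + toy_count_unparsed(node->right);
}
```

For a tree built from two equality tests joined by "and", the root is
the "and" test and the function reports four unparsed leaves.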
After the parsing is done and a syntax tree is available, the
code in semcheck.c does a semantic check of what is in the syntax
tree.
227 The semantics of the simple syntax tree are checked to make sure that
228 the fields that are being compared are being compared to appropriate
229 values. For example, if a field is an integer, it can't be compared to
230 a string, unless a value_string has been defined for that field.
232 During the process of checking the semantics, the simple syntax tree is
233 fleshed out and no longer contains nodes with unparsed information. The
234 syntax tree is no longer in its simple form, but in its complete form.
236 For example, if the dfilter is slicing a field and comparing
237 against a set of bytes, semcheck.c has to check that the field
238 in question can indeed be sliced.
Similarly, it has to check whether a field can be compared against
a certain type of value (string, integer, float, IPv4 address, etc.).
243 The semcheck code also makes adjustments to the syntax tree
244 when it needs to. The parser sometimes stores raw, unparsed strings
245 in the syntax tree, and semcheck has to convert them to
246 certain types. For example, the display filter may contain
247 a value_string string (the "enum" type that protocols can use
248 to define the possible textual descriptions of numeric fields), and
semcheck will convert that value_string string into the correct
integer value.
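The kind of conversion described above can be sketched like this. It
is a simplified illustration, not the code in semcheck.c: the
toy_value_string type, the toy_ipproto_vals table and
toy_resolve_uint() are hypothetical, and the sketch ignores numeric
overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical number-to-text mapping entry. */
typedef struct {
    uint32_t    value;
    const char *strptr;
} toy_value_string;

/* Hypothetical table for an ip.proto-like field, ended by a NULL entry. */
static const toy_value_string toy_ipproto_vals[] = {
    { 6, "TCP" },
    { 17, "UDP" },
    { 0, NULL }
};

/*
 * Turn unparsed text into an integer: accept a plain number first,
 * otherwise look the text up in the field's table of names.
 * Returns false if neither interpretation works.
 */
static bool toy_resolve_uint(const char *text, const toy_value_string *vals,
                             uint32_t *out)
{
    char *end;
    unsigned long n;

    assert(text != NULL && vals != NULL && out != NULL);
    n = strtoul(text, &end, 0);
    if (end != text && *end == '\0') {
        *out = (uint32_t)n;
        return true;
    }
    for (; vals->strptr != NULL; vals++) {
        if (strcmp(vals->strptr, text) == 0) {
            *out = vals->value;
            return true;
        }
    }
    return false;
}
```

With this table, both "TCP" and "6" resolve to 6, while a name that is
not in the table is rejected, which is where a real semantic check
would report an error.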
252 Truth be told, the semcheck.c code is a bit disorganized, and could
253 be re-designed & re-written.
257 The syntax tree is analyzed to create a sequence of bytecodes in the
258 "DFVM" language. "DFVM" stands for Display Filter Virtual Machine. The
259 DFVM is similar in spirit, but not in definition, to the BPF VM that
260 libpcap uses to analyze packets.
A virtual bytecode is created and used so that the actual process of
filtering packets will be fast. That is, it should be faster to process
a list of VM bytecodes than to attempt to filter packets directly from
the syntax tree. (No measurement has been made to support this
assumption.)
268 The DFVM opcodes are defined in epan/dfilter/dfvm.h (dfvm_opcode_t).
269 Similar to how the BPF opcode system works in libpcap, there is a
270 limited set of opcodes. They operate by loading values from the
271 proto_tree into registers, loading pre-defined values into
272 registers, and comparing them. The opcodes are checked in sequence, and
273 there are only 2 branching opcodes: IF_TRUE_GOTO and IF_FALSE_GOTO.
274 Both of these can only branch forwards, and never backwards. In this way
275 sets of DFVM instructions will never get into an infinite loop.
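The forward-only branching rule is easy to state as a check over a
program. The following sketch shows one way to do it; the toy_insn
layout and toy_branches_forward() are hypothetical and are not part of
dfvm.h. Because every taken branch strictly increases the program
counter, a program that passes this check must terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TOY_READ_TREE, TOY_ANY_EQ, TOY_IF_TRUE_GOTO, TOY_IF_FALSE_GOTO, TOY_RETURN
} toy_opcode;

typedef struct {
    toy_opcode op;
    size_t     target;  /* branch destination; unused by other opcodes */
} toy_insn;

/* True if every branch jumps to a later instruction inside the program. */
static bool toy_branches_forward(const toy_insn *prog, size_t len)
{
    assert(prog != NULL || len == 0);
    for (size_t pc = 0; pc < len; pc++) {
        bool is_branch = prog[pc].op == TOY_IF_TRUE_GOTO ||
                         prog[pc].op == TOY_IF_FALSE_GOTO;
        if (is_branch && (prog[pc].target <= pc || prog[pc].target >= len))
            return false;
    }
    return true;
}
```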
277 The epan/dfilter/gencode.c code converts the syntax tree
278 into a set of dfvm instructions.
280 The constants that are in the DFVM instructions (the constant
281 values that the user is checking against) are pre-loaded
282 into registers via the dfvm_init_const() call, and stored
in the dfilter_t structure for when the display filter is
applied.
289 Once the DFVM bytecode has been produced, it's a simple matter of
290 running the DFVM engine against the proto_tree from the packet
291 dissection, using the DFVM bytecodes as instructions. If the DFVM
292 bytecode is known before packet dissection occurs, the
293 proto_tree-related code can be "primed" to store away pointers to
294 field_info structures that are interesting to the display filter. This
makes lookup of those field_info structures during the filtering process
faster.
The dfilter_apply() function runs a single pre-compiled
display filter against a single proto_tree, and returns
true or false, meaning that the filter matched or not.
302 That function calls dfvm_apply(), which runs across the DFVM
303 instructions, loading protocol field values into DFVM registers
304 and doing the comparisons.
306 There is a top-level Makefile target called 'dftest' which
307 builds a 'dftest' executable that will print out the DFVM
308 bytecode for any display filter given on the command-line.
313 To use it, give it the display filter on the command-line:
315 $ ./dftest 'ip.addr == 127.0.0.1'
316 Filter: ip.addr == 127.0.0.1
319 00000 PUT_FVALUE 127.0.0.1 <FT_IPv4> -> reg#1
322 00000 READ_TREE ip.addr -> reg#0
323 00001 IF-FALSE-GOTO 3
324 00002 ANY_EQ reg#0 == reg#1
328 The output shows the original display filter, then the opcodes
329 that put constant values into registers. The registers are
numbered, and are shown in the output as "reg#n", where 'n' is the
register number.
333 Then the instructions are shown. These are the instructions
334 which are run for each proto_tree.
336 This is what happens in this example:
338 00000 READ_TREE ip.addr -> reg#0
340 Any ip.addr fields in the proto_tree are loaded into register 0. Yes,
341 multiple values can be loaded into a single register. As a result
342 of this READ_TREE, the accumulator will hold true or false, indicating
343 if any field's value was loaded, or not.
345 00001 IF-FALSE-GOTO 3
347 If the load failed because there were no ip.addr fields
348 in the proto_tree, then we jump to instruction 3.
350 00002 ANY_EQ reg#0 == reg#1
352 This checks to see if any of the fields in register 1
353 (which has the pre-loaded constant value of 127.0.0.1) are equal
354 to any of the fields in register 0 (which are all of the ip.addr
355 fields in the proto tree). The resulting value in the
accumulator will be true if any of the fields match, or false
if none do.
The final instruction (instruction 3, the target of the jump above)
returns the accumulator's value, either true or false.
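The walk-through above can be mimicked with a very small interpreter.
This is a sketch under several assumptions, not the real dfvm_apply():
the types and toy_vm_apply() are hypothetical, register 0 and
register 1 are hard-wired, field values are plain 32-bit integers, and
the last instruction is assumed to be a return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TOY_READ_TREE, TOY_IF_FALSE_GOTO, TOY_ANY_EQ, TOY_RETURN
} toy_opcode;

typedef struct {
    toy_opcode op;
    size_t     target;  /* used by TOY_IF_FALSE_GOTO only */
} toy_insn;

/* A register holds zero or more values. */
typedef struct {
    const uint32_t *vals;
    size_t          count;
} toy_reg;

/* The example program: load, skip the compare if nothing loaded, compare, return. */
static const toy_insn toy_example_prog[] = {
    { TOY_READ_TREE, 0 },
    { TOY_IF_FALSE_GOTO, 3 },
    { TOY_ANY_EQ, 0 },
    { TOY_RETURN, 0 },
};

/* Run 'prog' for one packet whose field values are 'field_vals'. */
static bool toy_vm_apply(const toy_insn *prog, size_t len,
                         const uint32_t *field_vals, size_t n_vals,
                         uint32_t constant)
{
    toy_reg reg0 = { NULL, 0 };
    toy_reg reg1 = { &constant, 1 };  /* pre-loaded constant */
    bool acc = false;                 /* the accumulator */
    size_t pc = 0;

    while (pc < len) {
        const toy_insn *in = &prog[pc];
        switch (in->op) {
        case TOY_READ_TREE:
            reg0.vals = field_vals;
            reg0.count = n_vals;
            acc = n_vals > 0;         /* did anything get loaded? */
            pc++;
            break;
        case TOY_IF_FALSE_GOTO:
            assert(in->target > pc);  /* branches only go forward */
            pc = acc ? pc + 1 : in->target;
            break;
        case TOY_ANY_EQ:
            acc = false;
            for (size_t i = 0; i < reg0.count; i++)
                for (size_t j = 0; j < reg1.count; j++)
                    if (reg0.vals[i] == reg1.vals[j])
                        acc = true;
            pc++;
            break;
        case TOY_RETURN:
            return acc;
        }
    }
    return acc;
}
```

Running toy_example_prog with 0x7F000001 (127.0.0.1 as a 32-bit number)
as the constant matches a packet that carries that address among its
values, and fails for a packet with no such field at all because the
jump skips the comparison.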
363 In addition to dftest, there is also a unit-test script for the
364 display filter engine - test/suite_dfilter/dfiltertest.py.
365 It makes use of tshark to run specific display filters against
366 specific captures in test/captures. See the "Wireshark Tests" chapter
367 in the Wireshark Developer’s Guide.
371 Display Filter Functions
372 ========================
373 You define a display filter function by adding an entry to
374 the df_functions table in epan/dfilter/dfunctions.c. The record struct
375 is defined in dfunctions.h, and shown here:
380 ftenum_t retval_ftype;
383 DFSemCheckType semcheck_param_function;
386 name - the name of the function; this is how the user will call your
387 function in the display filter language
389 function - this is the run-time processing of your function.
391 retval_ftype - what type of FT_* type does your function return?
393 min_nargs - minimum number of arguments your function accepts
394 max_nargs - maximum number of arguments your function accepts
396 semcheck_param_function - called during the semantic check of the
397 display filter string.
401 typedef bool (*DFFuncType)(GList *arg1list, GList *arg2list, GList **retval);
403 The return value of your function is a bool; true if processing went fine,
404 or false if there was some sort of exception.
406 For now, display filter functions can accept a maximum of 2 arguments.
407 The "arg1list" parameter is the GList for the first argument. The
"arg2list" parameter is the GList for the second argument. All arguments
409 to display filter functions are lists. This is because in the display
410 filter language a protocol field may have multiple instances. For example,
411 a field like "ip.addr" will exist more than once in a single frame. So
412 when the user invokes this display filter:
414 somefunc(ip.addr) == true
416 even though "ip.addr" is a single argument, the "somefunc" function will
417 receive a GList of *all* the values of "ip.addr" in the frame.
419 Similarly, the return value of the function needs to be a GList, since all
420 values in the display filter language are lists. The GList** retval argument
421 is passed to your function so you can set the pointer to your return value.
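To show the list-in, list-out shape of such a function without
depending on GLib, here is a sketch that uses a hypothetical singly
linked toy_list in place of GList. The toy_df_func_len() function is
invented for this example; it produces one length per input string and,
because it prepends, returns the results in reverse order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GList: one node per value. */
typedef struct toy_list {
    const char      *str;   /* set for string values */
    size_t           num;   /* set for integer values */
    struct toy_list *next;
} toy_list;

/* Free a list whose nodes were allocated with malloc(). */
static void toy_list_free(toy_list *head)
{
    while (head != NULL) {
        toy_list *next = head->next;
        free(head);
        head = next;
    }
}

/*
 * Same shape as DFFuncType: two argument lists in, one result list out.
 * Returns false on failure (here: out of memory).
 */
static bool toy_df_func_len(toy_list *arg1list, toy_list *arg2list,
                            toy_list **retval)
{
    toy_list *out = NULL;

    (void)arg2list;  /* this function takes a single argument */
    assert(retval != NULL);
    for (const toy_list *it = arg1list; it != NULL; it = it->next) {
        toy_list *node = malloc(sizeof *node);
        if (node == NULL) {
            toy_list_free(out);
            return false;
        }
        node->str = NULL;
        node->num = strlen(it->str);
        node->next = out;  /* prepend, so results come out reversed */
        out = node;
    }
    *retval = out;
    return true;
}
```

Even when the user writes a single argument, the function loops over
every instance of the field and hands back a list, which the caller is
then responsible for freeing in this sketch.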
425 typedef void (*DFSemCheckType)(dfwork_t *dfw, int param_num, stnode_t *st_node);
427 For each parameter in the syntax tree, this function will be called.
428 "param_num" will indicate the number of the parameter, starting with 0.
429 The "stnode_t" is the syntax-tree node representing that parameter.
430 If everything is okay with the value of that stnode_t, your function
431 does nothing --- it merely returns. If something is wrong, however,
432 it should call dfilter_fail(dfw,...) and THROW a TypeError exception.
435 Example: add an 'in' display filter operation
436 =============================================
This example was discussed on ethereal-dev in April 2004:
439 [Ethereal-dev] Need for an 'in' dfilter operator?
440 (https://lists.wireshark.org/archives/ethereal-dev/200404/msg00372.html)
441 It illustrates how a more complex operation can be added to the display filter language.
445 If I want to add an 'in' display filter operation, I need to define
446 several things. This can happen in different ways. For instance,
447 every value from the "in" value collection will result in a test.
448 There are 2 options here, either a test for a single value:
452 or a test for a value in a given range:
456 or even a combination of both. The former example can be reduced to:
458 ((x == a) or (x == b) or (x == c))
460 while the latter can be reduced to
462 ((x >= MIN(a, z)) and (x <= MAX(a, z)))
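The two reductions can be written out directly. The sketch below uses
plain integers and hypothetical helper names (toy_in_set() and
toy_in_range()); it only demonstrates the logic of the reductions, not
how the display filter engine would implement them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set membership: (x == a) or (x == b) or (x == c) ... */
static bool toy_in_set(int x, const int *members, size_t n)
{
    assert(members != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (x == members[i])
            return true;
    }
    return false;
}

/* Range membership: (x >= MIN(a, z)) and (x <= MAX(a, z)) */
static bool toy_in_range(int x, int a, int z)
{
    int lo = a < z ? a : z;
    int hi = a < z ? z : a;

    return x >= lo && x <= hi;
}
```

Using MIN and MAX means the range test gives the same answer whichever
way round the two bounds are written.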
464 I understand that I can replace "x in {" with the following steps:
465 first store x in the "in" test buffer, then add "(" to the display
466 filter expression internally.
468 Similarly I can replace the closing brace "}" with the following
469 steps: release x from the "in" test buffer and then add ")"
470 to the display filter expression internally.
476 This could be done in grammar.lemon. The grammar would produce
477 syntax tree nodes, combining them with "or", when it is given
478 tokens that represent the "in" syntax.
480 It could also be done later in the process, maybe in
481 semcheck.c. But if you can do it earlier, in grammar.lemon,
482 then you shouldn't have to worry about modifying anything in
483 semcheck.c, as the syntax tree that is passed to semcheck.c
won't contain any new type of operators... just lots of nodes
combined with "or".
487 How to add an operator FOO to the display filter language?
488 ==========================================================
490 Go to wireshark/epan/dfilter/
492 Edit grammar.lemon and add the operator. Add the operator FOO and the
493 test logic (defining TEST_OP_FOO).
495 Edit scanner.l and add the operator name(s) hence defining
496 TOKEN_TEST_FOO. Also update the simple() or add the new operand's code.
498 Edit sttype-test.h and add the TEST_OP_FOO to the list of test operations.
500 Edit sttype-test.c and add TEST_OP_FOO to the num_operands() method.
Edit gencode.c, add TEST_OP_FOO in the gen_test() method by defining
ANY_FOO.
505 Edit dfvm.h and add ANY_FOO to the enum dfvm_opcode_t structure.
Edit dfvm.c and add ANY_FOO to dfvm_dump() (for the dftest display filter
test binary) and to dfvm_apply(), which will call the fvalue_foo() method.
510 Edit semcheck.c and look at the check_relation_XXX() methods if they
511 still apply to the foo operator; if not, amend the code. Start from the
512 check_test() method to discover the logic.
514 Go to wireshark/epan/ftypes/
Edit ftypes.h and declare the fvalue_foo() and ftype_can_foo()
methods. Add the cmp_foo() method to the struct _ftype_t.
This is the first time that a make in wireshark/epan/dfilter/ can
succeed. If it fails, then some code in the previously edited files must
be corrected.
Edit ftypes.c and define the fvalue_foo() method with its associated
logic. Also define the ftype_can_foo() method.
526 Edit all ftype-*.c files and add the required fvalue_foo() methods.
528 This is the point where you should be able to compile without errors in
529 wireshark/epan/ftypes/. If not, first fix the errors.
531 Go to wireshark/epan/ and run make. If this one succeeds, then we're
532 almost done as no errors should occur here.
534 Go to wireshark/ and run make. One thing to do is make dftest and see
535 if you can construct valid display filters with your new operator. Or
536 you may want to move directly to the generation of Wireshark.
538 Also look at ui/qt/display_filter_expression_dialog.cpp and the display
539 filter expression generator.
541 How to add a new test to the test suite
542 =======================================
544 All display filter tests are located in test/suite_dfilter.
545 You can add a test to an existing file or create a new file.
547 Each new test class must define "trace_file", which names
548 a capture file in "test/captures". All the tests
549 run in that class will use that one capture file.
551 There are 2 fixtures you can use for testing:
553 checkDFilterCount(dfilter, expected_count)
555 This will run the display filter through tshark, on the
556 file named by "trace_file", and assert that the
557 number of resulting packets equals "expected_count". This
558 also asserts that tshark does not fail; success with zero
matches is not the same as failure to compile the display
filter.
562 checkDFilterFail(dfilter, error)
564 This will run dftest with the display filter, and check
565 that it fails with a given error message. This is useful
566 when expecting display filter syntax errors to be caught.
570 # Run all dfilter tests
571 $ test/test.py suite_dfilter
573 # Run all tests from group_tvb.py:
574 $ test/test.py suite_dfilter.group_tvb
# For faster, parallel tests, install "pytest-xdist" first
577 # (for example, using "pip install pytest-xdist"), then:
578 $ pytest -nauto test -k suite_dfilter
580 # Run all tests from group_tvb.py, in parallel:
581 $ pytest -nauto test -k case_tvb
583 # Run a single test from group_tvb.py, case_tvb.test_slice_4:
584 $ pytest test -k "case_tvb and test_slice_4"
586 See also https://www.wireshark.org/docs/wsdg_html_chunked/ChapterTests.html