doc/README.wmem

$Id$

1. Introduction

The 'emem' memory manager (described in README.malloc) has been a part of
Wireshark since 2005 and has served us well, but is starting to show its age.
The framework has become increasingly difficult to maintain, and limitations
in the API have blocked progress on other long-term goals such as
multi-threading and opening multiple files at once.

The 'wmem' memory manager is a new memory management framework that replaces
emem. It provides a significantly updated API, a more modular design, and it
isn't all jammed into one 2500-line file.

Wmem was originally conceived in this email to the wireshark-dev mailing list:
https://www.wireshark.org/lists/wireshark-dev/201210/msg00178.html

The wmem code can now be found in epan/wmem/ in the Wireshark source tree.

2. Usage for Consumers

If you're writing a dissector, or other "userspace" code, then using wmem
should be very similar to using emem. All you need to do is include the header
(epan/wmem/wmem.h) and get a handle to a memory pool (if you want to *create*
a memory pool, see the section "3. Usage for Producers" below).

A memory pool is an opaque pointer to an object of type wmem_allocator_t, and
it is the very first parameter passed to almost every call you make to wmem.
Other than that parameter (and the fact that functions are prefixed wmem_
instead of ep_ or se_) usage is exactly like that of emem. For example:

    wmem_alloc(myPool, 20);

allocates 20 bytes in the pool pointed to by myPool.

2.1 Available Pools

2.1.1 (Sort Of) Global Pools

Dissectors that include the wmem header file will have three pools available
to them automatically: wmem_packet_scope(), wmem_file_scope() and
wmem_epan_scope().

The packet pool is scoped to the dissection of each packet, replacing
emem's ep_ allocators. The file pool is scoped to the dissection of each file,
replacing emem's se_ allocators. For example:

    ep_malloc(32);
    se_malloc(sizeof(guint));

could be replaced with

    wmem_alloc(wmem_packet_scope(), 32);
    wmem_alloc(wmem_file_scope(),   sizeof(guint));

NB: Using these pools outside of the appropriate scope (e.g. using the packet
    pool when there isn't a packet being dissected) will throw an assertion.
    See the comment in epan/wmem/wmem_scopes.c for details.

The epan pool is scoped to the library's lifetime - memory allocated in it is
not freed until epan_cleanup() is called, which is typically at the end of the
program.

2.1.2 Pinfo Pool

Certain allocations (such as AT_STRINGZ address allocations and anything that
might end up being passed to add_new_data_source) need their memory to stick
around a little longer than the usual packet scope - basically until the
next packet is dissected. This is, in fact, the scope of Wireshark's pinfo
structure, so the pinfo struct has a 'pool' member which is a wmem pool scoped
to the lifetime of the pinfo struct.

2.2 API

Full documentation for each function (parameters, return values, behaviours)
lives (or will live) in Doxygen format in the header files for those functions.
This is just an overview of which header files you should be looking at.

2.2.1 Core API

wmem_core.h
 - Basic memory management functions like malloc, realloc and free.

2.2.2 Strings

wmem_strutl.h
 - Utility functions for manipulating null-terminated C-style strings.
   Functions like strdup and strdup_printf.

wmem_strbuf.h
 - A managed string object implementation, similar to std::string in C++ or
   GString from GLib.

2.2.3 Container Data Structures

wmem_array.h
 - A growable array (AKA vector) implementation.

wmem_list.h
 - A doubly-linked list implementation.

wmem_queue.h
 - A queue implementation (first-in, first-out).

wmem_stack.h
 - A stack implementation (last-in, first-out).

wmem_tree.h
 - A balanced binary tree (red-black tree) implementation.

2.2.4 Miscellaneous Utilities

wmem_miscutl.h
 - Misc. utility functions like memdup.

2.3 Callbacks

WARNING: You probably don't actually need these; use them only when you're
         sure you understand the dangers.

Sometimes (though hopefully rarely) it may be necessary to store data in a wmem
pool that requires additional cleanup before it is freed. For example, perhaps
you have a pointer to a file-handle that needs to be closed. In this case, you
can register a callback with the wmem_register_cleanup_callback function
declared in wmem_user_cb.h. Every time the memory in a pool is freed, all
registered cleanup functions are called first.

Note that callback calling order is not defined: you cannot rely on a
certain callback being called before or after another.

WARNING: Manually freeing or moving memory (with wmem_free or wmem_realloc)
         will NOT trigger any callbacks. It is an error to call either of
         those functions on memory if you have a callback registered to deal
         with the contents of that memory.
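To make the ordering above concrete, here is a small self-contained model of
a pool with cleanup callbacks. It is only a sketch: toy_pool,
toy_register_cleanup and the other toy_ names are invented for illustration
and are not wmem's API (the real declarations are in wmem_user_cb.h). The
pool stores a pointer to a stand-in "file handle", and the callback closes
that handle just before the pool's memory is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not part of wmem. */
#define TOY_MAX 8

typedef void (*toy_cleanup_cb)(void *user_data);

typedef struct {
    void          *blocks[TOY_MAX];     /* memory owned by the pool */
    size_t         n_blocks;
    toy_cleanup_cb callbacks[TOY_MAX];  /* registered cleanup functions */
    void          *cb_data[TOY_MAX];
    size_t         n_callbacks;
} toy_pool;

void *toy_alloc(toy_pool *pool, size_t size)
{
    void *p = malloc(size);
    assert(p != NULL && pool->n_blocks < TOY_MAX);
    pool->blocks[pool->n_blocks++] = p;
    return p;
}

void toy_register_cleanup(toy_pool *pool, toy_cleanup_cb cb, void *user_data)
{
    assert(pool->n_callbacks < TOY_MAX);
    pool->callbacks[pool->n_callbacks] = cb;
    pool->cb_data[pool->n_callbacks] = user_data;
    pool->n_callbacks++;
}

void toy_free_all(toy_pool *pool)
{
    size_t i;

    /* Callbacks run first, while the pool's memory is still valid. */
    for (i = 0; i < pool->n_callbacks; i++)
        pool->callbacks[i](pool->cb_data[i]);

    for (i = 0; i < pool->n_blocks; i++)
        free(pool->blocks[i]);
    pool->n_blocks = 0;
}

/* Stand-in for a resource that needs closing, e.g. a file handle. */
typedef struct {
    int is_open;
} toy_handle;

void toy_close_handle(void *user_data)
{
    toy_handle **slot = (toy_handle **)user_data;   /* lives in the pool */
    (*slot)->is_open = 0;
}

/* Returns the handle's is_open flag after the pool has been freed. */
int toy_callback_demo(void)
{
    toy_pool     pool;
    toy_handle   handle;
    toy_handle **slot;

    pool.n_blocks = 0;
    pool.n_callbacks = 0;
    handle.is_open = 1;

    /* The pool stores a pointer to the handle, as in the text above. */
    slot = (toy_handle **)toy_alloc(&pool, sizeof *slot);
    *slot = &handle;
    toy_register_cleanup(&pool, toy_close_handle, slot);

    toy_free_all(&pool);    /* closes the handle, then frees the slot */
    return handle.is_open;
}
```

In this model toy_callback_demo() returns 0, since the callback has closed
the handle by the time toy_free_all() returns. Had the slot been released by
hand instead, nothing would have called toy_close_handle(), which is the
hazard the warning above describes.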

3. Usage for Producers

NB: If you're just writing a dissector, you probably don't need to read
    this section.

One of the problems with the old emem framework was that there were basically
two allocator backends (GLib and mmap) that were all mixed together in a mess
of if statements, environment variables and #ifdefs. In wmem the different
allocator backends are cleanly separated out, and it's up to the owner of the
pool to pick one.

3.1 Available Allocator Back-Ends

Each available allocator type has a corresponding entry in the
wmem_allocator_type_t enumeration defined in wmem_core.h. See the Doxygen
comments in that header file for details on each type.

3.2 Creating a Pool

To create a pool, include the regular wmem header and call the
wmem_allocator_new() function with the appropriate type value.
For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;
    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

From here on, you don't need to remember which type of allocator you used
(although allocator authors are welcome to expose additional allocator-specific
helper functions in their headers). The "myPool" variable can be passed around
and used as normal in allocation requests as described in section 2 of this
document.

3.3 Destroying a Pool

Regardless of which allocator you used to create a pool, it can be destroyed
with a call to the function wmem_destroy_allocator(). For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

    /* Allocate some memory in myPool ... */

    wmem_destroy_allocator(myPool);

Destroying a pool will free all the memory allocated in it.

3.4 Reusing a Pool

It is possible to free all the memory in a pool without destroying it,
allowing it to be reused later. Depending on the type of allocator, doing this
(by calling wmem_free_all()) can be significantly cheaper than fully destroying
and recreating the pool. This method is therefore recommended, especially when
the pool would otherwise be scoped to a single iteration of a loop. For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
    for (...) {

        /* Allocate some memory in myPool ... */

        /* Free the memory, faster than destroying and recreating
           the pool each time through the loop. */
        wmem_free_all(myPool);
    }
    wmem_destroy_allocator(myPool);

4. Internal Design

Despite being written in Wireshark's standard C90, wmem follows a fairly
object-oriented design pattern. Although efficiency is always a concern, the
primary goals in writing wmem were maintainability and preventing memory
leaks.

4.1 struct _wmem_allocator_t

The heart of wmem is the _wmem_allocator_t structure defined in the
wmem_allocator.h header file. This structure uses C function pointers to
implement a common object-oriented design pattern known as an interface (also
known as an abstract class to those who are more familiar with C++).

Different allocator implementations can provide exactly the same interface by
assigning their own functions to the members of an instance of the structure.
The structure has eight members in three groups.

4.1.1 Implementation Details

 - private_data
 - type

The private_data pointer is a void pointer that the allocator implementation can
use to store whatever internal structures it needs. It is passed to almost all
of the other functions that the allocator implementation must define.

The type field is an enumeration of type wmem_allocator_type_t (see
section 3.1). Its value is set by the wmem_allocator_new() function, not
by the implementation-specific constructor. This field should be considered
read-only by the allocator implementation.

4.1.2 Consumer Functions

 - alloc()
 - free()
 - realloc()

These function pointers should be set to functions with semantics obviously
similar to their standard-library namesakes. Each one takes an extra parameter
that is a copy of the allocator's private_data pointer.

Note that realloc() and free() are not expected to be called directly by user
code in most cases - they are primarily optimisations for use by data
structures that wmem might want to implement (it's hard, for example, to
implement a dynamically sized array without some form of realloc).

Also note that allocators do not have to handle NULL pointers or 0-length
requests in any way - those checks are done in an allocator-agnostic way
higher up in wmem. Allocator authors can assume that all incoming pointers
(to realloc and free) are non-NULL, and that all incoming lengths (to alloc
and realloc) are non-0.
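The division of labour described above might look roughly like the
following. This is a sketch under assumptions, not wmem's source: the toy_
names are invented, the back-end is plain malloc with a counter, and the way
the generic layer maps the NULL and 0-length cases (returning NULL, or
turning a realloc into an alloc or a free) is this sketch's own choice. Check
the code behind wmem_core.h for what wmem actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not part of wmem. */
typedef struct {
    void  *private_data;
    void *(*alloc)(void *private_data, size_t size);
    void  (*free)(void *private_data, void *ptr);
    void *(*realloc)(void *private_data, void *ptr, size_t size);
} toy_allocator;

/* --- one possible back-end: plain malloc plus a live-block counter --- */

typedef struct {
    int live_blocks;
} toy_simple_state;

void *toy_simple_alloc(void *private_data, size_t size)
{
    toy_simple_state *st = (toy_simple_state *)private_data;
    void *p;
    assert(size != 0);              /* the generic layer filtered this out */
    p = malloc(size);
    assert(p != NULL);
    st->live_blocks++;
    return p;
}

void toy_simple_free(void *private_data, void *ptr)
{
    toy_simple_state *st = (toy_simple_state *)private_data;
    assert(ptr != NULL);            /* likewise never NULL down here */
    st->live_blocks--;
    free(ptr);
}

void *toy_simple_realloc(void *private_data, void *ptr, size_t size)
{
    void *p;
    (void)private_data;
    assert(ptr != NULL && size != 0);
    p = realloc(ptr, size);
    assert(p != NULL);
    return p;
}

/* --- generic layer: the only place that looks at NULL and 0 --- */

void *toy_alloc(toy_allocator *a, size_t size)
{
    if (size == 0)
        return NULL;
    return a->alloc(a->private_data, size);
}

void toy_free(toy_allocator *a, void *ptr)
{
    if (ptr != NULL)
        a->free(a->private_data, ptr);
}

void *toy_realloc(toy_allocator *a, void *ptr, size_t size)
{
    if (ptr == NULL)
        return toy_alloc(a, size);
    if (size == 0) {
        toy_free(a, ptr);
        return NULL;
    }
    return a->realloc(a->private_data, ptr, size);
}

/* Returns the number of blocks still live after exercising the edge cases. */
int toy_edge_case_demo(void)
{
    toy_simple_state st;
    toy_allocator    a;
    void            *p;

    st.live_blocks = 0;
    a.private_data = &st;
    a.alloc   = toy_simple_alloc;
    a.free    = toy_simple_free;
    a.realloc = toy_simple_realloc;

    assert(toy_alloc(&a, 0) == NULL);   /* never reaches the back-end */
    toy_free(&a, NULL);                 /* likewise a no-op */
    p = toy_realloc(&a, NULL, 16);      /* forwarded as an alloc */
    p = toy_realloc(&a, p, 32);         /* a genuine realloc */
    p = toy_realloc(&a, p, 0);          /* forwarded as a free */
    assert(p == NULL);
    return st.live_blocks;
}
```

Keeping the checks in one place means each new back-end only has to get the
ordinary cases right, and every back-end treats the odd cases identically.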

4.1.3 Producer/Manager Functions

 - free_all()
 - gc()
 - cleanup()

All of these functions take only one parameter, which is the allocator's
private_data pointer.

The free_all() function should free all the memory currently allocated in the
pool. Note that this is not necessarily exactly the same as calling free()
on all the allocated blocks - free_all() is allowed to do additional cleanup
or to make use of optimizations not available when freeing one block at a time.

The gc() function should do whatever it can to reduce excess memory usage in
the dissector by returning unused blocks to the OS, optimizing internal data
structures, etc.

The cleanup() function should do any final cleanup and free any and all memory.
It is basically the equivalent of a destructor function. For simplicity, wmem
is guaranteed to call free_all() immediately before this function. There is no
such guarantee that gc() has (ever) been called.
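As an illustration of how the three functions can differ, here is a toy
back-end that caches freed blocks. Everything in it is an assumption made for
this sketch (the toy_ names, the fixed-size blocks, the caching policy); it is
not how any real wmem allocator is written. In this model free_all() only
moves blocks onto a spare list, gc() is what actually returns them to the
system, and cleanup() relies on free_all() having just been called.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not part of wmem. */
typedef struct toy_block {
    struct toy_block *next;
    char              data[64];  /* fixed-size payload keeps this short */
} toy_block;

typedef struct {
    toy_block *in_use;           /* blocks handed out to callers */
    toy_block *spare;            /* freed blocks cached for reuse */
    int        n_spare;
} toy_state;

typedef struct {
    void *private_data;
    void (*free_all)(void *private_data);
    void (*gc)(void *private_data);
    void (*cleanup)(void *private_data);
} toy_allocator;

void *toy_take_block(toy_state *st)
{
    toy_block *b = st->spare;
    if (b != NULL) {             /* reuse a cached block */
        st->spare = b->next;
        st->n_spare--;
    } else {
        b = (toy_block *)malloc(sizeof *b);
        assert(b != NULL);
    }
    b->next = st->in_use;
    st->in_use = b;
    return b->data;
}

void toy_free_all(void *private_data)
{
    toy_state *st = (toy_state *)private_data;
    /* Cheaper than block-by-block frees: just move blocks to the cache. */
    while (st->in_use != NULL) {
        toy_block *b = st->in_use;
        st->in_use = b->next;
        b->next = st->spare;
        st->spare = b;
        st->n_spare++;
    }
}

void toy_gc(void *private_data)
{
    toy_state *st = (toy_state *)private_data;
    while (st->spare != NULL) {  /* give cached blocks back to the system */
        toy_block *b = st->spare;
        st->spare = b->next;
        free(b);
    }
    st->n_spare = 0;
}

void toy_cleanup(void *private_data)
{
    toy_state *st = (toy_state *)private_data;
    assert(st->in_use == NULL);  /* free_all() has just run */
    toy_gc(st);
}

/* The generic destroy path: free_all() first, then cleanup(). */
void toy_destroy(toy_allocator *a)
{
    a->free_all(a->private_data);
    a->cleanup(a->private_data);
}

/* Returns (cached blocks after free_all) * 10 + (cached blocks after gc). */
int toy_lifecycle_demo(void)
{
    toy_state     st = { NULL, NULL, 0 };
    toy_allocator a  = { NULL, toy_free_all, toy_gc, toy_cleanup };
    int           after_free_all, after_gc;

    a.private_data = &st;
    toy_take_block(&st);
    toy_take_block(&st);
    a.free_all(a.private_data);
    after_free_all = st.n_spare;
    a.gc(a.private_data);
    after_gc = st.n_spare;

    toy_take_block(&st);         /* still live when the pool is destroyed */
    toy_destroy(&a);
    return after_free_all * 10 + after_gc;
}
```

Because toy_cleanup() never has to walk the in-use list itself, it stays
short, which is the simplification the free_all() guarantee is meant to buy.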

4.2 Pool-Agnostic API

One of the issues with emem was that the API (including the public data
structures) required wrapper functions for each scope implemented. Even
if there was a stack implementation in emem, it wasn't necessarily available
for use with file-scope memory unless someone took the time to write se_stack_
wrapper functions for the interface.

In wmem, all public APIs take the pool as the first argument, so that they can
be written once and used with any available memory pool. Data structures like
wmem's stack implementation only take the pool when created - the provided
pointer is stored internally with the data structure, and subsequent calls
(like push and pop) will take the stack itself instead of the pool.
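The pattern can be sketched in a few lines. The code below is a stand-alone
model, not wmem_stack.h: toy_pool is a made-up bump allocator and the
toy_stack_ functions are hypothetical. What it shows is the shape of the API:
only the constructor takes a pool, and the stack remembers it for every later
allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not part of wmem. */
typedef struct {
    char  *buf;
    size_t size, used;
} toy_pool;

void toy_pool_init(toy_pool *pool, size_t size)
{
    pool->buf = (char *)malloc(size);
    assert(pool->buf != NULL);
    pool->size = size;
    pool->used = 0;
}

void toy_pool_destroy(toy_pool *pool)
{
    free(pool->buf);    /* releases everything allocated from the pool */
}

/* Bump allocation out of the pool's buffer. */
void *toy_alloc(toy_pool *pool, size_t size)
{
    void *p;
    /* Round up so the next allocation stays pointer-aligned. */
    size = (size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    assert(size <= pool->size - pool->used);
    p = pool->buf + pool->used;
    pool->used += size;
    return p;
}

typedef struct toy_frame {
    struct toy_frame *below;
    void             *data;
} toy_frame;

typedef struct {
    toy_pool  *pool;            /* remembered from creation time */
    toy_frame *top;
} toy_stack;

/* Only the constructor takes the pool... */
toy_stack *toy_stack_new(toy_pool *pool)
{
    toy_stack *stack = (toy_stack *)toy_alloc(pool, sizeof *stack);
    stack->pool = pool;
    stack->top = NULL;
    return stack;
}

/* ...later calls take the stack and allocate from its stored pool. */
void toy_stack_push(toy_stack *stack, void *data)
{
    toy_frame *frame = (toy_frame *)toy_alloc(stack->pool, sizeof *frame);
    frame->below = stack->top;
    frame->data = data;
    stack->top = frame;
}

void *toy_stack_pop(toy_stack *stack)
{
    toy_frame *frame = stack->top;
    assert(frame != NULL);
    stack->top = frame->below;
    return frame->data;     /* the frame itself is reclaimed with the pool */
}

/* The same stack code serves two unrelated pools. */
int toy_stack_demo(void)
{
    toy_pool   packet_pool, file_pool;
    toy_stack *per_packet, *per_file;
    int        a = 1, b = 2;
    int        from_packet, from_file;

    toy_pool_init(&packet_pool, 1024);
    toy_pool_init(&file_pool, 1024);
    per_packet = toy_stack_new(&packet_pool);
    per_file   = toy_stack_new(&file_pool);

    toy_stack_push(per_packet, &a);
    toy_stack_push(per_packet, &b);
    toy_stack_push(per_file, &a);

    from_packet = *(int *)toy_stack_pop(per_packet);    /* last pushed: b */
    from_file   = *(int *)toy_stack_pop(per_file);      /* a */

    toy_pool_destroy(&packet_pool);
    toy_pool_destroy(&file_pool);
    return from_packet * 10 + from_file;
}
```

No per-scope wrapper is needed here: the two stacks differ only in the pool
pointer they were created with.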

4.3 Debugging

The primary debugging control for wmem is the WIRESHARK_DEBUG_WMEM_OVERRIDE
environment variable. If set, this value forces all calls to
wmem_allocator_new() to return the same type of allocator, regardless of which
type is requested normally by the code. It currently has three valid values:

 - The value "simple" forces the use of WMEM_ALLOCATOR_SIMPLE. The valgrind
   script currently sets this value, since the simple allocator is the only
   one whose memory allocations are trackable properly by valgrind.

 - The value "strict" forces the use of WMEM_ALLOCATOR_STRICT. The fuzz-test
   script currently sets this value, since the goal when fuzz-testing is to find
   as many errors as possible.

 - The value "block" forces the use of WMEM_ALLOCATOR_BLOCK. This is not
   currently used by any scripts, but is useful for stress-testing the block
   allocator.

Note that regardless of the value of this variable, it will always be safe to
call allocator-specific helper functions. They are required to be safe no-ops
if the allocator argument is of the wrong type.
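A minimal model of the override logic and of a type-checked helper is shown
below. It is a sketch, not the code in wmem: the toy_ names and the
TOY_ALLOCATOR_ constants are stand-ins, and falling back to the requested type
when the variable holds an unrecognised value is an assumption of this
example rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: all names here are hypothetical, not part of wmem. */
typedef enum {
    TOY_ALLOCATOR_SIMPLE,
    TOY_ALLOCATOR_BLOCK,
    TOY_ALLOCATOR_STRICT
} toy_allocator_type;

/* Decide which type to build, given the requested type and the value of
 * the override variable (NULL when it is unset). */
toy_allocator_type toy_apply_override(toy_allocator_type requested,
                                      const char *value)
{
    if (value == NULL)
        return requested;
    if (strcmp(value, "simple") == 0)
        return TOY_ALLOCATOR_SIMPLE;
    if (strcmp(value, "block") == 0)
        return TOY_ALLOCATOR_BLOCK;
    if (strcmp(value, "strict") == 0)
        return TOY_ALLOCATOR_STRICT;
    return requested;   /* assumption: unknown values are ignored here */
}

typedef struct {
    toy_allocator_type type;
    int                checks_run;
} toy_allocator;

void toy_allocator_init(toy_allocator *a, toy_allocator_type requested)
{
    assert(a != NULL);
    a->type = toy_apply_override(requested,
                                 getenv("WIRESHARK_DEBUG_WMEM_OVERRIDE"));
    a->checks_run = 0;
}

/* A strict-only helper: a safe no-op on any other allocator type. */
void toy_strict_check(toy_allocator *a)
{
    if (a->type != TOY_ALLOCATOR_STRICT)
        return;
    a->checks_run++;
}
```

Callers can therefore invoke toy_strict_check() unconditionally. Whether it
does anything depends on the type the pool ended up with, which the override
may have changed behind the caller's back.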

4.4 Testing

There is a simple test suite for wmem that lives in the file wmem_test.c and
should get automatically built into the binary 'wmem_test' when building
Wireshark. It contains at least basic tests for all existing functionality.
The suite is run automatically by the build-bots via the shell script
test/test.sh which calls out to test/suite-unittests.sh.

New features added to wmem (allocators, data structures, utility
functions, etc.) must also have tests added to this suite.

The test suite could potentially use a clean-up by someone more
intimately familiar with GLib's testing framework, but it does the job.

5. TODO List

The following is a list of things that emem didn't provide but that it might
be nice if wmem did:

 - radix tree
 - hash table

/*
 * Editor modelines  -  http://www.wireshark.org/tools/modelines.html
 *
 * Local variables:
 * c-basic-offset: 4
 * tab-width: 8
 * indent-tabs-mode: nil
 * End:
 *
 * vi: set shiftwidth=4 tabstop=8 expandtab:
 * :indentSize=4:tabSize=8:noTabs=true:
 */