mingw/info/gdbint/Symbol-Handling.html

<html lang="en">
<head>
<title>GDB Internals</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="GDB Internals">
<meta name="generator" content="makeinfo 4.3">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home">
</head>
<body>
<div class="node">
<p>
Node:<a name="Symbol%20Handling">Symbol Handling</a>,
Next:<a rel="next" accesskey="n" href="Language-Support.html#Language%20Support">Language Support</a>,
Previous:<a rel="previous" accesskey="p" href="libgdb.html#libgdb">libgdb</a>,
Up:<a rel="up" accesskey="u" href="index.html#Top">Top</a>
<hr><br>
</div>

<h2 class="chapter">Symbol Handling</h2>

   <p>Symbols are a key part of GDB's operation.  Symbols include variables,
functions, and types.

<h3 class="section">Symbol Reading</h3>

   <p>GDB reads symbols from <dfn>symbol files</dfn>.  The usual symbol
file is the file containing the program which GDB is
debugging.  GDB can be directed to use a different file for
symbols (with the <code>symbol-file</code> command), and it can also read
more symbols via the <code>add-symbol-file</code> and <code>load</code> commands, or while
reading symbols from shared libraries.

   <p>Symbol files are initially opened by code in <code>symfile.c</code> using
the BFD library (see <a href="Support-Libraries.html#Support%20Libraries">Support Libraries</a>).  BFD identifies the type
of the file by examining its header.  <code>find_sym_fns</code> then uses
this identification to locate a set of symbol-reading functions.
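
   <p>As a rough illustration of identifying a file by its header, the
sketch below classifies a buffer by its leading magic bytes.  It is not
BFD code: the enum and the function <code>sketch_identify</code> are
hypothetical names, and BFD's real target matching is considerably more
involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format tags for this sketch only; BFD's real target
   vectors carry far more than a single enum value.  */
enum sketch_format
{
  SKETCH_UNKNOWN,
  SKETCH_ELF,
  SKETCH_PE_DOS_STUB
};

/* Classify a file from the first LEN bytes of its header.  */
enum sketch_format
sketch_identify (const unsigned char *header, size_t len)
{
  /* ELF files begin with the four bytes 0x7f 'E' 'L' 'F'.  */
  if (len >= 4 && memcmp (header, "\177ELF", 4) == 0)
    return SKETCH_ELF;

  /* PE executables begin with an MS-DOS stub whose first bytes are "MZ".  */
  if (len >= 2 && memcmp (header, "MZ", 2) == 0)
    return SKETCH_PE_DOS_STUB;

  return SKETCH_UNKNOWN;
}
```

   <p>A caller would read the first few bytes of the file and pass them
in; a buffer too short to hold a complete magic number is reported as
unknown.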

   <p>Symbol-reading modules identify themselves to GDB by calling
<code>add_symtab_fns</code> during their module initialization.  The argument
to <code>add_symtab_fns</code> is a <code>struct sym_fns</code> which contains the
name (or name prefix) of the symbol format, the length of the prefix,
and pointers to four functions.  These functions are called at various
times to process symbol files whose identification matches the specified
prefix.
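
   <p>The registration and prefix lookup described above can be sketched
as follows.  This is a simplified model, not GDB's source: the
<code>sketch_</code> names and the reduced structure layout are
assumptions made for illustration, and only the fields mentioned in the
text are modelled.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct sym_fns (hypothetical layout).  */
struct sketch_sym_fns
{
  const char *name;          /* name, or name prefix, of the format */
  size_t name_len;           /* number of prefix characters to compare */
  void (*new_init) (void);
  void (*symfile_init) (struct sketch_sym_fns *sf);
  void (*symfile_read) (struct sketch_sym_fns *sf, unsigned long addr,
                        int mainline);
  struct sketch_sym_fns *next;   /* registry chain */
};

/* Head of the list of registered symbol readers.  */
static struct sketch_sym_fns *sketch_registry = NULL;

/* Counterpart of add_symtab_fns: push a reader onto the registry.  */
void
sketch_add_symtab_fns (struct sketch_sym_fns *sf)
{
  sf->next = sketch_registry;
  sketch_registry = sf;
}

/* Counterpart of find_sym_fns: return the first reader whose prefix
   matches the identification string, or NULL if none does.  */
struct sketch_sym_fns *
sketch_find_sym_fns (const char *target_name)
{
  struct sketch_sym_fns *sf;

  for (sf = sketch_registry; sf != NULL; sf = sf->next)
    if (strncmp (target_name, sf->name, sf->name_len) == 0)
      return sf;
  return NULL;
}
```

   <p>Because only the first <code>name_len</code> characters are compared,
one registered reader can serve every identification string that shares
its prefix.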

   <p>The functions supplied by each module are:

     <dl>
<dt><code></code><var>xyz</var><code>_symfile_init(struct sym_fns *sf)</code>
     <dd>
Called from <code>symbol_file_add</code> when we are about to read a new
symbol file.  This function should clean up any internal state (possibly
resulting from half-read previous files, for example) and prepare to
read a new symbol file.  Note that the symbol file which we are reading
might be a new "main" symbol file, or might be a secondary symbol file
whose symbols are being added to the existing symbol table.

     <p>The argument to <code></code><var>xyz</var><code>_symfile_init</code> is a newly allocated
<code>struct sym_fns</code> whose <code>bfd</code> field contains the BFD for the
new symbol file being read.  Its <code>private</code> field has been zeroed,
and can be modified as desired.  Typically, a struct of private
information will be <code>malloc</code>'d, and a pointer to it will be placed
in the <code>private</code> field.

     <p>There is no result from <code></code><var>xyz</var><code>_symfile_init</code>, but it can call
<code>error</code> if it detects an unavoidable problem.

     <br><dt><code></code><var>xyz</var><code>_new_init()</code>
     <dd>
Called from <code>symbol_file_add</code> when discarding existing symbols.
This function need only handle the symbol-reading module's internal
state; the symbol table data structures visible to the rest of
GDB will be discarded by <code>symbol_file_add</code>.  It has no
arguments and no result.  It may be called after
<code></code><var>xyz</var><code>_symfile_init</code>, if a new symbol table is being read, or
may be called alone if all symbols are simply being discarded.

     <br><dt><code></code><var>xyz</var><code>_symfile_read(struct sym_fns *sf, CORE_ADDR addr, int mainline)</code>
     <dd>
Called from <code>symbol_file_add</code> to actually read the symbols from a
symbol file into a set of psymtabs or symtabs.

     <p><code>sf</code> points to the <code>struct sym_fns</code> originally passed to
<code></code><var>xyz</var><code>_symfile_init</code> for possible initialization.  <code>addr</code> is
the offset between the file's specified start address and its true
address in memory.  <code>mainline</code> is 1 if this is the main symbol
table being read, and 0 if a secondary symbol file (e.g. shared library
or dynamically loaded file) is being read.
</dl>
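
   <p>To make the roles of <code>addr</code> and <code>mainline</code> concrete,
here is a minimal sketch of a read routine that relocates each symbol by
the load offset.  Everything named <code>sketch_</code> is hypothetical.
In particular, the sketch assumes that a main symbol file replaces the
table while a secondary file appends to it; in GDB itself the discarding
is done by <code>symbol_file_add</code>, not by the reader.

```c
#include <stddef.h>

typedef unsigned long sketch_core_addr;   /* stand-in for CORE_ADDR */

struct sketch_symbol
{
  const char *name;
  sketch_core_addr address;
};

#define SKETCH_MAX_SYMBOLS 16

/* A tiny fixed-size symbol table standing in for GDB's symtabs.  */
static struct sketch_symbol sketch_table[SKETCH_MAX_SYMBOLS];
static size_t sketch_count = 0;

/* Add N symbols from FILE_SYMS, relocating each address by ADDR.
   Returns the number of symbols actually added.  */
size_t
sketch_symfile_read (const struct sketch_symbol *file_syms, size_t n,
                     sketch_core_addr addr, int mainline)
{
  size_t i, added = 0;

  /* Assumption of this sketch: a main symbol file starts afresh.  */
  if (mainline)
    sketch_count = 0;

  for (i = 0; i < n && sketch_count < SKETCH_MAX_SYMBOLS; i++)
    {
      sketch_table[sketch_count].name = file_syms[i].name;
      /* True address = address recorded in the file + load offset.  */
      sketch_table[sketch_count].address = file_syms[i].address + addr;
      sketch_count++;
      added++;
    }
  return added;
}
```

   <p>An executable loaded at its link-time address would be read with an
offset of zero, while a shared library mapped elsewhere would be read
with the difference between its mapped and recorded addresses.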

   <p>In addition, if a symbol-reading module creates psymtabs when
<code></code><var>xyz</var><code>_symfile_read</code> is called, these psymtabs will contain a pointer
to a function <code></code><var>xyz</var><code>_psymtab_to_symtab</code>, which can be called
from any point in the GDB symbol-handling code.

     <dl>
<dt><code></code><var>xyz</var><code>_psymtab_to_symtab (struct partial_symtab *pst)</code>
     <dd>
Called from <code>psymtab_to_symtab</code> (or the <code>PSYMTAB_TO_SYMTAB</code> macro) if
the psymtab has not already been read in and had its <code>pst-&gt;symtab</code>
pointer set.  The argument is the psymtab to be fleshed out into a
symtab.  Upon return, <code>pst-&gt;readin</code> should have been set to 1, and
<code>pst-&gt;symtab</code> should contain a pointer to the new corresponding symtab, or
zero if there were no symbols in that part of the symbol file.
</dl>
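
   <p>The contract above (read at most once, set <code>readin</code>, leave
<code>symtab</code> null when there is nothing to read) can be sketched like
this.  The structures are reduced, hypothetical versions of GDB's; the
<code>storage</code>, <code>nsyms_in_file</code> and <code>times_read</code>
members exist only to keep the example self-contained.

```c
#include <stddef.h>

struct sketch_symtab
{
  int nsyms;
};

struct sketch_psymtab
{
  int readin;                     /* nonzero once the symtab was read */
  struct sketch_symtab *symtab;   /* NULL until read, or if no symbols */
  void (*read_symtab) (struct sketch_psymtab *pst);
  int nsyms_in_file;              /* hypothetical: symbols in this part */
  struct sketch_symtab storage;   /* hypothetical backing store */
  int times_read;                 /* hypothetical: counts real reads */
};

/* Reader-specific expansion, in the role of xyz_psymtab_to_symtab.  */
void
sketch_xyz_psymtab_to_symtab (struct sketch_psymtab *pst)
{
  pst->times_read++;
  if (pst->nsyms_in_file > 0)
    {
      pst->storage.nsyms = pst->nsyms_in_file;
      pst->symtab = &pst->storage;
    }
  else
    pst->symtab = NULL;           /* no symbols in this part of the file */
  pst->readin = 1;
}

/* Generic entry point: expand only if not already read in.  */
struct sketch_symtab *
sketch_psymtab_to_symtab (struct sketch_psymtab *pst)
{
  if (!pst->readin)
    pst->read_symtab (pst);
  return pst->symtab;
}
```

   <p>Calling the generic entry point twice triggers only one real read,
which is the property that keeps repeated lookups cheap.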

<h3 class="section">Partial Symbol Tables</h3>

   <p>GDB has three types of symbol tables:

     <ul>
<li>Full symbol tables (<dfn>symtabs</dfn>).  These contain the main
information about symbols and addresses.

     <li>Partial symbol tables (<dfn>psymtabs</dfn>).  These contain enough
information to know when to read the corresponding part of the full
symbol table.

     <li>Minimal symbol tables (<dfn>msymtabs</dfn>).  These contain information
gleaned from non-debugging symbols.
</ul>

   <p>This section describes partial symbol tables.

   <p>A psymtab is constructed by doing a very quick pass over an executable
file's debugging information.  Only a small amount of information is
extracted: just enough to identify which parts of the symbol table will
need to be re-read and fully digested later, when the user needs the
information.  The speed of this pass lets GDB start up very
quickly.  The detailed rereading happens later, in small
pieces and at various times, so the resulting delay is mostly invisible to
the user.

   <p>The symbols that show up in a file's psymtab should be, roughly, those
visible to the debugger's user when the program is not running code from
that file.  These include external symbols and types, static symbols and
types, and <code>enum</code> values declared at file scope.

   <p>The psymtab also contains the range of instruction addresses that the
full symbol table would represent.

   <p>The idea is that there are only two ways for the user (or much of the
code in the debugger) to reference a symbol:

     <ul>
<li>By its address (e.g. execution stops at some address which is inside a
function in this file).  GDB notices that the address falls within the
range of this psymtab and reads in the full symtab.
<code>find_pc_function</code>, <code>find_pc_line</code>, and other
<code>find_pc_...</code> functions handle this.

     <li>By its name
(e.g. the user asks to print a variable, or set a breakpoint on a
function).  Global names and file-scope names will be found in the
psymtab, which will cause the symtab to be pulled in.  Local names will
have to be qualified by a global name, or a file-scope name, in which
case we will have already read in the symtab as we evaluated the
qualifier.  Or, a local symbol can be referenced when we are "in" a
local scope, in which case the first case applies.  <code>lookup_symbol</code>
does most of the work here.
</ul>
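
   <p>The two reference paths listed above can be sketched with a reduced
psymtab that holds only an address range and a list of names.  The
structure and both functions are hypothetical illustrations; the real
<code>find_pc_...</code> functions and <code>lookup_symbol</code> do a good
deal more.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long sketch_core_addr;   /* stand-in for CORE_ADDR */

/* Hypothetical, much-reduced psymtab: an address range plus the global
   and file-scope names defined by one source file.  */
struct sketch_partial
{
  const char *filename;
  sketch_core_addr textlow;     /* first address covered */
  sketch_core_addr texthigh;    /* one past the last address covered */
  const char *const *names;
  size_t n_names;
};

/* Reference by address: return the entry whose range contains PC.  */
const struct sketch_partial *
sketch_find_pc_partial (const struct sketch_partial *tabs, size_t n,
                        sketch_core_addr pc)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (pc >= tabs[i].textlow && pc < tabs[i].texthigh)
      return &tabs[i];
  return NULL;
}

/* Reference by name: return the entry that lists NAME.  */
const struct sketch_partial *
sketch_lookup_partial (const struct sketch_partial *tabs, size_t n,
                       const char *name)
{
  size_t i, j;

  for (i = 0; i < n; i++)
    for (j = 0; j < tabs[i].n_names; j++)
      if (strcmp (tabs[i].names[j], name) == 0)
        return &tabs[i];
  return NULL;
}
```

   <p>In either case the entry returned tells the caller which part of the
symbol file to expand into a full symtab.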

   <p>The only reason that psymtabs exist is to cause a symtab to be read in
at the right moment.  Any symbol that can be elided from a psymtab,
while still causing that to happen, should not appear in it.  Since
psymtabs don't have the idea of scope, you can't put local symbols in
them anyway.  Psymtabs don't have the idea of the type of a symbol,
either, so types need not appear, unless they will be referenced by
name.

   <p>It is a bug for GDB to behave one way when only a psymtab has
been read, and another way if the corresponding symtab has been read
in.  Such bugs are typically caused by a psymtab that does not contain
all the visible symbols, or which has the wrong instruction address
ranges.

   <p>The psymtab for a particular section of a symbol file (objfile) could be
thrown away after the symtab has been read in.  The symtab should always
be searched before the psymtab, so the psymtab will never be used (in a
bug-free environment).  Currently, psymtabs are allocated on an obstack,
and all the psymbols themselves are allocated in a pair of large arrays
on an obstack, so there is little to be gained by trying to free them
unless you want to do a lot more work.

<h3 class="section">Types</h3>

<h4 class="unnumberedsubsec">Fundamental Types (e.g., <code>FT_VOID</code>, <code>FT_BOOLEAN</code>).</h4>

   <p>These are the fundamental types that GDB uses internally.  Fundamental
types from the various debugging formats (stabs, ELF, etc.) are mapped
into one of these.  They are basically a union of all fundamental types
that GDB knows about for all the languages that GDB
knows about.

<h4 class="unnumberedsubsec">Type Codes (e.g., <code>TYPE_CODE_PTR</code>, <code>TYPE_CODE_ARRAY</code>).</h4>

   <p>Each time GDB builds an internal type, it marks it with one
of these type codes.  The code may denote a fundamental type, such as
<code>TYPE_CODE_INT</code>, or a derived type, such as <code>TYPE_CODE_PTR</code>
which is a pointer to another type.  Typically, several <code>FT_*</code>
types map to one <code>TYPE_CODE_*</code> type, and are distinguished by
other members of the type struct, such as whether the type is signed
or unsigned, and how many bits it uses.
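
   <p>The many-to-one relationship between <code>FT_*</code> values and
<code>TYPE_CODE_*</code> values might be modelled as below.  The enums and
the structure are cut-down stand-ins with a <code>SKETCH_</code> prefix, not
GDB's definitions, and the bit widths are simply those of the host
compiler.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-ins for a few FT_* and TYPE_CODE_* values.  */
enum sketch_ft
{
  SKETCH_FT_SHORT,
  SKETCH_FT_INTEGER,
  SKETCH_FT_UNSIGNED_INTEGER
};

enum sketch_type_code
{
  SKETCH_TYPE_CODE_INT,
  SKETCH_TYPE_CODE_PTR
};

struct sketch_type
{
  enum sketch_type_code code;
  unsigned bits;                          /* width in bits */
  int is_unsigned;
  const struct sketch_type *target_type;  /* pointee, for pointers */
};

/* Several fundamental types share SKETCH_TYPE_CODE_INT and differ only
   in width and signedness.  */
struct sketch_type
sketch_make_fundamental (enum sketch_ft ft)
{
  struct sketch_type t = { SKETCH_TYPE_CODE_INT, 0, 0, NULL };

  switch (ft)
    {
    case SKETCH_FT_SHORT:
      t.bits = (unsigned) (sizeof (short) * CHAR_BIT);
      break;
    case SKETCH_FT_INTEGER:
      t.bits = (unsigned) (sizeof (int) * CHAR_BIT);
      break;
    case SKETCH_FT_UNSIGNED_INTEGER:
      t.bits = (unsigned) (sizeof (int) * CHAR_BIT);
      t.is_unsigned = 1;
      break;
    }
  return t;
}

/* A derived type: a pointer to TARGET.  */
struct sketch_type
sketch_make_pointer (const struct sketch_type *target)
{
  struct sketch_type t = { SKETCH_TYPE_CODE_PTR, 0, 0, NULL };

  t.bits = (unsigned) (sizeof (void *) * CHAR_BIT);
  t.target_type = target;
  return t;
}
```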

<h4 class="unnumberedsubsec">Builtin Types (e.g., <code>builtin_type_void</code>, <code>builtin_type_char</code>).</h4>

   <p>These are instances of type structs that roughly correspond to
fundamental types and are created as global types for GDB to
use for various ugly historical reasons.  We eventually want to
eliminate these.  Note for example that <code>builtin_type_int</code>
initialized in <code>gdbtypes.c</code> is basically the same as a
<code>TYPE_CODE_INT</code> type that is initialized in <code>c-lang.c</code> for
an <code>FT_INTEGER</code> fundamental type.  The difference is that the
<code>builtin_type</code> is not associated with any particular objfile, and
only one instance exists, while <code>c-lang.c</code> builds as many
<code>TYPE_CODE_INT</code> types as needed, with each one associated with
some particular objfile.

<h3 class="section">Object File Formats</h3>

<h4 class="subsection">a.out</h4>

   <p>The <code>a.out</code> format is the original file format for Unix.  It
consists of three sections: <code>text</code>, <code>data</code>, and <code>bss</code>,
which are for program code, initialized data, and uninitialized data,
respectively.

   <p>The <code>a.out</code> format is so simple that it doesn't have any reserved
place for debugging information.  (Hey, the original Unix hackers used
<code>adb</code>, which is a machine-language debugger!)  The only debugging
format for <code>a.out</code> is stabs, which is encoded as a set of normal
symbols with distinctive attributes.
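
   <p>One such distinctive attribute is the symbol's type byte: in the
traditional <code>a.out</code> symbol layout, an entry is a stab when any of
the bits in the mask <code>0xe0</code> is set.  The sketch below applies that
test to a simplified symbol record.  The record layout and the
<code>sketch_</code> names are assumptions for illustration (a real entry
stores a string-table index rather than a name pointer), and this is not
code from <code>dbxread.c</code>.

```c
#include <stddef.h>

/* Stab mask from the traditional a.out symbol layout.  The SKETCH_
   prefix avoids clashing with a system <a.out.h>.  */
#define SKETCH_N_STAB 0xe0

/* Simplified symbol record (hypothetical layout).  */
struct sketch_nlist
{
  const char *name;
  unsigned char n_type;
  unsigned long n_value;
};

/* Nonzero if SYM is a debugging (stab) entry rather than an ordinary
   linker symbol.  */
int
sketch_is_stab (const struct sketch_nlist *sym)
{
  return (sym->n_type & SKETCH_N_STAB) != 0;
}

/* Count the stab entries among N symbols.  */
size_t
sketch_count_stabs (const struct sketch_nlist *syms, size_t n)
{
  size_t i, count = 0;

  for (i = 0; i < n; i++)
    if (sketch_is_stab (&syms[i]))
      count++;
  return count;
}
```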

   <p>The basic <code>a.out</code> reader is in <code>dbxread.c</code>.

<h4 class="subsection">COFF</h4>

   <p>The COFF format was introduced with System V Release 3 (SVR3) Unix.
COFF files may have multiple sections, each prefixed by a header.  The
number of sections is limited.

   <p>The COFF specification includes support for debugging.  Although this
was a step forward, the debugging information was woefully limited.  For
instance, it was not possible to represent code that came from an
included file.

   <p>The COFF reader is in <code>coffread.c</code>.

<h4 class="subsection">ECOFF</h4>

   <p>ECOFF is an extended COFF originally introduced for MIPS and Alpha
workstations.

   <p>The basic ECOFF reader is in <code>mipsread.c</code>.

<h4 class="subsection">XCOFF</h4>

   <p>The IBM RS/6000 running AIX uses an object file format called XCOFF.
The COFF sections, symbols, and line numbers are used, but debugging
symbols are <code>dbx</code>-style stabs whose strings are located in the
<code>.debug</code> section (rather than the string table).  For more
information, see <a href="../stabs/index.html#Top">the stabs manual</a>.

   <p>The shared library scheme has a clean interface for figuring out what
shared libraries are in use, but the catch is that everything which
refers to addresses (symbol tables and breakpoints at least) needs to be
relocated for both shared libraries and the main executable.  With the
standard mechanism, at least, this can only be done once the program has
been run (or the core file has been read).

<h4 class="subsection">PE</h4>

   <p>Windows 95 and NT use the PE (<dfn>Portable Executable</dfn>) format for their
executables.  PE is basically COFF with additional headers.

   <p>While BFD includes special PE support, GDB needs only the basic
COFF reader.

<h4 class="subsection">ELF</h4>

   <p>The ELF format came with System V Release 4 (SVR4) Unix.  ELF is similar
to COFF in being organized into a number of sections, but it removes
many of COFF's limitations.

   <p>The basic ELF reader is in <code>elfread.c</code>.

<h4 class="subsection">SOM</h4>

   <p>SOM is HP's object file and debug format (not to be confused with IBM's
SOM, which is a cross-language ABI).

   <p>The SOM reader is in <code>hpread.c</code>.

<h4 class="subsection">Other File Formats</h4>

   <p>Other file formats that have been supported by GDB include NetWare
Loadable Modules (<code>nlmread.c</code>).

<h3 class="section">Debugging File Formats</h3>

   <p>This section describes characteristics of debugging information that
are independent of the object file format.

<h4 class="subsection">stabs</h4>

   <p><code>stabs</code> started out as special symbols within the <code>a.out</code>
format.  Since then, it has been encapsulated into other file
formats, such as COFF and ELF.

   <p>While <code>dbxread.c</code> does some of the basic stab processing,
including for encapsulated versions, <code>stabsread.c</code> does
the real work.

<h4 class="subsection">COFF</h4>

   <p>The basic COFF definition includes debugging information.  The level
of support is minimal and non-extensible, and is not often used.

<h4 class="subsection">MIPS debug (Third Eye)</h4>

   <p>ECOFF includes a definition of a special debug format.

   <p>The file <code>mdebugread.c</code> implements reading for this format.

<h4 class="subsection">DWARF 1</h4>

   <p>DWARF 1 is a debugging format that was originally designed to be
used with ELF in SVR4 systems.

   <p>The DWARF 1 reader is in <code>dwarfread.c</code>.

<h4 class="subsection">DWARF 2</h4>

   <p>DWARF 2 is an improved but incompatible version of DWARF 1.

   <p>The DWARF 2 reader is in <code>dwarf2read.c</code>.

<h4 class="subsection">SOM</h4>

   <p>Like COFF, the SOM definition includes debugging information.

<h3 class="section">Adding a New Symbol Reader to GDB</h3>

   <p>If you are using an existing object file format (<code>a.out</code>, COFF, ELF, etc.),
there is probably little to be done.

   <p>If you need to add a new object file format, you must first add it to
BFD.  This is beyond the scope of this document.

   <p>You must then arrange for the BFD code to provide access to the
debugging symbols.  Generally GDB will have to call swapping routines
from BFD and a few other BFD internal routines to locate the debugging
information.  As much as possible, GDB should not depend on the BFD
internal data structures.

   <p>For some targets (e.g., COFF), there is a special transfer vector used
to call swapping routines, since the external data structures on various
platforms have different sizes and layouts.  Specialized routines that
will only ever be implemented by one object file format may be called
directly.  This interface should be described in a file
<code>bfd/lib</code><var>xyz</var><code>.h</code>, which is included by GDB.
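
   <p>The idea of a transfer vector of swapping routines can be sketched
as a table of function pointers, one table per byte order, used to
convert an external (on-disk) record into an internal one.  The record
layouts and every <code>sketch_</code> name here are invented for
illustration; they do not correspond to any actual BFD interface.

```c
#include <stdint.h>

/* Hypothetical external (on-disk) record: raw bytes only, so the
   compiler adds no padding and imposes no byte order.  */
struct sketch_ext_sym
{
  unsigned char e_value[4];
  unsigned char e_scnum[2];
};

/* Hypothetical internal (host) record.  */
struct sketch_int_sym
{
  uint32_t value;
  uint16_t scnum;
};

/* Transfer vector: the byte-order-specific accessors for one format.  */
struct sketch_swap_vector
{
  uint32_t (*get_32) (const unsigned char *p);
  uint16_t (*get_16) (const unsigned char *p);
};

static uint32_t
sketch_getb32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static uint16_t
sketch_getb16 (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t
sketch_getl32 (const unsigned char *p)
{
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

static uint16_t
sketch_getl16 (const unsigned char *p)
{
  return (uint16_t) ((p[1] << 8) | p[0]);
}

static const struct sketch_swap_vector sketch_big_endian
  = { sketch_getb32, sketch_getb16 };
static const struct sketch_swap_vector sketch_little_endian
  = { sketch_getl32, sketch_getl16 };

/* Swap an external symbol into internal form through the vector, so
   the caller never touches the external layout directly.  */
void
sketch_swap_sym_in (const struct sketch_swap_vector *vec,
                    const struct sketch_ext_sym *ext,
                    struct sketch_int_sym *in)
{
  in->value = vec->get_32 (ext->e_value);
  in->scnum = vec->get_16 (ext->e_scnum);
}
```

   <p>Routing every access through the vector is what lets the same
reader code handle files produced on hosts with either byte order.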

   </body></html>
