<title>GDB Internals</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="GDB Internals">
<meta name="generator" content="makeinfo 4.3">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home">
<a name="Symbol%20Handling"></a>
<h2 class="chapter">Symbol Handling</h2>
<p>Symbols are a key part of GDB's operation. Symbols include variables,
functions, and types.
<h3 class="section">Symbol Reading</h3>
GDB reads symbols from <dfn>symbol files</dfn>. The usual symbol
file is the file containing the program which GDB is
debugging. GDB can be directed to use a different file for
symbols (with the <code>symbol-file</code> command), and it can also read
more symbols via the <code>add-file</code> and <code>load</code> commands, or while
reading symbols from shared libraries.
<p>Symbol files are initially opened by code in <code>symfile.c</code> using
the BFD library (see <a href="Support-Libraries.html#Support%20Libraries">Support Libraries</a>). BFD identifies the type
of the file by examining its header. <code>find_sym_fns</code> then uses
this identification to locate a set of symbol-reading functions.
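<p>To make the header check concrete, here is a minimal sketch of classifying a file by its leading bytes. It is not BFD's interface: <code>enum file_format</code> and <code>sniff_format</code> are hypothetical names, and only two formats are recognized. The ELF signature (0x7f, 'E', 'L', 'F') and the traditional a.out magic numbers (0407, 0410, 0413) are standard; reading the a.out magic as a little-endian 16-bit value is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format tags; BFD's real identification is far richer.  */
enum file_format { FMT_UNKNOWN, FMT_ELF, FMT_AOUT };

/* Classify a file from its first LEN bytes at HDR.  The a.out test
   assumes the magic number sits little-endian in the first two bytes.  */
enum file_format
sniff_format (const unsigned char *hdr, size_t len)
{
  static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

  if (len >= 4 && memcmp (hdr, elf_magic, sizeof elf_magic) == 0)
    return FMT_ELF;

  if (len >= 2)
    {
      unsigned int magic = hdr[0] | ((unsigned int) hdr[1] << 8);

      /* OMAGIC, NMAGIC and ZMAGIC, written in octal as is traditional.  */
      if (magic == 0407 || magic == 0410 || magic == 0413)
        return FMT_AOUT;
    }

  return FMT_UNKNOWN;
}
```

<p>BFD's real recognizers check much more than a magic number (architecture, byte order, and many more formats), but the principle of identifying a file from its header is the same.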
<p>Symbol-reading modules identify themselves to GDB by calling
<code>add_symtab_fns</code> during their module initialization. The argument
to <code>add_symtab_fns</code> is a <code>struct sym_fns</code> which contains the
name (or name prefix) of the symbol format, the length of the prefix,
and pointers to four functions. These functions are called at various
times to process symbol files whose identification matches the specified
format.
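<p>The registration step can be sketched as a linked list of readers searched by name prefix. This is a simplified model, not GDB's actual declarations: the field names, the <code>next</code> link, and the assumption that the name being matched is the BFD target name (such as <code>elf32-i386</code>) are illustrative, and only three of the four function pointers are shown.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long CORE_ADDR;  /* stand-in for GDB's address type */

/* Simplified struct sym_fns; field names are illustrative.  */
struct sym_fns
{
  const char *sym_name;   /* name (or name prefix) of the symbol format */
  int sym_namelen;        /* length of that prefix */
  void (*sym_new_init) (void);
  void (*sym_init) (struct sym_fns *);
  void (*sym_read) (struct sym_fns *, CORE_ADDR addr, int mainline);
  struct sym_fns *next;   /* link in the registry (sketch only) */
};

/* Head of the list of registered symbol readers.  */
static struct sym_fns *symtab_fns = NULL;

/* Called by each reader's initialization code to register itself.  */
void
add_symtab_fns (struct sym_fns *sf)
{
  sf->next = symtab_fns;
  symtab_fns = sf;
}

/* Return the first registered reader whose prefix matches TARGET_NAME,
   or NULL if no reader claims this kind of file.  */
struct sym_fns *
find_sym_fns (const char *target_name)
{
  struct sym_fns *sf;

  for (sf = symtab_fns; sf != NULL; sf = sf->next)
    if (strncmp (target_name, sf->sym_name, sf->sym_namelen) == 0)
      return sf;
  return NULL;
}
```

<p>Matching on a prefix rather than the whole name lets one reader claim a family of targets (for example every name starting with <code>elf</code>) without listing each one.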
<p>The functions supplied by each module are:

<dl>
<dt><code></code><var>xyz</var><code>_symfile_init(struct sym_fns *sf)</code>
<dd>Called from <code>symbol_file_add</code> when we are about to read a new
symbol file. This function should clean up any internal state (possibly
resulting from half-read previous files, for example) and prepare to
read a new symbol file. Note that the symbol file which we are reading
might be a new "main" symbol file, or might be a secondary symbol file
whose symbols are being added to the existing symbol table.

<p>The argument to <code></code><var>xyz</var><code>_symfile_init</code> is a newly allocated
<code>struct sym_fns</code> whose <code>bfd</code> field contains the BFD for the
new symbol file being read. Its <code>private</code> field has been zeroed,
and can be modified as desired. Typically, a struct of private
information will be <code>malloc</code>'d, and a pointer to it will be placed
in the <code>private</code> field.

<p>There is no result from <code></code><var>xyz</var><code>_symfile_init</code>, but it can call
<code>error</code> if it detects an unavoidable problem.

<br><dt><code></code><var>xyz</var><code>_new_init()</code>
<dd>Called from <code>symbol_file_add</code> when discarding existing symbols.
This function needs to handle only the symbol-reading module's internal
state; the symbol table data structures visible to the rest of
GDB will be discarded by <code>symbol_file_add</code>. It has no
arguments and no result. It may be called after
<code></code><var>xyz</var><code>_symfile_init</code>, if a new symbol table is being read, or
may be called alone if all symbols are simply being discarded.

<br><dt><code></code><var>xyz</var><code>_symfile_read(struct sym_fns *sf, CORE_ADDR addr, int mainline)</code>
<dd>Called from <code>symbol_file_add</code> to actually read the symbols from a
symbol file into a set of psymtabs or symtabs.

<p><code>sf</code> points to the <code>struct sym_fns</code> originally passed to
<code></code><var>xyz</var><code>_symfile_init</code> for possible initialization. <code>addr</code> is
the offset between the file's specified start address and its true
address in memory. <code>mainline</code> is 1 if this is the main symbol
table being read, and 0 if a secondary symbol file (e.g., a shared library
or dynamically loaded file) is being read.
</dl>
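<p>A hypothetical reader named <code>xyz</code> might implement the init and read entry points roughly as follows. The sketch models only the two <code>struct sym_fns</code> fields discussed above, uses a made-up <code>struct xyz_info</code> for the private state, and substitutes <code>unsigned long</code> for <code>CORE_ADDR</code>; the actual parsing of debugging information is left as a comment.

```c
#include <stdlib.h>

typedef unsigned long CORE_ADDR;  /* stand-in for GDB's address type */

/* Only the struct sym_fns fields this sketch touches.  */
struct sym_fns
{
  void *bfd;       /* BFD for the symbol file (opaque here) */
  void *private;   /* reader-owned state; zeroed before init is called */
};

/* Hypothetical private state for the imaginary "xyz" reader.  */
struct xyz_info
{
  CORE_ADDR offset;   /* load offset passed to xyz_symfile_read */
  int mainline;       /* 1 for the main symbol file, 0 otherwise */
};

/* Prepare to read a new symbol file: allocate private state and
   attach it to SF.  */
void
xyz_symfile_init (struct sym_fns *sf)
{
  struct xyz_info *info = calloc (1, sizeof *info);

  if (info == NULL)
    abort ();   /* GDB code would typically use xmalloc instead */
  sf->private = info;
}

/* Record how this file is being read; the real work of turning the
   file's debugging information into psymtabs or symtabs is omitted.  */
void
xyz_symfile_read (struct sym_fns *sf, CORE_ADDR addr, int mainline)
{
  struct xyz_info *info = sf->private;

  info->offset = addr;
  info->mainline = mainline;
  /* A real reader would walk the file here, adding ADDR to each
     address it records.  */
}
```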
<p>In addition, if a symbol-reading module creates psymtabs when
<code></code><var>xyz</var><code>_symfile_read</code> is called, these psymtabs will contain a pointer
to a function <code></code><var>xyz</var><code>_psymtab_to_symtab</code>, which can be called
from any point in the GDB symbol-handling code.
<dl>
<dt><code></code><var>xyz</var><code>_psymtab_to_symtab (struct partial_symtab *pst)</code>
<dd>Called from <code>psymtab_to_symtab</code> (or the <code>PSYMTAB_TO_SYMTAB</code> macro) if
the psymtab has not already been read in and had its <code>pst-&gt;symtab</code>
pointer set. The argument is the psymtab to be fleshed out into a
symtab. Upon return, <code>pst-&gt;readin</code> should have been set to 1, and
<code>pst-&gt;symtab</code> should contain a pointer to the new corresponding symtab, or
zero if there were no symbols in that part of the symbol file.
</dl>
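<p>The lazy read-in contract can be sketched in a few lines. The structures keep only the members named in the text (<code>readin</code> and <code>symtab</code>) plus a function pointer to the reader's expander; the member name <code>read_symtab</code>, the counter <code>xyz_expansions</code>, and the static result table are assumptions made for illustration.

```c
#include <stddef.h>

struct symtab
{
  const char *filename;
};

struct partial_symtab
{
  const char *filename;
  int readin;                 /* nonzero once the full symtab exists */
  struct symtab *symtab;      /* the full symtab, or NULL if none */
  void (*read_symtab) (struct partial_symtab *);   /* reader's expander */
};

/* Return the full symtab for PST, reading it in only the first time.  */
struct symtab *
psymtab_to_symtab (struct partial_symtab *pst)
{
  if (!pst->readin)
    pst->read_symtab (pst);
  return pst->symtab;
}

/* Counts calls to the expander, so repeated lookups can be checked.  */
int xyz_expansions = 0;

/* Hypothetical xyz_psymtab_to_symtab.  A real one re-reads this part of
   the symbol file; this one fills in a single static symtab, so it
   only suits one psymtab at a time.  */
void
xyz_psymtab_to_symtab (struct partial_symtab *pst)
{
  static struct symtab result;

  xyz_expansions++;
  result.filename = pst->filename;
  pst->symtab = &result;
  pst->readin = 1;
}
```

<p>The second call to <code>psymtab_to_symtab</code> returns the cached symtab without invoking the reader again, which is what makes deferred reading cheap.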
<h3 class="section">Partial Symbol Tables</h3>
GDB has three types of symbol tables:

<ul>
<li>Full symbol tables (<dfn>symtabs</dfn>). These contain the main
information about symbols and addresses.

<li>Partial symbol tables (<dfn>psymtabs</dfn>). These contain enough
information to know when to read the corresponding part of the full
symbol table.

<li>Minimal symbol tables (<dfn>msymtabs</dfn>). These contain information
gleaned from non-debugging symbols.
</ul>
<p>This section describes partial symbol tables.
<p>A psymtab is constructed by doing a very quick pass over an executable
file's debugging information. Only small amounts of information are
extracted: enough to identify which parts of the symbol table will
need to be re-read and fully digested later, when the user needs the
information. Because this pass is fast, GDB starts up very
quickly. The detailed rereading happens later, in small
pieces and at various times, so the delay is mostly invisible to
the user.
<p>The symbols that show up in a file's psymtab should be, roughly, those
visible to the debugger's user when the program is not running code from
that file. These include external symbols and types, static symbols and
types, and <code>enum</code> values declared at file scope.
<p>The psymtab also contains the range of instruction addresses that the
full symbol table would represent.
<p>The idea is that there are only two ways for the user (or much of the
code in the debugger) to reference a symbol:

<ol>
<li>By its address (e.g., execution stops at some address which is inside a
function in this file). The address will be noticed to be in the
range of this psymtab, and the full symtab will be read in.
<code>find_pc_function</code>, <code>find_pc_line</code>, and other
<code>find_pc_...</code> functions handle this.

<li>By its name (e.g., the user asks to print a variable, or set a breakpoint on a
function). Global names and file-scope names will be found in the
psymtab, which will cause the symtab to be pulled in. Local names will
have to be qualified by a global name, or a file-scope name, in which
case we will have already read in the symtab as we evaluated the
qualifier. Or, a local symbol can be referenced when we are "in" a
local scope, in which case the first case applies. <code>lookup_symbol</code>
does most of the work here.
</ol>
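<p>The address-based path can be sketched as a search over psymtab address ranges that reads in the full table only on a hit. This is not GDB's <code>find_pc_function</code>: the function name <code>find_pc_symtab_sketch</code>, the <code>textlow</code>/<code>texthigh</code> members, and the half-open range convention are assumptions, and "reading in" merely fills a placeholder.

```c
#include <stddef.h>

typedef unsigned long CORE_ADDR;  /* stand-in for GDB's address type */

struct symtab
{
  const char *filename;
};

struct partial_symtab
{
  const char *filename;
  CORE_ADDR textlow, texthigh;   /* covers [textlow, texthigh) */
  int readin;                    /* nonzero once the symtab is built */
  struct symtab *symtab;
  struct symtab full;            /* storage for the "expanded" table */
};

/* Find the psymtab whose address range contains PC, expanding it into
   a full symtab on first use.  Returns NULL if no psymtab covers PC.  */
struct symtab *
find_pc_symtab_sketch (struct partial_symtab *psymtabs, size_t n,
                       CORE_ADDR pc)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      struct partial_symtab *pst = &psymtabs[i];

      if (pc >= pst->textlow && pc < pst->texthigh)
        {
          if (!pst->readin)
            {
              /* Stand-in for reading this file's full symbol table.  */
              pst->full.filename = pst->filename;
              pst->symtab = &pst->full;
              pst->readin = 1;
            }
          return pst->symtab;
        }
    }
  return NULL;
}
```

<p>Only the psymtab that actually covers the address is expanded; the others stay in their cheap partial form.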
<p>The only reason that psymtabs exist is to cause a symtab to be read in
at the right moment. Any symbol that can be elided from a psymtab,
while still causing that to happen, should not appear in it. Since
psymtabs don't have the idea of scope, you can't put local symbols in
them anyway. Psymtabs don't have the idea of the type of a symbol,
either, so types need not appear, unless they will be referenced by
name.
<p>It is a bug for GDB to behave one way when only a psymtab has
been read, and another way if the corresponding symtab has been read
in. Such bugs are typically caused by a psymtab that does not contain
all the visible symbols, or which has the wrong instruction address
ranges.
<p>The psymtab for a particular section of a symbol file (objfile) could be
thrown away after the symtab has been read in. The symtab should always
be searched before the psymtab, so the psymtab will never be used (in a
bug-free environment). Currently, psymtabs are allocated on an obstack,
and all the psymbols themselves are allocated in a pair of large arrays
on an obstack, so there is little to be gained by trying to free them
unless you want to do a lot more work.
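<p>The ordering rule (symtabs first, then psymtabs) can be illustrated with a name lookup. In this hypothetical sketch each psymtab lists only the global and file-scope names of its file, and "expanding" it simply switches to a prebuilt table; <code>lookup_symbol_sketch</code> and all structure layouts are illustrative, not GDB's <code>lookup_symbol</code>.

```c
#include <stddef.h>
#include <string.h>

struct symbol
{
  const char *name;
  int value;
};

struct symtab
{
  struct symbol *syms;
  size_t nsyms;
};

struct partial_symtab
{
  const char **names;      /* global and file-scope names only */
  size_t nnames;
  int readin;              /* nonzero once the symtab is read in */
  struct symtab *symtab;   /* valid once readin is set */
  struct symtab full;      /* what "reading in" yields in this sketch */
};

static struct symbol *
search_symtab (struct symtab *st, const char *name)
{
  size_t i;

  for (i = 0; i < st->nsyms; i++)
    if (strcmp (st->syms[i].name, name) == 0)
      return &st->syms[i];
  return NULL;
}

/* Look NAME up in symtabs that are already read in; only if that fails,
   scan the psymtabs of unexpanded files and expand the one that matches.  */
struct symbol *
lookup_symbol_sketch (struct partial_symtab *pst, size_t n, const char *name)
{
  size_t i, j;

  for (i = 0; i < n; i++)
    if (pst[i].readin)
      {
        struct symbol *sym = search_symtab (pst[i].symtab, name);
        if (sym != NULL)
          return sym;
      }

  for (i = 0; i < n; i++)
    if (!pst[i].readin)
      for (j = 0; j < pst[i].nnames; j++)
        if (strcmp (pst[i].names[j], name) == 0)
          {
            pst[i].symtab = &pst[i].full;
            pst[i].readin = 1;
            return search_symtab (pst[i].symtab, name);
          }
  return NULL;
}
```

<p>Once a file has been expanded, its psymtab is never consulted again by this lookup, which is why it could in principle be discarded.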
<h3 class="section">Types</h3>
<h4 class="unnumberedsubsec">Fundamental Types (e.g., <code>FT_VOID</code>, <code>FT_BOOLEAN</code>).</h4>
<p>These are the fundamental types that GDB uses internally. Fundamental
types from the various debugging formats (stabs, DWARF, etc.) are mapped
into one of these. They are basically a union of all fundamental types
that GDB knows about for all the languages that GDB
supports.
<h4 class="unnumberedsubsec">Type Codes (e.g., <code>TYPE_CODE_PTR</code>, <code>TYPE_CODE_ARRAY</code>).</h4>
<p>Each time GDB builds an internal type, it marks it with one
of these type codes. The type may be a fundamental type, such as
<code>TYPE_CODE_INT</code>, or a derived type, such as <code>TYPE_CODE_PTR</code>,
which is a pointer to another type. Typically, several <code>FT_*</code>
types map to one <code>TYPE_CODE_*</code> type, and are distinguished by
other members of the type struct, such as whether the type is signed
or unsigned, and how many bits it uses.
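<p>The many-to-one mapping from fundamental types to type codes can be sketched as follows. The enumerations are a small illustrative subset, not GDB's full lists, <code>make_fundamental</code> is a hypothetical name, and the sizes assume a target with 1-byte <code>char</code>, 2-byte <code>short</code>, and 4-byte <code>int</code>.

```c
/* A small subset of fundamental-type codes, named in GDB's FT_* style.  */
enum fundamental_type
{
  FT_VOID, FT_BOOLEAN, FT_CHAR, FT_UNSIGNED_CHAR,
  FT_SHORT, FT_UNSIGNED_SHORT, FT_INTEGER, FT_UNSIGNED_INTEGER, FT_FLOAT
};

enum type_code { TYPE_CODE_VOID, TYPE_CODE_BOOL, TYPE_CODE_INT, TYPE_CODE_FLT };

/* Simplified type struct: the code plus the members that tell apart
   the several FT_* types sharing one TYPE_CODE_*.  */
struct type
{
  enum type_code code;
  int length;        /* size in bytes */
  int is_unsigned;   /* nonzero for unsigned integer types */
};

/* Build the type for FT.  Sizes are assumptions about the target.  */
struct type
make_fundamental (enum fundamental_type ft)
{
  struct type t = { TYPE_CODE_INT, 4, 0 };

  switch (ft)
    {
    case FT_VOID:             t.code = TYPE_CODE_VOID; t.length = 1; break;
    case FT_BOOLEAN:          t.code = TYPE_CODE_BOOL; t.length = 1; break;
    case FT_CHAR:             t.length = 1; break;
    case FT_UNSIGNED_CHAR:    t.length = 1; t.is_unsigned = 1; break;
    case FT_SHORT:            t.length = 2; break;
    case FT_UNSIGNED_SHORT:   t.length = 2; t.is_unsigned = 1; break;
    case FT_INTEGER:          break;
    case FT_UNSIGNED_INTEGER: t.is_unsigned = 1; break;
    case FT_FLOAT:            t.code = TYPE_CODE_FLT; break;
    }
  return t;
}
```

<p>Here, for example, <code>FT_SHORT</code> and <code>FT_UNSIGNED_INTEGER</code> both produce <code>TYPE_CODE_INT</code> and differ only in <code>length</code> and <code>is_unsigned</code>.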
<h4 class="unnumberedsubsec">Builtin Types (e.g., <code>builtin_type_void</code>, <code>builtin_type_char</code>).</h4>
<p>These are instances of type structs that roughly correspond to
fundamental types and are created as global types for GDB to
use for various ugly historical reasons. We eventually want to
eliminate these. Note, for example, that <code>builtin_type_int</code>,
initialized in <code>gdbtypes.c</code>, is basically the same as a
<code>TYPE_CODE_INT</code> type that is initialized in <code>c-lang.c</code> for
an <code>FT_INTEGER</code> fundamental type. The difference is that the
<code>builtin_type</code> is not associated with any particular objfile, and
only one instance exists, while <code>c-lang.c</code> builds as many
<code>TYPE_CODE_INT</code> types as needed, with each one associated with
some particular objfile.
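<p>The difference between the two kinds of type can be sketched as a single global object versus a constructor that ties each new type to an objfile. All names here (<code>builtin_int_sketch</code>, <code>make_objfile_int_type</code>, the minimal <code>struct objfile</code>) are hypothetical, the 4-byte size is an assumption, and the sketch simply uses <code>malloc</code> for the per-objfile types.

```c
#include <stdlib.h>

enum type_code { TYPE_CODE_INT };

struct objfile
{
  const char *name;
};

struct type
{
  enum type_code code;
  int length;                /* size in bytes (4 assumed for int) */
  struct objfile *objfile;   /* owning objfile, or NULL if none */
};

/* The builtin style: one global instance, owned by no objfile.  */
struct type builtin_int_sketch = { TYPE_CODE_INT, 4, NULL };

/* The per-objfile style: a new type for each request, recorded as
   belonging to OBJFILE.  Returns NULL if allocation fails.  */
struct type *
make_objfile_int_type (struct objfile *objfile)
{
  struct type *t = malloc (sizeof *t);

  if (t == NULL)
    return NULL;
  t->code = TYPE_CODE_INT;
  t->length = 4;
  t->objfile = objfile;
  return t;
}
```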
<h3 class="section">Object File Formats</h3>
<h4 class="subsection">a.out</h4>
<p>The <code>a.out</code> format is the original file format for Unix. It
consists of three sections: <code>text</code>, <code>data</code>, and <code>bss</code>,
which are for program code, initialized data, and uninitialized data,
respectively.
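<p>A minimal sketch of decoding the traditional 32-bit a.out header (eight 32-bit words) shows how the three sections are described. The field names follow the classic <code>&lt;a.out.h&gt;</code> (the first word is <code>a_magic</code> or <code>a_midmag</code> on some systems); little-endian byte order and the OMAGIC layout, in which text immediately follows the 32-byte header, are assumptions. <code>read_aout_header</code> and <code>omagic_data_offset</code> are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Traditional a.out header: eight 32-bit words.  */
struct aout_header
{
  uint32_t a_info;    /* magic number in the low 16 bits */
  uint32_t a_text;    /* size of text (program code) */
  uint32_t a_data;    /* size of data (initialized data) */
  uint32_t a_bss;     /* size of bss (uninitialized data): no file bytes */
  uint32_t a_syms;    /* size of the symbol table */
  uint32_t a_entry;   /* entry point address */
  uint32_t a_trsize;  /* size of text relocation information */
  uint32_t a_drsize;  /* size of data relocation information */
};

static uint32_t
get_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Decode the 32-byte header at BUF into *H.  Returns 0 on success,
   -1 if fewer than 32 bytes are available.  */
int
read_aout_header (const unsigned char *buf, size_t len,
                  struct aout_header *h)
{
  if (len < 32)
    return -1;
  h->a_info = get_le32 (buf + 0);
  h->a_text = get_le32 (buf + 4);
  h->a_data = get_le32 (buf + 8);
  h->a_bss = get_le32 (buf + 12);
  h->a_syms = get_le32 (buf + 16);
  h->a_entry = get_le32 (buf + 20);
  h->a_trsize = get_le32 (buf + 24);
  h->a_drsize = get_le32 (buf + 28);
  return 0;
}

/* File offset of the data section for an OMAGIC file.  bss follows in
   memory but occupies no space in the file.  */
uint32_t
omagic_data_offset (const struct aout_header *h)
{
  return 32 + h->a_text;
}
```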
<p>The <code>a.out</code> format is so simple that it doesn't have any reserved
place for debugging information. (Hey, the original Unix hackers used
<code>adb</code>, which is a machine-language debugger!) The only debugging
format for <code>a.out</code> is stabs, which is encoded as a set of normal
symbols with distinctive attributes.
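<p>Because stabs live in the ordinary symbol table, a reader tells them apart by the type byte. The sketch below uses the classic 32-bit <code>nlist</code> layout and the conventional <code>N_STAB</code> mask (0xe0) and stab codes from <code>&lt;stab.h&gt;</code>; the struct name <code>nlist32</code> and the function <code>is_stab</code> are illustrative.

```c
#include <stdint.h>

/* a.out symbol table entry as laid out on 32-bit targets.  */
struct nlist32
{
  uint32_t n_strx;    /* offset of the name in the string table */
  uint8_t  n_type;    /* symbol type; stabs set bits in N_STAB */
  int8_t   n_other;
  int16_t  n_desc;    /* stabs often keep a line number here */
  uint32_t n_value;   /* address or other value */
};

#define N_STAB  0xe0  /* if any of these bits is set, it is a stab */
#define N_FUN   0x24  /* stab: function name */
#define N_SLINE 0x44  /* stab: line number in the text segment */
#define N_SO    0x64  /* stab: main source file name */

/* A stab is an ordinary symbol whose type has some N_STAB bit set.  */
int
is_stab (const struct nlist32 *sym)
{
  return (sym->n_type & N_STAB) != 0;
}
```

<p>A plain external text symbol has type <code>N_TEXT | N_EXT</code> (0x05), which has no <code>N_STAB</code> bits, so the linker and loader treat it normally while a stabs reader skips it.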
<p>The basic <code>a.out</code> reader is in <code>dbxread.c</code>.
<h4 class="subsection">COFF</h4>
<p>The COFF format was introduced with System V Release 3 (SVR3) Unix.
COFF files may have multiple sections, each prefixed by a header. The
number of sections is limited.
<p>The COFF specification includes support for debugging. Although this
was a step forward, the debugging information was woefully limited. For
instance, it was not possible to represent code that came from an
included file.
<p>The COFF reader is in <code>coffread.c</code>.
<h4 class="subsection">ECOFF</h4>
<p>ECOFF is an extended COFF originally introduced for Mips and Alpha
workstations.
<p>The basic ECOFF reader is in <code>mipsread.c</code>.
<h4 class="subsection">XCOFF</h4>
<p>The IBM RS/6000 running AIX uses an object file format called XCOFF.
The COFF sections, symbols, and line numbers are used, but debugging
symbols are <code>dbx</code>-style stabs whose strings are located in the
<code>.debug</code> section (rather than the string table). For more
information, see <a href="../stabs/index.html#Top">the stabs manual</a>.
<p>The shared library scheme has a clean interface for figuring out what
shared libraries are in use, but the catch is that everything which
refers to addresses (symbol tables and breakpoints at least) needs to be
relocated for both shared libraries and the main executable. At least
using the standard mechanism, this can only be done once the program has
been run (or the core file has been read).
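<p>Once the real load address is known, relocation amounts to shifting every recorded address by one offset, the same kind of offset that <code>addr</code> carries into <code></code><var>xyz</var><code>_symfile_read</code>. This minimal sketch assumes a single offset per file; <code>struct addr_item</code> and <code>relocate_items</code> are hypothetical, and <code>unsigned long</code> stands in for <code>CORE_ADDR</code>.

```c
#include <stddef.h>

typedef unsigned long CORE_ADDR;  /* stand-in for GDB's address type */

/* Hypothetical record of something that refers to an address, such as
   a symbol or a breakpoint location.  */
struct addr_item
{
  const char *name;
  CORE_ADDR addr;
};

/* Shift every address from the file's link-time base to its actual
   load base.  Unsigned arithmetic wraps, so moving to a lower base
   still gives the right result.  */
void
relocate_items (struct addr_item *items, size_t n,
                CORE_ADDR link_base, CORE_ADDR load_base)
{
  CORE_ADDR offset = load_base - link_base;
  size_t i;

  for (i = 0; i < n; i++)
    items[i].addr += offset;
}
```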
<h4 class="subsection">PE</h4>
<p>Windows 95 and NT use the PE (<dfn>Portable Executable</dfn>) format for their
executables. PE is basically COFF with additional headers.
<p>While BFD includes special PE support, GDB needs only the basic
COFF reader.
<h4 class="subsection">ELF</h4>
<p>The ELF format came with System V Release 4 (SVR4) Unix. ELF is similar
to COFF in being organized into a number of sections, but it removes
many of COFF's limitations.
<p>The basic ELF reader is in <code>elfread.c</code>.
<h4 class="subsection">SOM</h4>
<p>SOM is HP's object file and debug format (not to be confused with IBM's
SOM, which is a cross-language ABI).
<p>The SOM reader is in <code>hpread.c</code>.
<h4 class="subsection">Other File Formats</h4>
<p>Other file formats that have been supported by GDB include Netware
Loadable Modules (<code>nlmread.c</code>).
<h3 class="section">Debugging File Formats</h3>
<p>This section describes characteristics of debugging information that
are independent of the object file format.
<h4 class="subsection">stabs</h4>
<p><code>stabs</code> started out as special symbols within the <code>a.out</code>
format. Since then, it has been encapsulated into other file
formats, such as COFF and ELF.
<p>While <code>dbxread.c</code> does some of the basic stab processing,
including for encapsulated versions, <code>stabsread.c</code> does
the real work.
<h4 class="subsection">COFF</h4>
<p>The basic COFF definition includes debugging information. The level
of support is minimal and non-extensible, and is not often used.
<h4 class="subsection">Mips debug (Third Eye)</h4>
<p>ECOFF includes a definition of a special debug format.
<p>The file <code>mdebugread.c</code> implements reading for this format.
<h4 class="subsection">DWARF 1</h4>
<p>DWARF 1 is a debugging format that was originally designed to be
used with ELF in SVR4 systems.
<p>The DWARF 1 reader is in <code>dwarfread.c</code>.
<h4 class="subsection">DWARF 2</h4>
<p>DWARF 2 is an improved but incompatible version of DWARF 1.
<p>The DWARF 2 reader is in <code>dwarf2read.c</code>.
<h4 class="subsection">SOM</h4>
<p>Like COFF, the SOM definition includes debugging information.
<h3 class="section">Adding a New Symbol Reader to GDB</h3>
<p>If you are using an existing object file format (<code>a.out</code>, COFF, ELF, etc.),
there is probably little to be done.
<p>If you need to add a new object file format, you must first add it to
BFD. This is beyond the scope of this document.
<p>You must then arrange for the BFD code to provide access to the
debugging symbols. Generally GDB will have to call swapping routines
from BFD and a few other BFD internal routines to locate the debugging
information. As much as possible, GDB should not depend on the BFD
internal data structures.
<p>For some targets (e.g., COFF), there is a special transfer vector used
to call swapping routines, since the external data structures on various
platforms have different sizes and layouts. Specialized routines that
will only ever be implemented by one object file format may be called
directly. This interface should be described in a file
<code>bfd/lib</code><var>xyz</var><code>.h</code>, which is included by GDB.
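<p>The idea of a transfer vector can be sketched as a table of swapping routines chosen per target, so that reader code converts external records into one internal form without knowing their byte order. Everything here is hypothetical: the <code>xyz</code> header with a 2-byte magic followed by a 4-byte symbol count, <code>struct xyz_swap_vec</code>, and the two vectors are illustrative, not BFD's actual COFF backend structures.

```c
#include <stdint.h>

/* Host (internal) form of a hypothetical xyz file header.  */
struct internal_xyzhdr
{
  uint16_t magic;
  uint32_t nsyms;
};

/* Transfer vector: per-target routines that swap external data in.  */
struct xyz_swap_vec
{
  void (*swap_hdr_in) (const unsigned char *ext, struct internal_xyzhdr *in);
};

/* External layout assumed: 2-byte magic, then 4-byte symbol count.  */
static void
swap_hdr_in_le (const unsigned char *ext, struct internal_xyzhdr *in)
{
  in->magic = (uint16_t) (ext[0] | (ext[1] << 8));
  in->nsyms = (uint32_t) ext[2] | ((uint32_t) ext[3] << 8)
              | ((uint32_t) ext[4] << 16) | ((uint32_t) ext[5] << 24);
}

static void
swap_hdr_in_be (const unsigned char *ext, struct internal_xyzhdr *in)
{
  in->magic = (uint16_t) ((ext[0] << 8) | ext[1]);
  in->nsyms = ((uint32_t) ext[2] << 24) | ((uint32_t) ext[3] << 16)
              | ((uint32_t) ext[4] << 8) | (uint32_t) ext[5];
}

/* One vector per byte order; a reader picks one when it opens a file.  */
const struct xyz_swap_vec xyz_le_vec = { swap_hdr_in_le };
const struct xyz_swap_vec xyz_be_vec = { swap_hdr_in_be };
```

<p>The reader calls <code>vec-&gt;swap_hdr_in</code> and then works only with <code>struct internal_xyzhdr</code>, which keeps layout differences confined to the swapping routines.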