llvm.org GIT mirror llvm / 576eea8
[PDB] Add documentation for the DBI Stream. Differential Revision: https://reviews.llvm.org/D26552 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286853 91177308-0d34-0410-b5e6-96231b3b80d8 Zachary Turner 2 years ago
2 changed file(s) with 450 addition(s) and 3 deletion(s).
0 =====================================
1 The PDB DBI (Debug Info) Stream
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _dbi_intro:
8
9 Introduction
10 ============
11
12 The PDB DBI Stream (Index 3) is one of the largest and most important streams
13 in a PDB file. It contains information about how the program was compiled
14 (e.g. compilation flags), the compilands (e.g. object files) that
15 were used to link together the program, the source files which were used
16 to build the program, as well as references to other streams that contain more
17 detailed information about each compiland, such as the CodeView symbol records
18 contained within each compiland and the source and line information for
19 functions and other symbols within each compiland.
20
21
22 .. _dbi_header:
23
24 Stream Header
25 =============
26 At offset 0 of the DBI Stream is a header with the following layout:
27
28
29 .. code-block:: c++
30
31 struct DbiStreamHeader {
32 int32_t VersionSignature;
33 uint32_t VersionHeader;
34 uint32_t Age;
35 uint16_t GlobalStreamIndex;
36 uint16_t BuildNumber;
37 uint16_t PublicStreamIndex;
38 uint16_t PdbDllVersion;
39 uint16_t SymRecordStream;
40 uint16_t PdbDllRbld;
41 int32_t ModInfoSize;
42 int32_t SectionContributionSize;
43 int32_t SectionMapSize;
44 int32_t SourceInfoSize;
45 int32_t TypeServerSize;
46 uint32_t MFCTypeServerIndex;
47 int32_t OptionalDbgHeaderSize;
48 int32_t ECSubstreamSize;
49 uint16_t Flags;
50 uint16_t Machine;
51 uint32_t Padding;
52 };
53
54 - **VersionSignature** - Unknown meaning. Appears to always be ``-1``.
55
56 - **VersionHeader** - A value from the following enum.
57
58 .. code-block:: c++
59
60 enum class DbiStreamVersion : uint32_t {
61 VC41 = 930803,
62 V50 = 19960307,
63 V60 = 19970606,
64 V70 = 19990903,
65 V110 = 20091201
66 };
67
68 Similar to the :doc:`PDB Stream `, this value always appears to be
69 ``V70``, and it is not clear what the other values are for.
70
71 - **Age** - The number of times the PDB has been written. Equal to the same
72 field from the :ref:`PDB Stream header `.
73
74 - **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream `,
75 which contains CodeView symbol records for all global symbols. Actual records
76 are stored in the symbol record stream, and are referenced from this stream.
77
78 - **BuildNumber** - A bitfield containing values representing the major and minor
79 version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
80 program, with the following layout:
81
82 .. code-block:: c++
83
84 uint16_t MinorVersion : 8;
85 uint16_t MajorVersion : 7;
86 uint16_t NewVersionFormat : 1;
87
88 For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
89 If it is ``false``, the layout above does not apply and the reader should consult
90 the `Microsoft Source Code `__ for
91 further guidance.
92
93 - **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream `,
94 which contains CodeView symbol records for all public symbols. Actual records
95 are stored in the symbol record stream, and are referenced from this stream.
96
97 - **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
98 PDB. This does not apply to PDBs produced by LLVM, as LLVM does not use ``mspdbXXXX.dll``.
99
100 - **SymRecordStream** - The stream containing all CodeView symbol records used
101 by the program. This is used for deduplication, so that many different
102 compilands can refer to the same symbols without having to include the full record
103 content inside of each module stream.
104
105 - **PdbDllRbld** - Unknown
106
107 - **MFCTypeServerIndex** - The index of the MFC type server in the :ref:`dbi_type_server_substream`.
108
109 - **Flags** - A bitfield with the following layout, containing various
110 information about how the program was built:
111
112 .. code-block:: c++
113
114 uint16_t WasIncrementallyLinked : 1;
115 uint16_t ArePrivateSymbolsStripped : 1;
116 uint16_t HasConflictingTypes : 1;
117 uint16_t Reserved : 13;
118
119 The only one of these that is not self-explanatory is ``HasConflictingTypes``.
120 Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
121 If it is passed to ``link.exe``, this field will be set. Otherwise it will
122 not be set. It is unclear what this flag does, although it seems to have
123 subtle implications on the algorithm used to look up type records.
124
125 - **Machine** - A value from the `CV_CPU_TYPE_e `__
126 enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).
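
The two bitfields above (``BuildNumber`` and ``Flags``) can also be decoded with explicit masks, which avoids relying on how a compiler lays out bit-fields. The following is a minimal sketch, assuming the first declared bit-field member occupies the least significant bits (as MSVC allocates them). The helper names ``decodeBuildNumber`` and ``decodeDbiFlags`` and the two result structs are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Decoded view of the 16-bit BuildNumber field (hypothetical helper type).
struct BuildNumberInfo {
  uint8_t Minor;         // MinorVersion : 8
  uint8_t Major;         // MajorVersion : 7
  bool NewVersionFormat; // NewVersionFormat : 1
};

// Assumption: bit-fields are allocated starting at the least significant bit.
inline BuildNumberInfo decodeBuildNumber(uint16_t Raw) {
  BuildNumberInfo Info;
  Info.Minor = static_cast<uint8_t>(Raw & 0x00FF);      // bits 0-7
  Info.Major = static_cast<uint8_t>((Raw >> 8) & 0x7F); // bits 8-14
  Info.NewVersionFormat = (Raw & 0x8000) != 0;          // bit 15
  return Info;
}

// Decoded view of the 16-bit Flags field (hypothetical helper type).
struct DbiFlagsInfo {
  bool WasIncrementallyLinked;
  bool ArePrivateSymbolsStripped;
  bool HasConflictingTypes;
};

inline DbiFlagsInfo decodeDbiFlags(uint16_t Raw) {
  DbiFlagsInfo Info;
  Info.WasIncrementallyLinked = (Raw & 0x0001) != 0;    // bit 0
  Info.ArePrivateSymbolsStripped = (Raw & 0x0002) != 0; // bit 1
  Info.HasConflictingTypes = (Raw & 0x0004) != 0;       // bit 2
  return Info;
}
```

Under this assumption, a toolchain version of 12.0 in the new format would be stored as ``0x8C00``.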
127
128 Immediately after the fixed-size DBI Stream header are ``7`` variable-length
129 `substreams`. The following ``7`` fields of the DBI Stream header specify the
130 number of bytes of the corresponding substream. Each substream's contents will
131 be described in detail :ref:`below `. The length of the entire
132 DBI Stream should equal ``64`` (the length of the header above) plus the value
133 of each of the following ``7`` fields.
134
135 - **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.
136
137 - **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.
138
139 - **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.
140
141 - **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.
142
143 - **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.
144
145 - **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.
146
147 - **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
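
Since each substream starts where the previous one ends, the start of every substream follows from the seven size fields. The sketch below assumes the substreams appear in the order in which they are described in this document (module info, section contribution, section map, file info, type server, EC, optional debug header). ``DbiSubstreamOffsets`` and ``computeSubstreamOffsets`` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

struct DbiStreamHeader {
  int32_t VersionSignature;
  uint32_t VersionHeader;
  uint32_t Age;
  uint16_t GlobalStreamIndex;
  uint16_t BuildNumber;
  uint16_t PublicStreamIndex;
  uint16_t PdbDllVersion;
  uint16_t SymRecordStream;
  uint16_t PdbDllRbld;
  int32_t ModInfoSize;
  int32_t SectionContributionSize;
  int32_t SectionMapSize;
  int32_t SourceInfoSize;
  int32_t TypeServerSize;
  uint32_t MFCTypeServerIndex;
  int32_t OptionalDbgHeaderSize;
  int32_t ECSubstreamSize;
  uint16_t Flags;
  uint16_t Machine;
  uint32_t Padding;
};
static_assert(sizeof(DbiStreamHeader) == 64, "DBI header should be 64 bytes");

// Byte offsets of each substream from the start of the DBI Stream.
struct DbiSubstreamOffsets {
  uint64_t ModInfo, SectionContribution, SectionMap, FileInfo;
  uint64_t TypeServer, EC, OptionalDbgHeader, End;
};

// Returns false if a size is negative or the sizes do not add up to the
// stream length. The substream order is an assumption (see the lead-in).
inline bool computeSubstreamOffsets(const DbiStreamHeader &H,
                                    uint64_t StreamLength,
                                    DbiSubstreamOffsets &Out) {
  const int32_t Sizes[7] = {H.ModInfoSize,    H.SectionContributionSize,
                            H.SectionMapSize, H.SourceInfoSize,
                            H.TypeServerSize, H.ECSubstreamSize,
                            H.OptionalDbgHeaderSize};
  uint64_t Starts[8];
  Starts[0] = sizeof(DbiStreamHeader);
  for (int I = 0; I < 7; ++I) {
    if (Sizes[I] < 0)
      return false;
    Starts[I + 1] = Starts[I] + static_cast<uint64_t>(Sizes[I]);
  }
  if (Starts[7] != StreamLength)
    return false;
  Out = DbiSubstreamOffsets{Starts[0], Starts[1], Starts[2], Starts[3],
                            Starts[4], Starts[5], Starts[6], Starts[7]};
  return true;
}
```

The final check mirrors the rule stated above: the stream length should equal ``64`` plus the sum of the seven size fields.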
148
149 .. _dbi_substreams:
150
151 Substreams
152 ==========
153
154 .. _dbi_mod_info_substream:
155
156 Module Info Substream
157 ^^^^^^^^^^^^^^^^^^^^^
158
159 Begins at offset ``0`` immediately after the :ref:`header `. The
160 module info substream is an array of variable-length records, each one
161 describing a single module (e.g. object file) linked into the program. Each
162 record is a ``ModInfo`` structure, which embeds a ``SectionContribEntry`` with the format:
163
164 .. code-block:: c++
165
166 struct SectionContribEntry {
167 uint16_t Section;
168 char Padding1[2];
169 int32_t Offset;
170 int32_t Size;
171 uint32_t Characteristics;
172 uint16_t ModuleIndex;
173 char Padding2[2];
174 uint32_t DataCrc;
175 uint32_t RelocCrc;
176 };
177
178 While most of these are self-explanatory, the ``Characteristics`` field
179 warrants some elaboration. It corresponds to the ``Characteristics``
180 field of the `IMAGE_SECTION_HEADER `__
181 structure. The ``ModInfo`` record itself has the following layout:
182
183 .. code-block:: c++
184
185 struct ModInfo {
186 uint32_t Unused1;
187 SectionContribEntry SectionContr;
188 uint16_t Flags;
189 uint16_t ModuleSymStream;
190 uint32_t SymByteSize;
191 uint32_t C11ByteSize;
192 uint32_t C13ByteSize;
193 uint16_t SourceFileCount;
194 char Padding[2];
195 uint32_t Unused2;
196 uint32_t SourceFileNameIndex;
197 uint32_t PdbFilePathNameIndex;
198 char ModuleName[];
199 char ObjFileName[];
200 };
201
202 - **SectionContr** - Describes the properties of the section in the final binary
203 which contains the code and data from this module.
204
205 - **Flags** - A bitfield with the following format:
206
207 .. code-block:: c++
208
209 uint16_t Dirty : 1; // ``true`` if this ModInfo has been written since reading the PDB.
210 uint16_t EC : 1; // ``true`` if EC information is present for this module. It is unknown what EC actually is.
211 uint16_t Unused : 6;
212 uint16_t TSM : 8; // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.
213
214
215 - **ModuleSymStream** - The index of the stream that contains symbol information
216 for this module. This includes CodeView symbol information as well as source
217 and line information.
218
219 - **SymByteSize** - The number of bytes of data from the stream identified by
220 ``ModuleSymStream`` that represent CodeView symbol records.
221
222 - **C11ByteSize** - The number of bytes of data from the stream identified by
223 ``ModuleSymStream`` that represent C11-style CodeView line information.
224
225 - **C13ByteSize** - The number of bytes of data from the stream identified by
226 ``ModuleSymStream`` that represent C13-style CodeView line information. At
227 most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.
228
229 - **SourceFileCount** - The number of source files that contributed to this
230 module during compilation.
231
232 - **SourceFileNameIndex** - The offset in the names buffer of the primary
233 translation unit used to build this module. All PDB files observed to date
234 always have this value equal to 0.
235
236 - **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
237 containing this module's symbol information. This has only been observed
238 to be non-zero for the special ``* Linker *`` module.
239
240 - **ModuleName** - The module name. This is usually either a full path to an
241 object file (either directly passed to ``link.exe`` or from an archive) or
242 a string of the form ``Import:``.
243
244 - **ObjFileName** - The object file name. In the case of a module that is
245 passed directly to ``link.exe``, this is the same as **ModuleName**.
246 In the case of a module that comes from an archive, this is usually the full
247 path to the archive.
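
Because each record ends with two null terminated strings, the array can only be walked sequentially. The sketch below collects the two names from every record. It relies on two assumptions that the text above does not state: that the fixed-size part of ``ModInfo`` (everything before ``ModuleName``) is ``64`` bytes with no compiler-inserted padding, and that each record is padded so that the next one starts on a ``4`` byte boundary. ``ModuleNames``, ``readCString`` and ``readModuleNames`` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ModuleNames {
  std::string ModuleName;
  std::string ObjFileName;
};

// Size of the fields that precede ModuleName (4 + 28 + 2 + 2 + 12 + 2 + 2 +
// 4 + 8). Assumes the struct is laid out without extra padding.
constexpr size_t kModInfoFixedSize = 64;

// Reads one null terminated string starting at Pos and advances Pos past the
// terminator. Fails if the terminator is missing.
inline bool readCString(const uint8_t *Data, size_t Size, size_t &Pos,
                        std::string &Out) {
  size_t Start = Pos;
  while (Pos < Size && Data[Pos] != 0)
    ++Pos;
  if (Pos == Size)
    return false;
  Out.assign(reinterpret_cast<const char *>(Data + Start), Pos - Start);
  ++Pos;
  return true;
}

// Walks the whole substream and appends the names of each module to Out.
inline bool readModuleNames(const uint8_t *Data, size_t Size,
                            std::vector<ModuleNames> &Out) {
  size_t Pos = 0;
  while (Pos < Size) {
    if (Size - Pos < kModInfoFixedSize)
      return false; // truncated record
    Pos += kModInfoFixedSize;
    ModuleNames Names;
    if (!readCString(Data, Size, Pos, Names.ModuleName) ||
        !readCString(Data, Size, Pos, Names.ObjFileName))
      return false;
    Out.push_back(Names);
    // Assumption: records are padded to a 4 byte boundary.
    Pos = (Pos + 3) & ~static_cast<size_t>(3);
  }
  return true;
}
```

Here ``Data`` and ``Size`` would describe the ``Header->ModInfoSize`` bytes that follow the stream header.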
248
249 .. _dbi_sec_contr_substream:
250
251 Section Contribution Substream
252 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
253 Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
254 and consumes ``Header->SectionContributionSize`` bytes. This substream begins
255 with a single ``uint32_t`` which will be one of the following values:
256
257 .. code-block:: c++
258
259 enum class SectionContrSubstreamVersion : uint32_t {
260 Ver60 = 0xeffe0000 + 19970605,
261 V2 = 0xeffe0000 + 20140516
262 };
263
264 ``Ver60`` is the only value which has been observed in a PDB so far. Following
265 this ``4`` byte field is an array of fixed-length structures. If the version
266 is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the
267 version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
268 defined as follows:
269
270 .. code-block:: c++
271
272 struct SectionContribEntry2 {
273 SectionContribEntry SC;
274 uint32_t ISectCoff;
275 };
276
277 The purpose of the second field is not well understood.
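
Since the entries are fixed-length, the number of entries follows from the substream size and the version field. A minimal sketch, assuming ``SectionContribEntry`` occupies ``28`` bytes and ``SectionContribEntry2`` occupies ``32`` bytes (the sums of their field sizes), is shown below. ``countSectionContribs`` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

enum class SectionContrSubstreamVersion : uint32_t {
  Ver60 = 0xeffe0000 + 19970605,
  V2 = 0xeffe0000 + 20140516
};

// Computes how many entries follow the 4 byte version field. Returns false
// for an unknown version or a size that is not a whole number of entries.
inline bool countSectionContribs(const uint8_t *Data, size_t Size,
                                 size_t &Count) {
  if (Size < 4)
    return false;
  // Assemble the little-endian version field byte by byte so the result does
  // not depend on the host's endianness.
  uint32_t Raw = static_cast<uint32_t>(Data[0]) |
                 (static_cast<uint32_t>(Data[1]) << 8) |
                 (static_cast<uint32_t>(Data[2]) << 16) |
                 (static_cast<uint32_t>(Data[3]) << 24);
  size_t EntrySize = 0;
  switch (static_cast<SectionContrSubstreamVersion>(Raw)) {
  case SectionContrSubstreamVersion::Ver60:
    EntrySize = 28; // assumed sizeof(SectionContribEntry)
    break;
  case SectionContrSubstreamVersion::V2:
    EntrySize = 32; // assumed sizeof(SectionContribEntry2)
    break;
  default:
    return false;
  }
  size_t Payload = Size - 4;
  if (Payload % EntrySize != 0)
    return false;
  Count = Payload / EntrySize;
  return true;
}
```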
278
279
280 .. _dbi_section_map_substream:
281
282 Section Map Substream
283 ^^^^^^^^^^^^^^^^^^^^^
284 Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
285 and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``
286 byte header followed by an array of fixed-length records. The header and records
287 have the following layout:
288
289 .. code-block:: c++
290
291 struct SectionMapHeader {
292 uint16_t Count; // Number of segment descriptors
293 uint16_t LogCount; // Number of logical segment descriptors
294 };
295
296 struct SectionMapEntry {
297 uint16_t Flags; // See the SectionMapEntryFlags enum below.
298 uint16_t Ovl; // Logical overlay number
299 uint16_t Group; // Group index into descriptor array.
300 uint16_t Frame;
301 uint16_t SectionName; // Byte index of segment / group name in string table, or 0xFFFF.
302 uint16_t ClassName; // Byte index of class in string table, or 0xFFFF.
303 uint32_t Offset; // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
304 uint32_t SectionLength; // Byte count of the segment or group.
305 };
306
307 enum class SectionMapEntryFlags : uint16_t {
308 Read = 1 << 0, // Segment is readable.
309 Write = 1 << 1, // Segment is writable.
310 Execute = 1 << 2, // Segment is executable.
311 AddressIs32Bit = 1 << 3, // Descriptor describes a 32-bit linear address.
312 IsSelector = 1 << 8, // Frame represents a selector.
313 IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
314 IsGroup = 1 << 10 // If set, descriptor represents a group.
315 };
316
317 Many of these fields are not well understood, so will not be discussed further.
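
Even without a full understanding of every field, the layout can be sanity checked and the flags tested. The sketch below assumes that exactly ``Count`` entries follow the header, which the text above does not state explicitly. ``hasFlag`` and ``sectionMapSizeIsConsistent`` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

struct SectionMapHeader {
  uint16_t Count;    // Number of segment descriptors
  uint16_t LogCount; // Number of logical segment descriptors
};

struct SectionMapEntry {
  uint16_t Flags;
  uint16_t Ovl;
  uint16_t Group;
  uint16_t Frame;
  uint16_t SectionName;
  uint16_t ClassName;
  uint32_t Offset;
  uint32_t SectionLength;
};
static_assert(sizeof(SectionMapHeader) == 4, "header should be 4 bytes");
static_assert(sizeof(SectionMapEntry) == 20, "entry should be 20 bytes");

enum class SectionMapEntryFlags : uint16_t {
  Read = 1 << 0,
  Write = 1 << 1,
  Execute = 1 << 2,
  AddressIs32Bit = 1 << 3,
  IsSelector = 1 << 8,
  IsAbsoluteAddress = 1 << 9,
  IsGroup = 1 << 10
};

// Tests a single flag bit in an entry's Flags field.
inline bool hasFlag(const SectionMapEntry &E, SectionMapEntryFlags F) {
  return (E.Flags & static_cast<uint16_t>(F)) != 0;
}

// Assumption: the substream is the header followed by exactly Count entries.
inline bool sectionMapSizeIsConsistent(const SectionMapHeader &H,
                                       size_t SubstreamSize) {
  return SubstreamSize == sizeof(SectionMapHeader) +
                              static_cast<size_t>(H.Count) *
                                  sizeof(SectionMapEntry);
}
```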
318
319 .. _dbi_file_info_substream:
320
321 File Info Substream
322 ^^^^^^^^^^^^^^^^^^^
323 Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
324 and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
325 from module to the source files that contribute to that module. Since multiple
326 modules can use the same source file (for example, a header file), this substream
327 uses a string table to store each unique file name only once, and then has each
328 module use offsets into the string table rather than embedding the string's value
329 directly. The format of this substream is as follows:
330
331 .. code-block:: c++
332
333 struct FileInfoSubstream {
334 uint16_t NumModules;
335 uint16_t NumSourceFiles;
336
337 uint16_t ModIndices[NumModules];
338 uint16_t ModFileCounts[NumModules];
339 uint32_t FileNameOffsets[NumSourceFiles];
340 char NamesBuffer[][NumSourceFiles];
341 };
342
343 **NumModules** - The number of modules for which source file information is
344 contained within this substream. Should match the corresponding value from the
345 :ref:`dbi_header`.
346
347 **NumSourceFiles** - In theory this is supposed to contain the number of source
348 files for which this substream contains information. But that would present a
349 problem, in that the ``16``-bit width of this field would prevent one from
350 having more than 64K source files in a program. In early versions of the file
351 format, this seems to have been the case. In order to support more than this, this
352 field is simply ignored, and the count is computed dynamically by summing the values of
353 the ``ModFileCounts`` array (discussed below). In short, this value should be
354 ignored.
355
356 **ModIndices** - This array is present, but does not appear to be useful.
357
358 **ModFileCounts** - An array of ``NumModules`` integers, each one containing
359 the number of source files which contribute to the module at the specified index.
360 While each individual module is limited to 64K contributing source files, the
361 union of all modules' source files may be greater than 64K. The real number of
362 source files is thus computed by summing this array. Note that summing this array
363 does not give the number of `unique` source files, only the total number of source
364 file contributions to modules.
365
366 **FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
367 here refers to the 32-bit value obtained from summing **ModFileCounts**), where
368 each integer is an offset into **NamesBuffer** pointing to a null terminated string.
369
370 **NamesBuffer** - An array of null terminated strings containing the actual source
371 file names.
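
The description above implies that the positions of the arrays depend on values read from the substream itself. The following sketch computes where each array starts and derives the real source file count by summing ``ModFileCounts``, ignoring the ``16``-bit ``NumSourceFiles`` field. ``FileInfoLayout``, ``readU16LE`` and ``computeFileInfoLayout`` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offsets of each array within the File Info Substream.
struct FileInfoLayout {
  uint64_t ModIndicesOffset;
  uint64_t ModFileCountsOffset;
  uint64_t FileNameOffsetsOffset;
  uint64_t NamesBufferOffset;
  uint32_t NumSourceFiles; // sum of ModFileCounts, not the 16-bit field
};

// Reads a little-endian uint16_t from two bytes.
inline uint16_t readU16LE(const uint8_t *P) {
  return static_cast<uint16_t>(P[0] | (P[1] << 8));
}

inline bool computeFileInfoLayout(const uint8_t *Data, size_t Size,
                                  FileInfoLayout &Out) {
  if (Size < 4)
    return false;
  uint16_t NumModules = readU16LE(Data);
  // The 16-bit NumSourceFiles field at offset 2 is deliberately ignored.
  Out.ModIndicesOffset = 4;
  Out.ModFileCountsOffset = Out.ModIndicesOffset + 2ull * NumModules;
  Out.FileNameOffsetsOffset = Out.ModFileCountsOffset + 2ull * NumModules;
  if (Out.FileNameOffsetsOffset > Size)
    return false;
  // Sum the per-module counts to get the real number of file name offsets.
  uint32_t Total = 0;
  for (uint32_t I = 0; I < NumModules; ++I)
    Total += readU16LE(Data + Out.ModFileCountsOffset + 2ull * I);
  Out.NumSourceFiles = Total;
  Out.NamesBufferOffset = Out.FileNameOffsetsOffset + 4ull * Total;
  return Out.NamesBufferOffset <= Size;
}
```

The sum cannot overflow a ``uint32_t``, since at most 65535 modules each contribute at most 65535 files.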
372
373 .. _dbi_type_server_substream:
374
375 Type Server Substream
376 ^^^^^^^^^^^^^^^^^^^^^
377 Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
378 and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
379 of this substream is understood, although it is assumed to be related somehow to the
380 usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further.
381
382 .. _dbi_ec_substream:
383
384 EC Substream
385 ^^^^^^^^^^^^
386 Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
387 and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
388 of this substream is understood, and it will not be discussed further.
389
390 .. _dbi_optional_dbg_stream:
391
392 Optional Debug Header Stream
393 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
394 Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
395 consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
396 stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
397 in the larger MSF file which contains some additional debug information.
398 Each position of this array has a special meaning, allowing one to determine
399 what kind of debug information is at the referenced stream. ``11`` indices
400 are currently understood, although it's possible there may be more. The
401 layout of each stream generally corresponds exactly to a particular type
402 of debug data directory from the PE/COFF file. The format of these fields
403 can be found in the `Microsoft PE/COFF Specification `__.
404
405 **FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
406 debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.
407
408 **Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
409 is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.
410
411 **Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
412 debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.
413
414 **Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
415 is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
416 is used for mapping addresses between instrumented and uninstrumented code.
417
418 **Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
419 is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
420 is used for mapping addresses between instrumented and uninstrumented code.
421
422 **Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
423 the original executable.
424
425 **Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
426 understood, but it is assumed to be a mapping from ``CLR Token`` to
427 ``CLR Record ID``. Refer to `ECMA 335 `__
428 for more information.
429
430 **Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
431 executable.
432
433 **Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
434 section from the executable, but that would make it identical to
435 ``DbgStreamArray[1]``. The difference between these two indices is not well
436 understood.
437
438 **New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
439 debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
440 differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
441 used the "new" format rather than the "old" format.
442
443 **Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
444 to ``DbgStreamArray[5]``, but has not been observed in practice.
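
The slots listed above can be looked up by position once the substream bytes are available. This is a minimal sketch: the enumerator names are hypothetical labels for the ``11`` positions described here, and treating ``0xFFFF`` as "no stream present" is an assumption that is not stated in the text above. ``getDbgStreamIndex`` is likewise a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Positions within DbgStreamArray (names are illustrative only).
enum class DbgHeaderSlot : size_t {
  FPO = 0,
  Exception = 1,
  Fixup = 2,
  OmapToSrc = 3,
  OmapFromSrc = 4,
  SectionHdr = 5,
  TokenRidMap = 6,
  Xdata = 7,
  Pdata = 8,
  NewFPO = 9,
  SectionHdrOrig = 10
};

// Assumption: this value marks a slot that refers to no stream.
constexpr uint16_t kAssumedInvalidStreamIndex = 0xFFFF;

// Looks up the stream index stored in the given slot. Returns false if the
// array is too short to contain the slot or the slot holds the assumed
// invalid marker.
inline bool getDbgStreamIndex(const uint8_t *Data, size_t Size,
                              DbgHeaderSlot Slot, uint16_t &Index) {
  size_t I = static_cast<size_t>(Slot);
  if ((I + 1) * 2 > Size)
    return false;
  // Each entry is a little-endian uint16_t.
  uint16_t Value =
      static_cast<uint16_t>(Data[2 * I] | (Data[2 * I + 1] << 8));
  if (Value == kAssumedInvalidStreamIndex)
    return false;
  Index = Value;
  return true;
}
```

Checking the slot against the array size allows for PDBs that contain fewer than ``11`` entries.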
35 35
36 36 File Layout
37 37 ===========
38
39 .. important::
40 Unless otherwise specified, all numeric values are encoded in little endian.
41 If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
42 assume it is little endian!
38 43
39 44 .. toctree::
40 45 :hidden: