llvm.org GIT mirror llvm / 52810e6
Convert PDB docs to unix line endings. No other changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359712 91177308-0d34-0410-b5e6-96231b3b80d8 Nico Weber 1 year, 5 months ago
7 changed file(s) with 848 addition(s) and 848 deletion(s).
None =====================================
1 The PDB Global Symbol Stream
2 =====================================
None The PDB Serialized Hash Table Format
1 ====================================
2
3 .. contents::
4 :local:
5
6 .. _hash_intro:
7
8 Introduction
9 ============
10
11 One of the design goals of the PDB format is to provide accelerated access to
12 debug information, and for this reason there are several occasions where hash
13 tables are serialized and embedded directly to the file, rather than requiring
14 a consumer to read a list of values and reconstruct the hash table on the fly.
15
16 The serialization format supports hash tables of arbitrarily large size and
17 capacity, as well as arbitrary value types and hash functions. The only
18 supported key type is a uint32. The only requirement is that the producer and
19 consumer agree on the hash function. As such, the hash function is not
20 discussed further in this document; it is assumed that for a particular
21 instance of a PDB file hash table, the appropriate hash function is being used.
22
23 On-Disk Format
24 ==============
25
26 .. code-block:: none
27
28 .--------------------.-- +0
29 | Size |
30 .--------------------.-- +4
31 | Capacity |
32 .--------------------.-- +8
33 | Present Bit Vector |
34 .--------------------.-- +N
35 | Deleted Bit Vector |
36 .--------------------.-- +M ─╮
37 | Key | │
38 .--------------------.-- +M+4 │
39 | Value | │
40 .--------------------.-- +M+4+sizeof(Value) │
41 ... ├─ |Capacity| Bucket entries
42 .--------------------. │
43 | Key | │
44 .--------------------. │
45 | Value | │
46 .--------------------. ─╯
47
48 - **Size** - The number of values contained in the hash table.
49
50 - **Capacity** - The number of buckets in the hash table. Producers should
51 keep ``Size`` no greater than ``2/3*Capacity+1``.
52
53 - **Present Bit Vector** - A serialized bit vector which contains information
54 about which buckets have valid values. If the bucket has a value, the
55 corresponding bit will be set, and if the bucket doesn't have a value (either
56 because the bucket is empty or because the value is a tombstone value) the bit
57 will be unset.
58
59 - **Deleted Bit Vector** - A serialized bit vector which contains information
60 about which buckets have tombstone values. If the entry in this bucket is
61 deleted, the bit will be set, otherwise it will be unset.
62
63 - **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
64 entry is the key (always a uint32), and the second entry is the value. The
65 state of each bucket (valid, empty, deleted) can be determined by examining
66 the present and deleted bit vectors.
67
68
69 .. _hash_bit_vectors:
70
71 Present and Deleted Bit Vectors
72 ===============================
73
74 The bit vectors indicating the status of each bucket are serialized as follows:
75
76 .. code-block:: none
77
78 .--------------------.-- +0
79 | Word Count |
80 .--------------------.-- +4
81 | Word_0 | ─╮
82 .--------------------.-- +8 │
83 | Word_1 | │
84 .--------------------.-- +12 ├─ |Word Count| values
85 ... │
86 .--------------------. │
87 | Word_N | │
88 .--------------------. ─╯
89
90 The words, when viewed as a contiguous block of bytes, represent a bit vector with
91 the following layout:
92
93 .. code-block:: none
94
95 .------------. .------------.------------.
96 | Word_N | ... | Word_1 | Word_0 |
97 .------------. .------------.------------.
98 | | | | |
99 +N*32 +(N-1)*32 +64 +32 +0
100
101 where the k'th bit of this bit vector represents the status of the k'th bucket
102 in the hash table.
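
The bucket-state rules above can be sketched as follows. This is a minimal
illustration, not code from any PDB library: the names ``BucketState``,
``testBit`` and ``bucketState`` are hypothetical, and it assumes that bit ``k``
lives in word ``k / 32`` at position ``k % 32`` counting from the least
significant bit, which is how the layout diagram above is read here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names, used only for illustration.
enum class BucketState { Empty, Valid, Deleted };

// Returns true if bit K is set in the deserialized words.
// Assumption: bit K is stored in word K / 32, at bit position K % 32,
// counting from the least significant bit. Bits past the last serialized
// word are treated as unset.
inline bool testBit(const std::vector<uint32_t> &Words, uint32_t K) {
  uint32_t WordIndex = K / 32;
  if (WordIndex >= Words.size())
    return false;
  return ((Words[WordIndex] >> (K % 32)) & 1u) != 0;
}

// Combines the present and deleted bit vectors into a single bucket state.
inline BucketState bucketState(const std::vector<uint32_t> &Present,
                               const std::vector<uint32_t> &Deleted,
                               uint32_t Bucket) {
  if (testBit(Present, Bucket))
    return BucketState::Valid;
  if (testBit(Deleted, Bucket))
    return BucketState::Deleted;
  return BucketState::Empty;
}
```

A reader would typically call ``bucketState`` once per bucket while walking the
``Capacity`` key/value entries, skipping anything that is not ``Valid``.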
None =====================================
1 The Module Information Stream
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _modi_stream_intro:
8
9 Introduction
10 ============
11
12 The Module Info Stream (henceforth referred to as the Modi stream) contains
13 information about a single module (object file, import library, etc.) that
14 contributes to the binary this PDB contains debug information about. There
15 is one modi stream for each module, and the mapping between modi stream index
16 and module is contained in the :doc:`DBI Stream `. The modi stream
17 for a single module contains line information for the compiland, as well as
18 all CodeView information for the symbols defined in the compiland. Finally,
19 there is a "global refs" substream which is not well understood.
20
21 .. _modi_stream_layout:
22
23 Stream Layout
24 =============
25
26 A modi stream is laid out as follows:
27
28
29 .. code-block:: c++
30
31 struct ModiStream {
32 uint32_t Signature;
33 uint8_t Symbols[SymbolSize-4];
34 uint8_t C11LineInfo[C11Size];
35 uint8_t C13LineInfo[C13Size];
36
37 uint32_t GlobalRefsSize;
38 uint8_t GlobalRefs[GlobalRefsSize];
39 };
40
41 - **Signature** - Unknown. In practice only the value of ``4`` has been
42 observed. It is hypothesized that this value corresponds to the set of
43 ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
44 meaning that this module has C13 line information (as opposed to C11 line
45 information). A corollary of this is that we expect to only ever see
46 C13 line info, and that we do not understand the format of C11 line info.
47
48 - **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
49 ``SymbolSize`` is equal to the value of ``SymByteSize`` for the
50 corresponding module's entry in the :ref:`Module Info Substream `
51 of the :doc:`DBI Stream `.
52
53 - **C11LineInfo** - A block containing CodeView line information in C11
54 format. ``C11Size`` is equal to the value of ``C11ByteSize`` from the
55 :ref:`Module Info Substream ` of the
56 :doc:`DBI Stream `. If this value is ``0``, then C11 line
57 information is not present. As mentioned previously, the format of
58 C11 line info is not understood and we assume all line information in modern
59 PDBs to be in C13 format.
60
61 - **C13LineInfo** - A block containing CodeView line information in C13
62 format. ``C13Size`` is equal to the value of ``C13ByteSize`` from the
63 :ref:`Module Info Substream ` of the
64 :doc:`DBI Stream `. If this value is ``0``, then C13 line
65 information is not present.
66
67 - **GlobalRefs** - The meaning of this substream is not understood.
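
Because every substream size comes from the DBI stream, the byte offset of each
field inside a modi stream can be computed up front. The sketch below shows
that arithmetic; ``ModiLayout`` and ``computeModiLayout`` are hypothetical
names, and it assumes ``SymbolSize`` is at least ``4`` (it includes the
``Signature``), as the struct above implies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: byte offsets of each field within a modi stream.
struct ModiLayout {
  uint32_t SymbolsOffset;        // first byte after the 4-byte Signature
  uint32_t C11Offset;            // start of C11LineInfo
  uint32_t C13Offset;            // start of C13LineInfo
  uint32_t GlobalRefsSizeOffset; // start of the GlobalRefsSize field
  uint32_t GlobalRefsOffset;     // start of the GlobalRefs bytes
};

// SymbolSize, C11Size and C13Size are the SymByteSize, C11ByteSize and
// C13ByteSize values taken from the module's entry in the DBI stream.
inline ModiLayout computeModiLayout(uint32_t SymbolSize, uint32_t C11Size,
                                    uint32_t C13Size) {
  assert(SymbolSize >= 4 && "SymbolSize is assumed to include the Signature");
  ModiLayout L;
  L.SymbolsOffset = 4;
  L.C11Offset = SymbolSize; // Signature (4) + Symbols (SymbolSize - 4)
  L.C13Offset = L.C11Offset + C11Size;
  L.GlobalRefsSizeOffset = L.C13Offset + C13Size;
  L.GlobalRefsOffset = L.GlobalRefsSizeOffset + 4;
  return L;
}
```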
68
69 .. _modi_symbol_substream:
70
71 The CodeView Symbol Substream
72 =============================
73
74 The CodeView Symbol Substream is an array of variable-length
75 records describing the functions, variables, inlining information,
76 and other symbols defined in the compiland. The entire array consumes
77 ``SymbolSize-4`` bytes. The format of a CodeView Symbol Record (and
78 thus of an array of CodeView Symbol Records) is described in
79 :doc:`CodeViewSymbols`.
None =====================================
1 The MSF File Format
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _msf_layout:
8
9 File Layout
10 ===========
11
12 The MSF file format consists of the following components:
13
14 1. :ref:`msf_superblock`
15 2. :ref:`msf_freeblockmap` (also known as Free Page Map, or FPM)
16 3. Data
17
18 Each component is stored as an indexed block, the length of which is specified
19 in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
20 following pattern (sometimes referred to as an "interval"):
21
22 1. 1 block of data
23 2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
24 3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
25 4. ``SuperBlock::BlockSize - 3`` blocks of data
26
27 In the first interval, the first data block is used to store
28 :ref:`msf_superblock`.
29
30 The following diagram demonstrates the general layout of the file (\| denotes
31 the end of an interval, and is for visualization purposes only):
32
33 +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
34 | Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
35 +=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
36 | Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... |
37 +-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
38
39 The file may end after any block, including immediately after an FPM1.
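
The interval pattern above means the role of a block follows from its index
alone. A minimal sketch, with hypothetical names ``BlockKind`` and
``classifyBlock``, assuming the interval length in blocks equals ``BlockSize``
as in the diagram:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, used only for illustration.
enum class BlockKind { SuperBlock, FreeBlockMap1, FreeBlockMap2, Data };

// Classifies a block by its position within its interval. Each interval
// spans BlockSize blocks; positions 1 and 2 are reserved for the two FPMs.
inline BlockKind classifyBlock(uint32_t BlockIndex, uint32_t BlockSize = 4096) {
  assert(BlockSize > 0);
  if (BlockIndex == 0)
    return BlockKind::SuperBlock; // first data block of the first interval
  switch (BlockIndex % BlockSize) {
  case 1:
    return BlockKind::FreeBlockMap1;
  case 2:
    return BlockKind::FreeBlockMap2;
  default:
    return BlockKind::Data;
  }
}
```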
40
41 .. note::
42 LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
43 variant), so the rest of this document will assume a block size of 4096.
44
45 .. _msf_superblock:
46
47 The Superblock
48 ==============
49 At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
50 follows:
51
52 .. code-block:: c++
53
54 struct SuperBlock {
55 char FileMagic[sizeof(Magic)];
56 ulittle32_t BlockSize;
57 ulittle32_t FreeBlockMapBlock;
58 ulittle32_t NumBlocks;
59 ulittle32_t NumDirectoryBytes;
60 ulittle32_t Unknown;
61 ulittle32_t BlockMapAddr;
62 };
63
64 - **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
65 followed by the bytes ``1A 44 53 00 00 00``.
66 - **BlockSize** - The block size of the internal file system. Valid values are
67 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
68 depending on the block sizes. For the purposes of LLVM, we handle only block
69 sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
70 - **FreeBlockMapBlock** - The index of a block within the file, at which begins
71 a bitfield representing the set of all blocks within the file which are "free"
72 (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
73 more information.
74 **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
75 - **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
76 should equal the size of the file on disk.
77 - **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
78 directory contains information about each stream's size and the set of blocks
79 that it occupies. It will be described in more detail later.
80 - **BlockMapAddr** - The index of a block within the MSF file. At this block is
81 an array of ``ulittle32_t``'s listing the blocks that the stream directory
82 resides on. For large MSF files, the stream directory (which describes the
83 block layout of each stream) may not fit entirely on a single block. As a
84 result, this extra layer of indirection is introduced, whereby this block
85 contains the list of blocks that the stream directory occupies, and the stream
86 directory itself can be stitched together accordingly. The number of
87 ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
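
The field constraints listed above lend themselves to a quick sanity check
after reading the SuperBlock. The sketch below is illustrative only:
``SuperBlockFields``, ``numDirectoryBlocks`` and ``looksPlausible`` are
hypothetical names, the fields are assumed to be already decoded to host
integers, and the ``BlockMapAddr < NumBlocks`` test is an extra assumption not
stated in the list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-endian copy of the numeric SuperBlock fields.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t BlockMapAddr;
};

// ceil(NumDirectoryBytes / BlockSize): the number of ulittle32_t entries in
// the array stored at block BlockMapAddr.
inline uint32_t numDirectoryBlocks(uint32_t NumDirectoryBytes,
                                   uint32_t BlockSize) {
  assert(BlockSize > 0);
  return NumDirectoryBytes / BlockSize +
         (NumDirectoryBytes % BlockSize != 0 ? 1u : 0u);
}

inline bool looksPlausible(const SuperBlockFields &SB) {
  bool ValidBlockSize = SB.BlockSize == 512 || SB.BlockSize == 1024 ||
                        SB.BlockSize == 2048 || SB.BlockSize == 4096;
  bool ValidFpm = SB.FreeBlockMapBlock == 1 || SB.FreeBlockMapBlock == 2;
  // Assumption: the block map must lie inside the file.
  bool ValidBlockMap = SB.BlockMapAddr < SB.NumBlocks;
  return ValidBlockSize && ValidFpm && ValidBlockMap;
}
```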
88
89 .. _msf_freeblockmap:
90
91 The Free Block Map
92 ==================
93
94 The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
95 series of blocks which contains a bit flag for every block in the file. The
96 flag will be set to 0 if the block is in use, and 1 if the block is unused.
97
98 Each file contains two FPMs, one of which is active at any given time. This
99 feature is designed to support incremental and atomic updates of the underlying
100 MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
101 write your new modified bitfield to FPM2, and vice versa. Only when you commit
102 the file to disk do you need to swap the value in the SuperBlock to point to
103 the new ``FreeBlockMapBlock``.
104
105 The Free Block Maps are stored as a series of single blocks throughout the file
106 at intervals of ``BlockSize`` blocks. Because each FPM block is of size ``BlockSize``
107 bytes, it contains 8 times as many bits as an interval has blocks. This means
108 that the first block of each FPM refers to the first 8 intervals of the file
109 (the first 32768 blocks), the second block of each FPM refers to the next 8
110 intervals, and so on. This results in far more FPM blocks being present than are
111 required, but in order to maintain backwards compatibility the format must stay
112 this way.
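
To make the arithmetic concrete, the following sketch locates the FPM bit that
describes a given block. ``FpmBitLocation`` and ``locateFpmBit`` are
hypothetical names, and the ordering of bits within a byte (least significant
bit first) is an assumption that the text above does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where the flag for one block is stored.
struct FpmBitLocation {
  uint32_t FpmBlockIndex; // index of the FPM block within the file
  uint32_t ByteInBlock;   // byte offset inside that FPM block
  uint32_t BitInByte;     // assumed to count from the least significant bit
};

// Each FPM block holds BlockSize * 8 flags, and consecutive blocks of the
// same FPM are BlockSize blocks apart, starting at FreeBlockMapBlock.
inline FpmBitLocation locateFpmBit(uint32_t BlockIndex,
                                   uint32_t FreeBlockMapBlock,
                                   uint32_t BlockSize = 4096) {
  assert(FreeBlockMapBlock == 1 || FreeBlockMapBlock == 2);
  assert(BlockSize > 0);
  uint32_t BitsPerFpmBlock = BlockSize * 8;
  uint32_t FpmChunk = BlockIndex / BitsPerFpmBlock;
  uint32_t BitInChunk = BlockIndex % BitsPerFpmBlock;
  FpmBitLocation Loc;
  Loc.FpmBlockIndex = FreeBlockMapBlock + FpmChunk * BlockSize;
  Loc.ByteInBlock = BitInChunk / 8;
  Loc.BitInByte = BitInChunk % 8;
  return Loc;
}
```

With 4096-byte blocks, only every eighth FPM block is ever consulted by this
lookup, which is the redundancy the paragraph above describes.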
113
114 The Stream Directory
115 ====================
116 The Stream Directory is the root of all access to the other streams in an MSF
117 file. Beginning at byte 0 of the stream directory is the following structure:
118
119 .. code-block:: c++
120
121 struct StreamDirectory {
122 ulittle32_t NumStreams;
123 ulittle32_t StreamSizes[NumStreams];
124 ulittle32_t StreamBlocks[NumStreams][];
125 };
126
127 And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
128 Note that each of the last two arrays is of variable length, and in particular
129 that the second array is jagged.
130
131 **Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
132 streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
133
134 Stream 0: ceil(1000 / 4096) = 1 block
135
136 Stream 1: ceil(8000 / 4096) = 2 blocks
137
138 Stream 2: ceil(16000 / 4096) = 4 blocks
139
140 Stream 3: ceil(9000 / 4096) = 3 blocks
141
142 In total, 10 blocks are used. Let's see what the stream directory might look
143 like:
144
145 .. code-block:: c++
146
147 struct StreamDirectory {
148 ulittle32_t NumStreams = 4;
149 ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
150 ulittle32_t StreamBlocks[][] = {
151 {4},
152 {5, 6},
153 {11, 9, 7, 8},
154 {10, 15, 12}
155 };
156 };
157
158 In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
159 would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an array of one
160 ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
161
162 Note also that the streams are discontiguous, and that part of stream 3 is in the
163 middle of part of stream 2. You cannot assume anything about the layout of the
164 blocks!
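
Since blocks may appear in any order, every stream access goes through the
block list. A minimal sketch of that translation, using the hypothetical name
``streamOffsetToFileOffset`` and assuming the block list for the stream has
already been read from the stream directory:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Translates a byte offset within a stream into a byte offset within the
// MSF file, given that stream's entry in StreamBlocks.
inline uint64_t streamOffsetToFileOffset(const std::vector<uint32_t> &Blocks,
                                         uint32_t StreamOffset,
                                         uint32_t BlockSize = 4096) {
  assert(BlockSize > 0);
  uint32_t BlockInStream = StreamOffset / BlockSize;
  assert(BlockInStream < Blocks.size() && "offset is past the last block");
  uint32_t OffsetInBlock = StreamOffset % BlockSize;
  return static_cast<uint64_t>(Blocks[BlockInStream]) * BlockSize +
         OffsetInBlock;
}
```

For stream 2 in the example, offset 5000 falls in the stream's second block
(file block 9), so it maps to file offset ``9 * 4096 + 904 = 37768``.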
165
166 Alignment and Block Boundaries
167 ==============================
168 As may be clear by now, it is possible for a single field (whether it be a high
169 level record, a long string field, or even a single ``uint16``) to begin and
170 end in separate blocks. For example, if the block size is 4096 bytes, and a
171 ``uint16`` field begins at the last byte of the current block, then it would
172 need to end on the first byte of the next block. Since blocks are not
173 necessarily contiguously laid out in the file, this means that both the consumer
174 and the producer of an MSF file must be prepared to split data apart
175 accordingly. In the aforementioned example, because fields are stored little-endian, the low byte of the ``uint16``
176 would be written to the last byte of block N, and the high byte would be written
177 to the first byte of block N+1, which could be tens of thousands of bytes later
178 (or even earlier!) in the file, depending on what the stream directory says.
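
A reader can avoid special-casing block boundaries by fetching one byte at a
time through the block list. The sketch below does this for a 16-bit field;
``readU16FromStream`` is a hypothetical name, the whole file is assumed to be
in memory, and the value is assumed to be little-endian, as the ``ulittle``
types used elsewhere in this document suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reads a little-endian 16-bit value at StreamOffset, even when its two
// bytes live in different (and possibly non-adjacent) blocks of the file.
inline uint16_t readU16FromStream(const std::vector<uint8_t> &File,
                                  const std::vector<uint32_t> &StreamBlocks,
                                  uint32_t StreamOffset,
                                  uint32_t BlockSize = 4096) {
  assert(BlockSize > 0);
  // Fetches the byte at a given stream offset via the block list.
  auto ByteAt = [&](uint32_t Offset) -> uint8_t {
    uint32_t BlockInStream = Offset / BlockSize;
    assert(BlockInStream < StreamBlocks.size());
    uint64_t FileOffset =
        static_cast<uint64_t>(StreamBlocks[BlockInStream]) * BlockSize +
        Offset % BlockSize;
    assert(FileOffset < File.size());
    return File[FileOffset];
  };
  uint16_t Low = ByteAt(StreamOffset);
  uint16_t High = ByteAt(StreamOffset + 1);
  return static_cast<uint16_t>(Low | (High << 8));
}
```

Byte-at-a-time access is slow; a production reader would more likely copy
whole block-sized runs, but the boundary handling is the same.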
None =====================================
1 The PDB Public Symbol Stream
2 =====================================
None =====================================
1 The PDB TPI and IPI Streams
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _tpi_intro:
8
9 Introduction
10 ============
11
12 The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
13 all types used in the program. Each is organized as a :ref:`header `
14 followed by a list of :doc:`CodeView Type Records `. Types are
15 referenced from various streams and records throughout the PDB by their
16 :ref:`type index <type_indices>`. In general, the sequence of type records
17 following the :ref:`header ` forms a topologically sorted DAG
18 (directed acyclic graph), which means that a type record B can only refer to
19 the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
20 this property will not hold (particularly when dealing with object files
21 compiled with MASM), an implementation should try very hard to make this
22 property hold, as it means the entire type graph can be constructed in a single
23 pass.
24
25 .. important::
26 Type records form a topologically sorted DAG (directed acyclic graph).
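
The ordering property can be checked in one pass over the records. The sketch
below is a simplified illustration: ``TypeRecordRefs`` is a hypothetical
stand-in that keeps only a record's own type index and the indices it refers
to, not a real CodeView record layout.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a type record.
struct TypeRecordRefs {
  uint32_t TypeIndex;                      // index of this record
  std::vector<uint32_t> ReferencedIndices; // indices this record refers to
};

// Returns true if every record refers only to types with a smaller index,
// i.e. A.TypeIndex < B.TypeIndex whenever B refers to A.
inline bool isTopologicallySorted(const std::vector<TypeRecordRefs> &Records) {
  for (const TypeRecordRefs &Record : Records)
    for (uint32_t Referenced : Record.ReferencedIndices)
      if (Referenced >= Record.TypeIndex)
        return false;
  return true;
}
```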
27
28 .. _tpi_ipi:
29
30 TPI vs IPI Stream
31 =================
32
33 Recent versions of the PDB format (i.e., all versions covered by this document)
34 have 2 streams with identical layout, henceforth referred to as the TPI stream
35 and IPI stream. Subsequent contents of this document describing the on-disk
36 format apply equally whether it is for the TPI Stream or the IPI Stream. The
37 only difference between the two is in *which* CodeView records are allowed to
38 appear in each one, summarized by the following table:
39
40 +----------------------+---------------------+
41 | TPI Stream | IPI Stream |
42 +======================+=====================+
43 | LF_POINTER | LF_FUNC_ID |
44 +----------------------+---------------------+
45 | LF_MODIFIER | LF_MFUNC_ID |
46 +----------------------+---------------------+
47 | LF_PROCEDURE | LF_BUILDINFO |
48 +----------------------+---------------------+
49 | LF_MFUNCTION | LF_SUBSTR_LIST |
50 +----------------------+---------------------+
51 | LF_LABEL | LF_STRING_ID |
52 +----------------------+---------------------+
53 | LF_ARGLIST | LF_UDT_SRC_LINE |
54 +----------------------+---------------------+
55 | LF_FIELDLIST | LF_UDT_MOD_SRC_LINE |
56 +----------------------+---------------------+
57 | LF_ARRAY | |
58 +----------------------+---------------------+
59 | LF_CLASS | |
60 +----------------------+---------------------+
61 | LF_STRUCTURE | |
62 +----------------------+---------------------+
63 | LF_INTERFACE | |
64 +----------------------+---------------------+
65 | LF_UNION | |
66 +----------------------+---------------------+
67 | LF_ENUM | |
68 +----------------------+---------------------+
69 | LF_TYPESERVER2 | |
70 +----------------------+---------------------+
71 | LF_VFTABLE | |
72 +----------------------+---------------------+
73 | LF_VTSHAPE | |
74 +----------------------+---------------------+
75 | LF_BITFIELD | |
76 +----------------------+---------------------+
77 | LF_METHODLIST | |
78 +----------------------+---------------------+
79 | LF_PRECOMP | |
80 +----------------------+---------------------+
81 | LF_ENDPRECOMP | |
82 +----------------------+---------------------+
83
84 The usage of these records is described in more detail in
85 :doc:`CodeView Type Records <CodeViewTypes>`.
86
87 .. _type_indices:
88
89 Type Indices
90 ============
91
92 A type index is a 32-bit integer that uniquely identifies a type inside of an
93 object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
94 value of the type index for the first type record from the TPI stream is given
95 by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
96 although in practice this value is always equal to 0x1000 (4096).
97
98 Any type index with the high bit set is considered to come from the IPI stream,
99 although this appears to be more of a hack, and LLVM does not generate type
100 indices of this nature. They can, however, be observed in Microsoft PDBs
101 occasionally, so one should be prepared to handle them. Note that having the
102 high bit set is a sufficient condition for a type index to come from the IPI
103 stream, but not a necessary one.
104
105 Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
106 to come from the appropriate stream, and any type index less than this is a
107 bitmask which can be decomposed as follows:
108
109 .. code-block:: none
110
111 .---------------------------.------.----------.
112 |          Unused           | Mode |   Kind   |
113 '---------------------------'------'----------'
114 |+32                        |+12   |+8        |+0
115
116
117 - **Kind** - A value from the following enum:
118
119 .. code-block:: c++
120
121 enum class SimpleTypeKind : uint32_t {
122   None = 0x0000, // uncharacterized type (no type)
123   Void = 0x0003, // void
124   NotTranslated = 0x0007, // type not translated by cvpack
125   HResult = 0x0008, // OLE/COM HRESULT
126
127   SignedCharacter = 0x0010, // 8 bit signed
128   UnsignedCharacter = 0x0020, // 8 bit unsigned
129   NarrowCharacter = 0x0070, // really a char
130   WideCharacter = 0x0071, // wide char
131   Character16 = 0x007a, // char16_t
132   Character32 = 0x007b, // char32_t
133
134   SByte = 0x0068, // 8 bit signed int
135   Byte = 0x0069, // 8 bit unsigned int
136   Int16Short = 0x0011, // 16 bit signed
137   UInt16Short = 0x0021, // 16 bit unsigned
138   Int16 = 0x0072, // 16 bit signed int
139   UInt16 = 0x0073, // 16 bit unsigned int
140   Int32Long = 0x0012, // 32 bit signed
141   UInt32Long = 0x0022, // 32 bit unsigned
142   Int32 = 0x0074, // 32 bit signed int
143   UInt32 = 0x0075, // 32 bit unsigned int
144   Int64Quad = 0x0013, // 64 bit signed
145   UInt64Quad = 0x0023, // 64 bit unsigned
146   Int64 = 0x0076, // 64 bit signed int
147   UInt64 = 0x0077, // 64 bit unsigned int
148   Int128Oct = 0x0014, // 128 bit signed int
149   UInt128Oct = 0x0024, // 128 bit unsigned int
150   Int128 = 0x0078, // 128 bit signed int
151   UInt128 = 0x0079, // 128 bit unsigned int
152
153   Float16 = 0x0046, // 16 bit real
154   Float32 = 0x0040, // 32 bit real
155   Float32PartialPrecision = 0x0045, // 32 bit PP real
156   Float48 = 0x0044, // 48 bit real
157   Float64 = 0x0041, // 64 bit real
158   Float80 = 0x0042, // 80 bit real
159   Float128 = 0x0043, // 128 bit real
160
161   Complex16 = 0x0056, // 16 bit complex
162   Complex32 = 0x0050, // 32 bit complex
163   Complex32PartialPrecision = 0x0055, // 32 bit PP complex
164   Complex48 = 0x0054, // 48 bit complex
165   Complex64 = 0x0051, // 64 bit complex
166   Complex80 = 0x0052, // 80 bit complex
167   Complex128 = 0x0053, // 128 bit complex
168
169   Boolean8 = 0x0030, // 8 bit boolean
170   Boolean16 = 0x0031, // 16 bit boolean
171   Boolean32 = 0x0032, // 32 bit boolean
172   Boolean64 = 0x0033, // 64 bit boolean
173   Boolean128 = 0x0034, // 128 bit boolean
174 };
175
176 - **Mode** - A value from the following enum:
177
178 .. code-block:: c++
179
180 enum class SimpleTypeMode : uint32_t {
181   Direct = 0, // Not a pointer
182   NearPointer = 1, // Near pointer
183   FarPointer = 2, // Far pointer
184   HugePointer = 3, // Huge pointer
185   NearPointer32 = 4, // 32 bit near pointer
186   FarPointer32 = 5, // 32 bit far pointer
187   NearPointer64 = 6, // 64 bit near pointer
188   NearPointer128 = 7 // 128 bit near pointer
189 };
190
191 Note that for pointers, the bitness is represented in the mode. So a ``void*``
192 would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
193 but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
194
195 By convention, the type index for ``std::nullptr_t`` is constructed the same way
196 as the type index for ``void*``, but using the bitless enumeration value
197 ``NearPointer``.
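
As an illustration of the layout above, a simple type index can be split into its Kind and Mode fields with a mask and a shift. This is only a sketch: the constant and function names are hypothetical, the masks are read directly off the diagram (Kind in bits 0-7, Mode in bits 8-11), and the "high bit" is assumed to be bit 31.

```cpp
#include <cstdint>

// Masks follow the bit layout in the diagram above; all names are illustrative.
constexpr uint32_t kHighBit = 0x80000000u;  // assumed to be bit 31
constexpr uint32_t kKindMask = 0x000000FFu; // bits 0-7
constexpr uint32_t kModeMask = 0x00000F00u; // bits 8-11
constexpr uint32_t kModeShift = 8;

struct SimpleTypeParts {
  uint32_t Kind; // a SimpleTypeKind value
  uint32_t Mode; // a SimpleTypeMode value
};

// True if, once the high bit is cleared, the index lies below TypeIndexBegin
// and is therefore a Kind/Mode bitmask rather than a record in a stream.
inline bool isSimpleTypeIndex(uint32_t typeIndex, uint32_t typeIndexBegin) {
  return (typeIndex & ~kHighBit) < typeIndexBegin;
}

// Splits a simple type index into its two fields.
inline SimpleTypeParts decomposeSimple(uint32_t typeIndex) {
  return {typeIndex & kKindMask, (typeIndex & kModeMask) >> kModeShift};
}

// Builds a simple type index from a kind and a mode.
inline uint32_t composeSimple(uint32_t kind, uint32_t mode) {
  return (mode << kModeShift) | kind;
}
```

Under this layout, ``composeSimple(0x03, 6)`` gives ``0x0603`` for a 64-bit ``void*``, ``composeSimple(0x03, 4)`` gives ``0x0403`` for a 32-bit ``void*``, and ``composeSimple(0x03, 1)`` gives ``0x0103`` for ``std::nullptr_t``.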
198
199
200
201 .. _tpi_header:
202
203 Stream Header
204 =============
205 At offset 0 of the TPI Stream is a header with the following layout:
206
207
208 .. code-block:: c++
209
210 struct TpiStreamHeader {
211   uint32_t Version;
212   uint32_t HeaderSize;
213   uint32_t TypeIndexBegin;
214   uint32_t TypeIndexEnd;
215   uint32_t TypeRecordBytes;
216
217   uint16_t HashStreamIndex;
218   uint16_t HashAuxStreamIndex;
219   uint32_t HashKeySize;
220   uint32_t NumHashBuckets;
221
222   int32_t HashValueBufferOffset;
223   uint32_t HashValueBufferLength;
224
225   int32_t IndexOffsetBufferOffset;
226   uint32_t IndexOffsetBufferLength;
227
228   int32_t HashAdjBufferOffset;
229   uint32_t HashAdjBufferLength;
230 };
231
232 - **Version** - A value from the following enum.
233
234 .. code-block:: c++
235
236 enum class TpiStreamVersion : uint32_t {
237   V40 = 19950410,
238   V41 = 19951122,
239   V50 = 19961031,
240   V70 = 19990903,
241   V80 = 20040203,
242 };
243
244 Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
245 ``V80``, and no other values have been observed. It is assumed that should
246 another value be observed, the layout described by this document may not be
247 accurate.
248
249 - **HeaderSize** - ``sizeof(TpiStreamHeader)``
250
251 - **TypeIndexBegin** - The numeric value of the type index representing the
252 first type record in the TPI stream. This is usually the value 0x1000 as type
253 indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
254 a discussion of reserved type indices).
255
256 - **TypeIndexEnd** - One greater than the numeric value of the type index
257 representing the last type record in the TPI stream. The total number of type
258 records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
259
260 - **TypeRecordBytes** - The number of bytes of type record data following the header.
261
262 - **HashStreamIndex** - The index of a stream which contains a list of hashes for
263 every type record. This value may be -1 (``0xFFFF`` in this ``uint16_t`` field),
264 indicating that hash information is not present. In practice a valid stream index
265 is always observed, so any producer implementation should be prepared to emit this
266 stream to ensure compatibility with tools which may expect it to be present.
267
268 - **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
269 hash table, although this has not been observed in practice and it's unclear what it
270 might be used for.
271
272 - **HashKeySize** - The size of a hash value (usually 4 bytes).
273
274 - **NumHashBuckets** - The number of buckets used to generate the hash values in the
275 aforementioned hash streams.
276
277 - **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
278 the TPI Hash Stream of the list of hash values. It should be assumed that there
279 are either 0 hash values, or a number equal to the number of type records in the
280 TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength``
281 is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can
282 consider the PDB malformed.
283
284 - **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
285 within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
286 pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
287 and the second value is the offset in the type record data of the type with this
288 index. This can be used to do a binary search followed by a linear search to
289 get amortized O(log n) lookup by type index.
290
291 - **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
292 the TPI hash stream of a serialized hash table whose keys are the hash values
293 in the hash value buffer and whose values are type indices. This appears to
294 be useful in incremental linking scenarios, so that if a type is modified an
295 entry can be created mapping the old hash value to the new type index so that
296 a PDB file consumer can always have the most up to date version of the type
297 without forcing the incremental linker to garbage collect and update
298 references that point to the old version to now point to the new version.
299 The layout of this hash table is described in :doc:`HashTable`.
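
The consistency rules stated in the field descriptions above can be collected into a small sanity check. The sketch below repeats the header layout so that it compiles on its own; ``numTypeRecords`` and ``looksWellFormed`` are hypothetical helper names, and the check covers only the rules this section spells out, not everything a robust reader would verify.

```cpp
#include <cstdint>

struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;

  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;

  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;

  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;

  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};

// With natural alignment the fields pack to 56 bytes without padding.
static_assert(sizeof(TpiStreamHeader) == 56, "unexpected header padding");

// Number of type records: TypeIndexEnd - TypeIndexBegin.
inline uint32_t numTypeRecords(const TpiStreamHeader &h) {
  return h.TypeIndexEnd - h.TypeIndexBegin;
}

// Applies the checks described in the field list above.
inline bool looksWellFormed(const TpiStreamHeader &h) {
  if (h.HeaderSize != sizeof(TpiStreamHeader))
    return false;
  if (h.TypeIndexEnd < h.TypeIndexBegin)
    return false;
  if (h.HashValueBufferLength == 0)
    return true; // zero hash values is allowed
  // Otherwise there must be exactly one hash value per type record.
  uint64_t expected = uint64_t(numTypeRecords(h)) * h.HashKeySize;
  return h.HashValueBufferLength == expected;
}
```

The sketch assumes the header has already been read into host byte order; on a big-endian host each field would need to be converted from little endian first.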
300
301 .. _tpi_records:
302
303 CodeView Type Record List
304 =========================
305 Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
306 variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
307 of such records (i.e. the length of the array) can be determined by computing the
308 value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
309
310 O(log n) random access is provided by way of the Type Index Offsets array (if present)
311 described previously.
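
The binary-search half of that lookup might look like the following sketch. It assumes the Type Index Offsets entries are sorted by ascending type index, which the text does not state explicitly; ``TypeIndexOffset``, ``ScanStart`` and ``findScanStart`` are hypothetical names. The function finds the last entry at or before the requested index and reports how many records remain to be skipped linearly from that entry's offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One pair from the Type Index Offsets Buffer.
struct TypeIndexOffset {
  uint32_t Type;   // type index
  uint32_t Offset; // offset of that record within the type record data
};

// Where to begin the linear scan, and how many records to step over.
struct ScanStart {
  uint32_t Offset;
  uint32_t RecordsToSkip;
};

// Assumption: entries are sorted by ascending Type.
// Returns false if no entry is at or before typeIndex.
inline bool findScanStart(const std::vector<TypeIndexOffset> &entries,
                          uint32_t typeIndex, ScanStart &out) {
  assert(std::is_sorted(entries.begin(), entries.end(),
                        [](const TypeIndexOffset &a, const TypeIndexOffset &b) {
                          return a.Type < b.Type;
                        }));
  // First entry whose Type is strictly greater than typeIndex.
  auto it = std::upper_bound(
      entries.begin(), entries.end(), typeIndex,
      [](uint32_t value, const TypeIndexOffset &e) { return value < e.Type; });
  if (it == entries.begin())
    return false;
  --it; // last entry with Type <= typeIndex
  out = {it->Offset, typeIndex - it->Type};
  return true;
}
```

The linear part then walks ``RecordsToSkip`` records forward from ``Offset``, which requires the record framing described in the CodeView type record documentation and is left out here.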
None =====================================
1 The PDB File Format
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _pdb_intro:
8
9 Introduction
10 ============
11
12 PDB (Program Database) is a file format invented by Microsoft that contains
13 debug information that can be consumed by debuggers and other tools. Since
14 officially supported APIs exist on Windows for querying debug information from
15 PDBs even without the user understanding the internals of the file format, a
16 large ecosystem of tools has been built for Windows to consume this format. In
17 order for Clang to be able to generate programs that can interoperate with these
18 tools, it is necessary for us to generate PDB files ourselves.
19
20 At the same time, LLVM has a long history of being able to cross-compile from
21 any platform to any platform, and we wish for the same to be true here. So it
22 is necessary for us to understand the PDB file format at the byte-level so that
23 we can generate PDB files entirely on our own.
24
25 This manual describes what we know about the PDB file format today: the layout
26 of the file, the various streams contained within, the format of individual
27 records within, and more.
28
29 We would like to extend our heartfelt gratitude to Microsoft, without whom we
30 would not be where we are today. Much of the knowledge contained within this
31 manual was learned through reading code published by Microsoft on their `GitHub
32 repo `__.
33
34 .. _pdb_layout:
35
36 File Layout
37 ===========
38
39 .. important::
40 Unless otherwise specified, all numeric values are encoded in little endian.
41 If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
42 assume it is little endian!
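
To make the byte order concrete, fields can be read one byte at a time so that the result does not depend on the endianness of the host. This is a minimal sketch; ``readLE16`` and ``readLE32`` are hypothetical helper names, not functions from LLVM.

```cpp
#include <cstdint>

// Reads a little-endian 16-bit value: the least significant byte comes first.
inline uint16_t readLE16(const uint8_t *p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Reads a little-endian 32-bit value from four consecutive bytes.
inline uint32_t readLE32(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

For example, the four bytes ``0B CA 31 01`` decode to the decimal value 20040203.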
43
44 .. toctree::
45 :hidden:
46
47 MsfFile
48 PdbStream
49 TpiStream
50 DbiStream
51 ModiStream
52 PublicStream
53 GlobalStream
54 HashTable
55 CodeViewSymbols
56 CodeViewTypes
57
58 .. _msf:
59
60 The MSF Container
61 -----------------
62 A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
63 An MSF file is actually a miniature "file system within a file". It contains
64 multiple streams (aka files) which can represent arbitrary data, and these
65 streams are divided into blocks which may not necessarily be contiguously
66 laid out within the file (aka fragmented). Additionally, the MSF contains a
67 stream directory (aka MFT) which describes how the streams (files) are laid
68 out within the MSF.
69
70 For more information about the MSF container format, stream directory, and
71 block layout, see :doc:`MsfFile`.
72
73 .. _streams:
74
75 Streams
76 -------
77 The PDB format contains a number of streams which describe various information
78 such as the types, symbols, source files, and compilands (e.g. object files)
79 of a program, as well as some additional streams containing hash tables that are
80 used by debuggers and other tools to provide fast lookup of records and types
81 by name, and various other information about how the program was compiled such
82 as the specific toolchain used, and more. A summary of streams contained in a
83 PDB file is as follows:
84
85 +--------------------+------------------------------+-------------------------------------------+
86 | Name | Stream Index | Contents |
87 +====================+==============================+===========================================+
88 | Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory |
89 +--------------------+------------------------------+-------------------------------------------+
90 | PDB Stream | - Fixed Stream Index 1 | - Basic File Information |
91 | | | - Fields to match EXE to this PDB |
92 | | | - Map of named streams to stream indices |
93 +--------------------+------------------------------+-------------------------------------------+
94 | TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records |
95 | | | - Index of TPI Hash Stream |
96 +--------------------+------------------------------+-------------------------------------------+
97 | DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information |
98 | | | - Indices of individual module streams |
99 | | | - Indices of public / global streams |
100 | | | - Section Contribution Information |
101 | | | - Source File Information |
102 | | | - References to streams containing |
103 | | | FPO / PGO Data |
104 +--------------------+------------------------------+-------------------------------------------+
105 | IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records |
106 | | | - Index of IPI Hash Stream |
107 +--------------------+------------------------------+-------------------------------------------+
108 | /LinkInfo | - Contained in PDB Stream | - Unknown |
109 | | Named Stream map | |
110 +--------------------+------------------------------+-------------------------------------------+
111 | /src/headerblock | - Contained in PDB Stream | - Summary of embedded source file content |
112 | | Named Stream map | (e.g. natvis files) |
113 +--------------------+------------------------------+-------------------------------------------+
114 | /names | - Contained in PDB Stream | - PDB-wide global string table used for |
115 | | Named Stream map | string de-duplication |
116 +--------------------+------------------------------+-------------------------------------------+
117 | Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module |
118 | | - One for each compiland | - Line Number Information |
119 +--------------------+------------------------------+-------------------------------------------+
120 | Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records |
121 | | | - Index of Public Hash Stream |
122 +--------------------+------------------------------+-------------------------------------------+
123 | Global Stream | - Contained in DBI Stream | - Single combined master symbol-table |
124 | | | - Index of Global Hash Stream |
125 +--------------------+------------------------------+-------------------------------------------+
126 | TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records |
127 | | | by name |
128 +--------------------+------------------------------+-------------------------------------------+
129 | IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records |
130 | | | by name |
131 +--------------------+------------------------------+-------------------------------------------+
132
133 More information about the structure of each of these can be found on the
134 following pages:
135
136 :doc:`PdbStream`
137 Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
138
139 :doc:`TpiStream`
140 Information about the TPI stream and the CodeView records contained within.
141
142 :doc:`DbiStream`
143 Information about the DBI stream and relevant substreams including the Module Substreams,
144 source file information, and CodeView symbol records contained within.
145
146 :doc:`ModiStream`
147 Information about the Module Information Stream, of which there is one for each compilation
148 unit, and the format of symbols contained within.
149
150 :doc:`PublicStream`
151 Information about the Public Symbol Stream.
152
153 :doc:`GlobalStream`
154 Information about the Global Symbol Stream.
155
156 :doc:`HashTable`
157 Information about the serialized hash table format used internally to represent things such
158 as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.
159
160 CodeView
161 ========
162 CodeView is another format which comes into the picture. While MSF defines
163 the structure of the overall file, and PDB defines the set of streams that
164 appear within the MSF file and the format of those streams, CodeView defines
165 the format of **symbol and type records** that appear within specific streams.
166 Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
167 more information about the CodeView format.
0 =====================================
1 The PDB File Format
2 =====================================
3
4 .. contents::
5 :local:
6
7 .. _pdb_intro:
8
9 Introduction
10 ============
11
12 PDB (Program Database) is a file format invented by Microsoft and which contains
13 debug information that can be consumed by debuggers and other tools. Since
14 officially supported APIs exist on Windows for querying debug information from
15 PDBs even without the user understanding the internals of the file format, a
16 large ecosystem of tools has been built for Windows to consume this format. In
17 order for Clang to be able to generate programs that can interoperate with these
18 tools, it is necessary for us to generate PDB files ourselves.
19
20 At the same time, LLVM has a long history of being able to cross-compile from
21 any platform to any platform, and we wish for the same to be true here. So it
22 is necessary for us to understand the PDB file format at the byte-level so that
23 we can generate PDB files entirely on our own.
24
25 This manual describes what we know about the PDB file format today: the layout
26 of the file, the various streams contained within, the format of individual
27 records within, and more.
28
29 We would like to extend our heartfelt gratitude to Microsoft, without whom we
30 would not be where we are today. Much of the knowledge contained within this
31 manual was learned through reading code published by Microsoft on their GitHub
32 repo.
33
34 .. _pdb_layout:
35
36 File Layout
37 ===========
38
39 .. important::
40    Unless otherwise specified, all numeric values are encoded in little endian.
41    If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
42    assume it is little endian!
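
As a minimal sketch of what the little-endian rule means in practice, the snippet below decodes a ``uint16_t`` followed by a ``uint32_t`` from a byte buffer. The buffer contents are made up for illustration and do not correspond to any real PDB field.

```python
import struct

# Hypothetical 6-byte buffer (not a real PDB structure): a uint16_t
# followed by a uint32_t, both stored least significant byte first.
raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])

# "<" selects little-endian order with no padding; "H" is an unsigned
# 16-bit integer and "I" is an unsigned 32-bit integer.
u16, u32 = struct.unpack_from("<HI", raw, 0)

# int.from_bytes decodes a single field the same way.
assert u16 == int.from_bytes(raw[0:2], "little")

print(hex(u16), hex(u32))
```

The same ``"<"`` prefix applies to every fixed-width integer mentioned in these pages unless a page says otherwise.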
43
44 .. toctree::
45    :hidden:
46
47    MsfFile
48    PdbStream
49    TpiStream
50    DbiStream
51    ModiStream
52    PublicStream
53    GlobalStream
54    HashTable
55    CodeViewSymbols
56    CodeViewTypes
57
58 .. _msf:
59
60 The MSF Container
61 -----------------
62 A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
63 An MSF file is actually a miniature "file system within a file". It contains
64 multiple streams (aka files) which can represent arbitrary data, and these
65 streams are divided into blocks which may not necessarily be contiguously
66 laid out within the file (aka fragmented). Additionally, the MSF contains a
67 stream directory (aka MFT) which describes how the streams (files) are laid
68 out within the MSF.
69
70 For more information about the MSF container format, stream directory, and
71 block layout, see :doc:`MsfFile`.
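
To illustrate the "file system within a file" idea, here is a rough sketch of how a consumer might reassemble one stream from non-contiguous blocks. The function name, the block size, and the way the block list is obtained are all assumptions made for this example; the actual header and stream directory layout is described in :doc:`MsfFile`.

```python
def read_stream(file_bytes, block_size, block_indices, stream_size):
    """Reassemble one stream from its (possibly non-contiguous) blocks."""
    chunks = []
    for index in block_indices:
        # Assumption for this sketch: block N starts at byte N * block_size.
        start = index * block_size
        chunks.append(file_bytes[start:start + block_size])
    # The last block may be only partly used, so trim to the stream's size.
    return b"".join(chunks)[:stream_size]


# Toy 3-block "file" with 4-byte blocks; the stream occupies blocks 2 and 0,
# in that order, and is 6 bytes long.
file_bytes = b"am??" + b"XXXX" + b"stre"
print(read_stream(file_bytes, 4, [2, 0], 6))
```

In a real MSF file the list of block indices for each stream would come from the stream directory rather than being passed in by hand.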
72
73 .. _streams:
74
75 Streams
76 -------
77 The PDB format contains a number of streams that describe the types,
78 symbols, source files, and compilands (e.g. object files) of a program.
79 Additional streams contain hash tables that debuggers and other tools
80 use for fast lookup of records and types by name. The streams also
81 record various other information about how the program was compiled,
82 such as the specific toolchain used. A summary of the streams contained
83 in a PDB file is as follows:
84
85 +--------------------+------------------------------+-------------------------------------------+
86 | Name               | Stream Index                 | Contents                                  |
87 +====================+==============================+===========================================+
88 | Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
89 +--------------------+------------------------------+-------------------------------------------+
90 | PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
91 |                    |                              | - Fields to match EXE to this PDB         |
92 |                    |                              | - Map of named streams to stream indices  |
93 +--------------------+------------------------------+-------------------------------------------+
94 | TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
95 |                    |                              | - Index of TPI Hash Stream                |
96 +--------------------+------------------------------+-------------------------------------------+
97 | DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
98 |                    |                              | - Indices of individual module streams    |
99 |                    |                              | - Indices of public / global streams      |
100 |                    |                              | - Section Contribution Information        |
101 |                    |                              | - Source File Information                 |
102 |                    |                              | - References to streams containing        |
103 |                    |                              |   FPO / PGO Data                          |
104 +--------------------+------------------------------+-------------------------------------------+
105 | IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
106 |                    |                              | - Index of IPI Hash Stream                |
107 +--------------------+------------------------------+-------------------------------------------+
108 | /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
109 |                    |   Named Stream map           |                                           |
110 +--------------------+------------------------------+-------------------------------------------+
111 | /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
112 |                    |   Named Stream map           |   (e.g. natvis files)                     |
113 +--------------------+------------------------------+-------------------------------------------+
114 | /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
115 |                    |   Named Stream map           |   string de-duplication                   |
116 +--------------------+------------------------------+-------------------------------------------+
117 | Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
118 |                    | - One for each compiland     | - Line Number Information                 |
119 +--------------------+------------------------------+-------------------------------------------+
120 | Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
121 |                    |                              | - Index of Public Hash Stream             |
122 +--------------------+------------------------------+-------------------------------------------+
123 | Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
124 |                    |                              | - Index of Global Hash Stream             |
125 +--------------------+------------------------------+-------------------------------------------+
126 | TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
127 |                    |                              |   by name                                 |
128 +--------------------+------------------------------+-------------------------------------------+
129 | IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
130 |                    |                              |   by name                                 |
131 +--------------------+------------------------------+-------------------------------------------+
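
The fixed stream indices in the table above could be captured in code roughly as follows. This is only a sketch: the enum and function names are hypothetical and are not taken from LLVM's sources. Streams without a fixed index have to be located through the stream that the table says contains them.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that always live at the same index (names are illustrative)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe_stream_index(index):
    """Return the fixed stream's name, or a note that the index is not fixed."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Named streams, module streams, public/global streams and the hash
        # streams have no fixed index; another stream records where they are.
        return "not fixed (index is recorded in another stream)"


print(describe_stream_index(3))
print(describe_stream_index(9))
```

Using an ``IntEnum`` keeps the values usable directly as indices into a list of streams while still giving each one a readable name.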
132
133 More information about the structure of each of these can be found on the
134 following pages:
135
136 :doc:`PdbStream`
137    Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
138
139 :doc:`TpiStream`
140    Information about the TPI stream and the CodeView records contained within.
141
142 :doc:`DbiStream`
143    Information about the DBI stream and relevant substreams including the Module Substreams,
144    source file information, and CodeView symbol records contained within.
145
146 :doc:`ModiStream`
147    Information about the Module Information Stream, of which there is one for each compilation
148    unit, and the format of symbols contained within.
149
150 :doc:`PublicStream`
151    Information about the Public Symbol Stream.
152
153 :doc:`GlobalStream`
154    Information about the Global Symbol Stream.
155
156 :doc:`HashTable`
157    Information about the serialized hash table format used internally to represent things such
158    as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.
159
160 CodeView
161 ========
162 CodeView is another format which comes into the picture. While MSF defines
163 the structure of the overall file, and PDB defines the set of streams that
164 appear within the MSF file and the format of those streams, CodeView defines
165 the format of **symbol and type records** that appear within specific streams.
166 Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
167 more information about the CodeView format.