Add a new document describing the LLVM accurate garbage collection support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13667 91177308-0d34-0410-b5e6-96231b3b80d8 Chris Lattner 15 years ago
Accurate Garbage Collection with LLVM
  • Introduction
  • GC features provided and algorithms supported
  • Interfaces for user programs
    • Identifying GC roots on the stack: llvm.gcroot
    • GC descriptor format for heap objects
    • Allocating memory from the GC
    • Reading and writing references to the heap
    • Garbage collector startup and initialization
    • Explicit invocation of the garbage collector
  • Implementing a garbage collector
    • Implementing llvm_gc_read and llvm_gc_write
    • Tracing the GC roots from the program stack
    • GC implementations available

    Written by Chris Lattner

Introduction

Garbage collection is a widely used technique that frees the programmer from having to know the lifetimes of heap objects, making software easier to produce and maintain. Many programming languages rely on garbage collection for automatic memory management. There are two primary forms of garbage collection: conservative and accurate.

Conservative garbage collection often requires no special support from either the language or the compiler: it can handle non-type-safe programming languages (such as C and C++) without any special information from the compiler. The Boehm collector is an example of a state-of-the-art conservative collector.

Accurate garbage collection requires the ability to identify all pointers in the program at run-time, which in most cases requires the source language to be type-safe. Identifying pointers at run-time requires compiler support to locate every place that holds a live pointer variable, including the processor stack and registers.

Conservative garbage collection is attractive because it does not require any special compiler support, but it does have problems. In particular, because a conservative garbage collector cannot know whether a particular word in memory is a pointer, it cannot move live objects in the heap (preventing the use of compacting and generational GC algorithms), and it can occasionally suffer from memory leaks when integer values happen to point to objects in the program. In addition, some aggressive compiler transformations can break conservative garbage collectors (though these seem rare in practice).

Accurate garbage collectors do not suffer from any of these problems, but they can suffer from degraded scalar optimization of the program. In particular, because the runtime must be able to identify and update all pointers active in the program, some optimizations are less effective. In practice, however, the locality and performance benefits of using aggressive garbage collection techniques dominate any low-level losses.

This document describes the mechanisms and interfaces provided by LLVM to support accurate garbage collection.

GC features provided and algorithms supported

LLVM provides support for a broad class of garbage collection algorithms, including compacting semi-space collectors, mark-sweep collectors, generational collectors, and even reference counting implementations. It includes support for read and write barriers, and for associating meta-data with stack objects (used for tagless garbage collection). All LLVM code generators support garbage collection, including the C backend.

We hope that the primitive support built into LLVM is sufficient to support a broad class of garbage collected languages, including Scheme, ML, scripting languages, Java, C#, etc. That said, the implemented garbage collectors may need to be extended to support language-specific features such as finalization and weak references. As these needs are identified and implemented, they should be added to this specification.

LLVM does not currently support garbage collection of multi-threaded programs, or GC-safe points other than function calls, but support for these will be added in the future as interest arises.

Interfaces for user programs

This section describes the interfaces provided by LLVM and by the garbage collector run-time that should be used by user programs. As such, this is the interface that front-end authors should generate code for.

Identifying GC roots on the stack: llvm.gcroot

    void %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata)

The llvm.gcroot intrinsic is used to inform LLVM of a pointer variable on the stack. The first argument contains the address of the variable on the stack, and the second contains a pointer to metadata that should be associated with the pointer (the metadata pointer must be a constant or the address of a global value). At runtime, the llvm.gcroot intrinsic stores a null pointer into the specified location to initialize the pointer.

Consider the following fragment of Java code:

    {
      Object X;   // A null-initialized reference to an object
      ...
    }

This block (which may be located in the middle of a function or in a loop nest) could be compiled to this LLVM code:

    Entry:
      ;; In the entry block for the function, allocate the
      ;; stack space for X, which is an LLVM pointer.
      %X = alloca %Object*
      ...

    ;; "CodeBlock" is the block corresponding to the start
    ;; of the scope above.
    CodeBlock:
      ;; Initialize X to null, telling LLVM that it is now live.
      ;; Java has type-tags on objects, so it doesn't need any
      ;; metadata.
      call void %llvm.gcroot(%Object** %X, sbyte* null)
      ...

      ;; As the pointer goes out of scope, store a null value into
      ;; it, to indicate that the value is no longer live.
      store %Object* null, %Object** %X
      ...
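The following C sketch models what the llvm.gcroot call above records. It is purely illustrative: the root table, `gc_root_entry`, and `model_gcroot` are hypothetical names, and the code generator is not assumed to call any such function. It shows only the two effects described above: associating a stack slot with its meta-data, and nulling the slot.

```c
#include <stddef.h>

/* Hypothetical model of llvm.gcroot: record the address of a stack slot
   together with its meta-data, and store null into the slot. */
typedef struct {
    void **root;  /* address of the stack variable (first argument) */
    void *meta;   /* associated meta-data (second argument) */
} gc_root_entry;

#define MAX_ROOTS 64
static gc_root_entry root_table[MAX_ROOTS];
static size_t num_roots = 0;

/* Returns 0 on success, -1 if the (hypothetical) fixed-size table is full. */
int model_gcroot(void **ptrloc, void *metadata) {
    if (num_roots == MAX_ROOTS)
        return -1;
    *ptrloc = NULL;  /* llvm.gcroot initializes the slot to null */
    root_table[num_roots].root = ptrloc;
    root_table[num_roots].meta = metadata;
    num_roots++;
    return 0;
}
```

Storing null at the end of the scope, as in the IR above, leaves the entry in place but makes it refer to nothing. A collector walking such a table would therefore skip null slots.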
GC descriptor format for heap objects

The GC descriptor for a heap object comes either from the root meta-data or from the object's header. The front-end can provide a call-back that retrieves the descriptor from an object that has no meta-data.
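This section does not fix a descriptor layout, so the C sketch below assumes one: a hypothetical `gc_descriptor` that lists the byte offsets of an object's pointer fields. The lookup prefers the meta-data passed to llvm.gcroot and otherwise falls back to a front-end call-back (a hypothetical `gc_descriptor_fn`) that reads the descriptor from the object itself, for example from its header.

```c
#include <stddef.h>

/* Hypothetical descriptor: where the pointer fields of an object live. */
typedef struct gc_descriptor {
    size_t size;               /* object size in bytes */
    size_t num_ptrs;           /* number of pointer fields */
    const size_t *ptr_offsets; /* byte offset of each pointer field */
} gc_descriptor;

/* Hypothetical front-end call-back for objects without root meta-data. */
typedef const gc_descriptor *(*gc_descriptor_fn)(void *obj);

static gc_descriptor_fn front_end_lookup = NULL;

void gc_set_descriptor_callback(gc_descriptor_fn fn) {
    front_end_lookup = fn;
}

/* Prefer the llvm.gcroot meta-data; otherwise ask the front-end. */
const gc_descriptor *gc_get_descriptor(void *obj, void *root_meta) {
    if (root_meta != NULL)
        return (const gc_descriptor *)root_meta;
    if (front_end_lookup != NULL)
        return front_end_lookup(obj);
    return NULL;  /* no way to describe this object */
}

/* Call visit on the address of every pointer field of obj. */
void gc_visit_fields(void *obj, const gc_descriptor *desc,
                     void (*visit)(void **field)) {
    for (size_t i = 0; i < desc->num_ptrs; i++)
        visit((void **)((char *)obj + desc->ptr_offsets[i]));
}
```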

Allocating memory from the GC

    sbyte *%llvm_gc_allocate(unsigned %Size)

The llvm_gc_allocate function is a global function defined by the garbage collector implementation to allocate memory. It should return a zeroed-out block of memory of the requested size (%Size bytes).
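A minimal sketch of llvm_gc_allocate, assuming a hypothetical fixed-size heap and a plain bump-pointer allocator. It rounds each request up to 16 bytes and clears the block explicitly, because the interface requires zeroed memory even once the collector starts reusing space. A real collector would run a collection instead of returning NULL when the heap is exhausted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size heap for this sketch. */
#define HEAP_SIZE (1u << 20)
static _Alignas(16) unsigned char heap[HEAP_SIZE];
static size_t heap_top = 0;

void *llvm_gc_allocate(unsigned Size) {
    /* Round up to a multiple of 16 so every object stays aligned. */
    size_t n = ((size_t)Size + 15) & ~(size_t)15;
    if (n > HEAP_SIZE - heap_top)
        return NULL;  /* a real collector would collect here */
    void *p = heap + heap_top;
    heap_top += n;
    memset(p, 0, n);  /* the interface requires zeroed memory */
    return p;
}
```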

Reading and writing references to the heap

    sbyte *%llvm.gcread(sbyte **)
    void %llvm.gcwrite(sbyte*, sbyte**)
    251

Several of the more interesting garbage collectors (e.g., generational collectors) need to be informed when the mutator (the program that needs garbage collection) reads object references from the heap or writes them into it. A generational collector, for example, needs to keep track of which "old" generation objects have references stored into them. The barrier code that must be executed is usually quite small, so the overall performance impact of the inserted code is tolerable.

To support garbage collectors that use read or write barriers, LLVM provides the llvm.gcread and llvm.gcwrite intrinsics. The first intrinsic has exactly the same semantics as a non-volatile LLVM load, and the second has the same semantics as a non-volatile LLVM store. At code generation time, these intrinsics are replaced with calls into the garbage collector (llvm_gc_read and llvm_gc_write, respectively), which are then inlined into the code.

If you are writing a front-end for a garbage collected language, every load or store of a reference from or to the heap should use these intrinsics instead of normal LLVM loads/stores.

Garbage collector startup and initialization

    void %llvm_gc_initialize()

The llvm_gc_initialize function should be called once before any other garbage collection functions are called. This gives the garbage collector the chance to initialize itself and allocate the heap spaces.
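Here is a sketch of llvm_gc_initialize for a hypothetical semi-space collector. It assumes two fixed-size spaces obtained from calloc. The guard makes a second call harmless. That is a design choice of this sketch, not something the interface requires.

```c
#include <stdlib.h>

/* Hypothetical semi-space layout; the size is arbitrary for this sketch. */
#define SEMISPACE_SIZE (1u << 20)
static unsigned char *from_space = NULL;
static unsigned char *to_space = NULL;
static size_t alloc_top = 0;  /* allocation pointer into from_space */

void llvm_gc_initialize(void) {
    if (from_space != NULL)
        return;  /* already initialized */
    from_space = calloc(1, SEMISPACE_SIZE);
    to_space = calloc(1, SEMISPACE_SIZE);
    if (from_space == NULL || to_space == NULL)
        abort();  /* the program cannot run without a heap */
    alloc_top = 0;
}
```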

Explicit invocation of the garbage collector

    void %llvm_gc_collect()

The llvm_gc_collect function is exported by garbage collector implementations to request a full collection, even when the heap is not exhausted. End-user code can use it as a hint, which the garbage collector may ignore.

Implementing a garbage collector

Implementing a garbage collector for LLVM is fairly straightforward. The implementation must include the llvm_gc_allocate and llvm_gc_collect functions, and it must also implement the read/write barrier functions. To do this, it will probably have to trace through the roots from the stack and understand the GC descriptors for heap objects. Luckily, there are some example implementations available.

Implementing llvm_gc_read and llvm_gc_write

    void *llvm_gc_read(void **)
    void llvm_gc_write(void*, void**)

These functions must be implemented by every garbage collector, even one that does not need read/write barriers. In that case, they should simply load or store the pointer and return.
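For a collector with no barriers, the two functions reduce to a plain load and a plain store. The parameter names below are assumptions. Note that llvm_gc_write takes the value first and the address second, matching llvm.gcwrite.

```c
/* Barrier-free read: just load the reference stored at FieldPtr. */
void *llvm_gc_read(void **FieldPtr) {
    return *FieldPtr;
}

/* Barrier-free write: just store V into the field at FieldPtr. */
void llvm_gc_write(void *V, void **FieldPtr) {
    *FieldPtr = V;
}
```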

If an actual read or write barrier is needed, it should be straightforward to implement. Note that we may add a pointer to the start of the memory object as a parameter in the future, if needed.
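One example of a real barrier is a generational write barrier with a remembered set, sketched below. The heap layout (two statically allocated generations), the set size, and the names `remembered_set` and `remset_overflowed` are all hypothetical. Because llvm_gc_write receives only the address of the field and not the start of the enclosing object, the barrier records the slot itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap layout: two statically allocated generations. */
#define GEN_WORDS 512
static void *old_gen[GEN_WORDS];
static void *young_gen[GEN_WORDS];

/* Hypothetical remembered set of old-generation slots that may point
   into the young generation. */
#define REMSET_MAX 256
static void **remembered_set[REMSET_MAX];
static size_t remset_count = 0;
static int remset_overflowed = 0;

/* True if p points into the generation that starts at gen. */
static int in_gen(const void *p, void *const *gen) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)gen;
    return a >= lo && a < lo + sizeof(void *) * GEN_WORDS;
}

void llvm_gc_write(void *V, void **FieldPtr) {
    *FieldPtr = V;  /* the store itself */
    /* Old-to-young pointer: remember the slot for the next minor collection. */
    if (in_gen(FieldPtr, old_gen) && in_gen(V, young_gen)) {
        if (remset_count < REMSET_MAX)
            remembered_set[remset_count++] = FieldPtr;
        else
            remset_overflowed = 1;  /* collector must scan the whole old generation */
    }
}

void *llvm_gc_read(void **FieldPtr) {
    return *FieldPtr;  /* this scheme needs no read barrier */
}
```

A minor collection would then treat every recorded slot as an extra root, or scan the entire old generation if `remset_overflowed` is set.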

Tracing the GC roots from the program stack

    void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta));

The llvm_cg_walk_gcroots function is provided by the code generator. It iterates through all of the GC roots on the stack, calling the specified function pointer with each record. For each GC root, the address of the pointer and the meta-data (from the llvm.gcroot intrinsic) are provided.
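The sketch below shows the collector's side of this interface: a call-back that marks each object referenced from a stack root. Because llvm_cg_walk_gcroots is normally supplied by the code generator, a stand-in walker over a small hypothetical array of root slots is included only so that the sketch runs on its own. The `obj_header` mark field is also an assumption.

```c
#include <stddef.h>

/* Hypothetical object header used by this sketch's mark phase. */
typedef struct {
    int marked;
} obj_header;

/* Stand-in for the code generator's llvm_cg_walk_gcroots: it walks a
   small hypothetical array of root slots and their meta-data. */
#define NUM_FAKE_ROOTS 3
static void *fake_roots[NUM_FAKE_ROOTS];
static void *fake_meta[NUM_FAKE_ROOTS];

void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta)) {
    for (size_t i = 0; i < NUM_FAKE_ROOTS; i++)
        FP(&fake_roots[i], fake_meta[i]);
}

/* Collector side: the call-back handed to the walker. */
static size_t live_roots = 0;

static void mark_root(void **Root, void *Meta) {
    (void)Meta;  /* tagged objects (as in Java) need no per-root meta-data */
    if (*Root == NULL)
        return;  /* slot not yet live, or already out of scope */
    ((obj_header *)*Root)->marked = 1;
    live_roots++;
    /* A full collector would now trace the object's fields via its descriptor. */
}

void gc_mark_from_stack(void) {
    live_roots = 0;
    llvm_cg_walk_gcroots(mark_root);
}
```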

GC implementations available

To make this more concrete, the currently implemented LLVM garbage collectors all live in the llvm/runtime/GC directory of the LLVM source tree.

TODO: Brief overview of each.

Chris Lattner
LLVM Compiler Infrastructure
Last modified: $Date$
    417