llvm.org GIT mirror llvm / 84e5f77
update to document new lto API git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47764 91177308-0d34-0410-b5e6-96231b3b80d8 Nick Kledzik 11 years ago
1 changed file(s) with 105 addition(s) and 142 deletion(s).
2020
2121
  • Phase 1 : Read LLVM Bytecode Files
  • 2222
  • Phase 2 : Symbol Resolution
  • 23
  • Phase 3 : Optimize Bytecode Files
  • 23
  • Phase 3 : Optimize Bitcode Files
  • 2424
  • Phase 4 : Symbol Resolution after optimization
  • 2525
    26
  • LLVMlto
  • 26
  • libLTO
  • 2727
    28
  • LLVMSymbol
  • 29
  • readLLVMObjectFile()
  • 30
  • optimizeModules()
  • 31
  • getTargetTriple()
  • 32
  • removeModule()
  • 33
  • getAlignment()
  • 34
    35
  • Debugging Information
  • 28
  • lto_module_t
  • 29
  • lto_code_gen_t
  • 30
    3631
    3732
    3833
    39

    Written by Devang Patel

    34

    Written by Devang Patel and Nick Kledzik

    4035
    4136
    4237
    4843
    4944

    5045 LLVM features powerful intermodular optimizations which can be used at link
    51 time. Link Time Optimization is another name for intermodular optimization
    46 time. Link Time Optimization (LTO) is another name for intermodular optimization
    5247 when performed during the link stage. This document describes the interface
    53 and design between the LLVM intermodular optimizer and the linker.

    48 and design between the LTO optimizer and the linker.

    5449
    5550
    5651
    6762 significant changes to the developer's makefiles or build system. This is
    6863 achieved through tight integration with the linker. In this model, the linker
    6964 treats LLVM bitcode files like native object files and allows mixing and
    70 matching among them. The linker uses LLVMlto, a dynamically
    71 loaded library, to handle LLVM bitcode files. This tight integration between
    65 matching among them. The linker uses libLTO, a shared
    66 object, to handle LLVM bitcode files. This tight integration between
    7267 the linker and LLVM optimizer helps to do optimizations that are not possible
    7368 in other models. The linker input allows the optimizer to avoid relying on
    7469 conservative escape analysis.
    135130 $ llvm-gcc a.o main.o -o main # <-- standard link command without any modifications
    136131
    137132

    In this example, the linker recognizes that foo2() is an

    138 externally visible symbol defined in LLVM bitcode file. This information
    139 is collected using readLLVMObjectFile().
    140 Based on this information, the linker completes its usual symbol resolution
    133 externally visible symbol defined in an LLVM bitcode file. The linker completes
    134 its usual symbol resolution
    141135 pass and finds that foo2() is not used anywhere. This information
    142136 is used by the LLVM optimizer and it removes foo2(). As soon as
    143137 foo2() is removed, the optimizer recognizes that condition
    182176
    183177
    184178
    185 Multi-phase communication between LLVM and linker
    179 Multi-phase communication between libLTO and linker
    186180
    187181
    188182
    207201
    208202

    The linker first reads all object files in natural order and collects

    209203 symbol information. This includes native object files as well as LLVM bitcode
    210 files. In this phase, the linker uses
    211 readLLVMObjectFile() to collect symbol
    212 information from each LLVM bitcode files and updates its internal global
    213 symbol table accordingly. The intent of this interface is to avoid overhead
    214 in the non LLVM case, where all input object files are native object files,
    215 by putting this code in the error path of the linker. When the linker sees
    216 the first llvm .o file, it dlopen()s the dynamic library. This is
    217 to allow changes to the LLVM LTO code without relinking the linker.
    204 files. To minimize the cost to the linker in the case that all .o files
    205 are native object files, the linker only calls lto_module_create()
    206 when a supplied object file is found not to be a native object file. If
    207 lto_module_create() reports that the file is an LLVM bitcode file,
    208 the linker
    209 then iterates over the module using lto_module_get_symbol_name() and
    210 lto_module_get_symbol_attribute() to get all symbols defined and
    211 referenced.
    212 This information is added to the linker's global symbol table.
    213

    214

    The lto* functions are all implemented in a shared object libLTO. This

    215 allows the LLVM LTO code to be updated independently of the linker tool.
    216 On platforms that support it, the shared object is lazily loaded.
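The control flow described in the added text can be sketched in C as follows. This is a minimal, self-contained model and does not call libLTO: every `sketch_*` name is a hypothetical stand-in, and the `is_native` flag stands in for whatever native-format check the linker already performs. Only non-native inputs reach the module path, which is where a real linker would call lto_module_create() and then walk the symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; none of these are libLTO names. */
enum { SKETCH_MAX_SYMBOLS = 32 };

typedef struct {
    int is_native;              /* 1 if the linker parsed it as a native object */
    const char *symbols[4];     /* names a bitcode module would report */
    size_t num_symbols;
} sketch_input;

typedef struct {
    const char *names[SKETCH_MAX_SYMBOLS];
    size_t count;
    size_t module_create_calls; /* how often the non-native path was taken */
} sketch_symtab;

/* Add a name to the global symbol table unless it is already present. */
static void sketch_add_symbol(sketch_symtab *tab, const char *name) {
    for (size_t i = 0; i < tab->count; ++i)
        if (strcmp(tab->names[i], name) == 0)
            return;
    assert(tab->count < SKETCH_MAX_SYMBOLS);
    tab->names[tab->count++] = name;
}

/* Phase 1: native inputs never touch the LTO path. Their symbols would come
   from the linker's own object parser, which this sketch omits. */
static void sketch_read_inputs(sketch_symtab *tab,
                               const sketch_input *in, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (in[i].is_native)
            continue;
        tab->module_create_calls++;   /* stands in for module creation */
        for (size_t s = 0; s < in[i].num_symbols; ++s)
            sketch_add_symbol(tab, in[i].symbols[s]);
    }
}
```

With one native input and one bitcode input, the module path runs exactly once, which is the cost model the text aims for.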
    218217

    219218
    220219
    224223
    225224
    226225
    227

    In this stage, the linker resolves symbols using global symbol table

    228 information to report undefined symbol errors, read archive members, resolve
    229 weak symbols, etc. The linker is able to do this seamlessly even though it
    230 does not know the exact content of input LLVM bitcode files because it uses
    231 symbol information provided by
    232 readLLVMObjectFile(). If dead code
    226 In this stage, the linker resolves symbols using the global symbol table.
    227 It may report undefined symbol errors, read archive members, replace
    228 weak symbols, etc. The linker is able to do this seamlessly even though it
    229 does not know the exact content of input LLVM bitcode files. If dead code
    233230 stripping is enabled then the linker collects the list of live symbols.
    234231

    235232
    239236 Phase 3 : Optimize Bitcode Files
    240237
    241238
    242

    After symbol resolution, the linker updates symbol information supplied

    243 by LLVM bitcode files appropriately. For example, whether certain LLVM
    244 bitcode supplied symbols are used or not. In the example above, the linker
    245 reports that foo2() is not used anywhere in the program, including
    246 native .o files. This information is used by the LLVM interprocedural
    247 optimizer. The linker uses optimizeModules()
    248 and requests an optimized native object file of the LLVM portion of the
    249 program.
    239

    After symbol resolution, the linker tells the LTO shared object which

    240 symbols are needed by native object files. In the example above, the linker
    241 reports that only foo1() is used by native object files using
    242 lto_codegen_add_must_preserve_symbol(). Next the linker invokes
    243 the LLVM optimizer and code generators using lto_codegen_compile()
    244 which returns a native object file created by merging the LLVM bitcode files
    245 and applying various optimization passes.
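The effect of the must-preserve list can be illustrated with a deliberately simplified C model. It assumes that a bitcode-defined symbol survives only if the linker named it as preserved. The real optimizer also keeps anything reachable from a preserved symbol, which this sketch ignores. The `sketch_*` functions are hypothetical and are not part of libLTO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name appears in the preserve list, else 0. */
static int sketch_is_preserved(const char *const *preserve, size_t n_preserve,
                               const char *name) {
    for (size_t i = 0; i < n_preserve; ++i)
        if (strcmp(preserve[i], name) == 0)
            return 1;
    return 0;
}

/* Compact defined[] in place so that only preserved names remain.
   Returns the number of names kept. Reachability is not modelled. */
static size_t sketch_strip_dead(const char **defined, size_t n_defined,
                                const char *const *preserve,
                                size_t n_preserve) {
    assert(defined != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n_defined; ++i)
        if (sketch_is_preserved(preserve, n_preserve, defined[i]))
            defined[kept++] = defined[i];
    return kept;
}
```

In the example above, a preserve list containing only foo1 leaves foo2 eligible for removal.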
    250246

    251247
    252248
    269265
    270266
    271267
    272 LLVMlto
    273
    274
    275
    276

    LLVMlto is a dynamic library that is part of the LLVM tools, and

    277 is intended for use by a linker. LLVMlto provides an abstract C++
    268 libLTO
    269
    270
    271
    272

    libLTO is a shared object that is part of the LLVM tools, and

    273 is intended for use by a linker. libLTO provides an abstract C
    278274 interface to use the LLVM interprocedural optimizer without exposing details
    279275 of LLVM's internals. The intention is to keep the interface as stable as
    280 possible even when the LLVM optimizer continues to evolve.

    281
    282
    283
    284
    285 LLVMSymbol
    286
    287
    288
    289

    The LLVMSymbol class is used to describe the externally visible

    290 functions and global variables, defined in LLVM bitcode files, to the linker.
    291 This includes symbol visibility information. This information is used by
    292 the linker to do symbol resolution. For example: function foo2() is
    293 defined inside an LLVM bitcode module and it is an externally visible symbol.
    294 This helps the linker connect the use of foo2() in native object
    295 files with a future definition of the symbol foo2(). The linker
    296 will see the actual definition of foo2() when it receives the
    297 optimized native object file in
    298 Symbol Resolution after optimization phase. If the
    299 linker does not find any uses of foo2(), it updates LLVMSymbol
    300 visibility information to notify LLVM intermodular optimizer that it is dead.
    301 The LLVM intermodular optimizer takes advantage of such information to
    302 generate better code.

    303
    304
    305
    306
    307 readLLVMObjectFile()
    308
    309
    310
    311

    The readLLVMObjectFile() function is used by the linker to read

    312 LLVM bitcode files and collect LLVMSymbol information. This routine also
    313 supplies a list of externally defined symbols that are used by LLVM bitcode
    314 files. The linker uses this symbol information to do symbol resolution.
    315 Internally, LLVMlto maintains LLVM bitcode modules in
    316 memory. This function also provides a list of external references used by
    317 bitcode files.

    318
    319
    320
    321
    322 optimizeModules()
    323
    324
    325
    326

    The linker invokes optimizeModules to optimize already read

    327 LLVM bitcode files by applying LLVM intermodular optimization techniques.
    328 This function runs the LLVM intermodular optimizer and generates native
    329 object code as .o files at the name and location provided by the
    330 linker.

    331
    332
    333
    334
    335 getTargetTriple()
    336
    337
    338
    339

    The linker may use getTargetTriple() to query target architecture

    340 while validating LLVM bitcode file.

    341
    342
    343
    344
    345 removeModule()
    346
    347
    348
    349

    Internally, LLVMlto maintains LLVM bitcode modules in

    350 memory. The linker may use removeModule() method to remove desired
    351 modules from memory.

    352
    353
    354
    355
    356 getAlignment()
    357
    358
    359
    360

    The linker may use LLVMSymbol method

    361 getAlignment() to query symbol alignment information.

    362
    363
    364
    365
    366 Debugging Information
    367
    368
    369
    370
    371
    372

    ... To be completed ...

    373
    276 possible even when the LLVM optimizer continues to evolve. It should even
    277 be possible for a completely different compilation technology to provide
    278 a different libLTO that works with their object files and the standard
    279 linker tool.

    280
    281
    282
    283
    284 lto_module_t
    285
    286
    287
    288

    A non-native object file is handled via an lto_module_t.

    289 The following functions allow the linker to check if a file (on disk
    290 or in a memory buffer) is a file which libLTO can process:
    
                      
                    
    291 lto_module_is_object_file(const char*)
    292 lto_module_is_object_file_for_target(const char*, const char*)
    293 lto_module_is_object_file_in_memory(const void*, size_t)
    294 lto_module_is_object_file_in_memory_for_target(const void*, size_t, const char*)
    295 If the object file can be processed by libLTO, the linker creates an
    296 lto_module_t by using one of
    
                      
                    
    297 lto_module_create(const char*)
    298 lto_module_create_from_memory(const void*, size_t)
    299 and when done, the handle is released via
    
                      
                    
    300 lto_module_dispose(lto_module_t)
    301 The linker can introspect the non-native object file by getting the number
    302 of symbols and getting the name and attributes of each symbol via:
    
                      
                    
    303 lto_module_get_num_symbols(lto_module_t)
    304 lto_module_get_symbol_name(lto_module_t, unsigned int)
    305 lto_module_get_symbol_attribute(lto_module_t, unsigned int)
    306 The attributes of a symbol include the alignment, visibility, and kind.
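One common way to return several attributes from a single call is to pack them into one integer and let the caller mask out each field. The C sketch below shows that technique with a bit layout invented for this example. The masks, shifts and `sketch_*` names are assumptions, not the encoding libLTO uses, which is defined in its header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout chosen for this sketch only:
   bits 0-4 log2(alignment), bits 5-6 kind, bits 7-8 visibility. */
enum {
    SKETCH_ALIGN_MASK = 0x1F,
    SKETCH_KIND_SHIFT = 5,
    SKETCH_KIND_MASK  = 0x3 << 5,
    SKETCH_VIS_SHIFT  = 7,
    SKETCH_VIS_MASK   = 0x3 << 7
};

typedef enum { SKETCH_KIND_UNDEFINED = 0, SKETCH_KIND_FUNCTION = 1,
               SKETCH_KIND_DATA = 2 } sketch_kind;
typedef enum { SKETCH_VIS_DEFAULT = 0, SKETCH_VIS_HIDDEN = 1 } sketch_visibility;

/* Pack the three attributes into one 32-bit value. */
static uint32_t sketch_pack(unsigned log2_align, sketch_kind kind,
                            sketch_visibility vis) {
    assert(log2_align <= SKETCH_ALIGN_MASK);
    return (uint32_t)log2_align
         | ((uint32_t)kind << SKETCH_KIND_SHIFT)
         | ((uint32_t)vis << SKETCH_VIS_SHIFT);
}

/* Alignment in bytes, recovered from the stored log2 value. */
static unsigned sketch_alignment_bytes(uint32_t attrs) {
    return 1u << (attrs & SKETCH_ALIGN_MASK);
}

static sketch_kind sketch_get_kind(uint32_t attrs) {
    return (sketch_kind)((attrs & SKETCH_KIND_MASK) >> SKETCH_KIND_SHIFT);
}

static sketch_visibility sketch_get_visibility(uint32_t attrs) {
    return (sketch_visibility)((attrs & SKETCH_VIS_MASK) >> SKETCH_VIS_SHIFT);
}
```

A linker would apply the corresponding real masks to the value returned for each symbol index.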
    307

    308
    309
    310
    311
    312 lto_code_gen_t
    313
    314
    315
    316

    Once the linker has loaded each non-native object file into an

    317 lto_module_t, it can request libLTO to process them all and
    318 generate a native object file. This is done in a couple of steps.
    319 First a code generator is created with:
    
                      
                    
    320 lto_codegen_create()
    321 then each non-native object file is added to the code generator with:
    
                      
                    
    322 lto_codegen_add_module(lto_code_gen_t, lto_module_t)
    323 The linker then has the option of setting some codegen options. Whether
    324 or not to generate DWARF debug info is set with:
    
                      
                    
    325 lto_codegen_set_debug_model(lto_code_gen_t)
    326 Which kind of position independence is set with:
    
                      
                    
    327 lto_codegen_set_pic_model(lto_code_gen_t)
    328 And each symbol that is referenced by a native object file or otherwise
    329 must not be optimized away is set with:
    
                      
                    
    330 lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)
    331 After all these settings are done, the linker requests that a native
    332 object file be created from the modules with the settings using:
    333 lto_codegen_compile(lto_code_gen_t, size_t*)
    334 which returns a pointer to a buffer containing the generated native
    335 object file. The linker then parses that and links it with the rest
    336 of the native object files.
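The call sequence above ends with a function that returns a buffer and reports its length through an out-parameter. The following C sketch models that shape with a mock generator. It does not use libLTO, all `sketch_*` names are hypothetical, and the text "object" it produces merely stands in for generated machine code. It also assumes the generator owns the returned buffer, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { SKETCH_BUF_SIZE = 128 };

/* Mock code generator: counts what it was given and owns its output buffer. */
typedef struct {
    size_t num_modules;
    size_t num_preserved;
    char buffer[SKETCH_BUF_SIZE];
} sketch_codegen;

static void sketch_codegen_init(sketch_codegen *cg) {
    memset(cg, 0, sizeof *cg);
}

static void sketch_codegen_add_module(sketch_codegen *cg) {
    cg->num_modules++;
}

static void sketch_codegen_preserve(sketch_codegen *cg) {
    cg->num_preserved++;
}

/* Returns a pointer to the generator-owned buffer and stores its length in
   *length. Returns NULL with *length == 0 if there is nothing to compile. */
static const void *sketch_codegen_compile(sketch_codegen *cg, size_t *length) {
    assert(length != NULL);
    *length = 0;
    if (cg->num_modules == 0)
        return NULL;
    int n = snprintf(cg->buffer, sizeof cg->buffer,
                     "object: %zu modules, %zu preserved",
                     cg->num_modules, cg->num_preserved);
    if (n < 0 || (size_t)n >= sizeof cg->buffer)
        return NULL;
    *length = (size_t)n;
    return cg->buffer;
}
```

A caller checks the returned pointer before reading the length, then hands the bytes to the same parser it uses for native object files.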
    374337
    375338
    376339
    382345
    383346 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!">
    384347
    385 Devang Patel
    348 Devang Patel and Nick Kledzik
    386349 LLVM Compiler Infrastructure
    387350 Last modified: $Date$
    388351