llvm.org GIT mirror llvm / fa775d2
Merge in doc changes for release. git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_23@52125 91177308-0d34-0410-b5e6-96231b3b80d8 Tanya Lattner 11 years ago
3 changed file(s) with 465 addition(s) and 821 deletion(s).
  • Target-specific Implementation Notes
  • Tail call optimization
  • The X86 backend
  • The PowerPC backend
    Tail call optimization

    Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:

  • Caller and callee have the calling convention fastcc.
  • The call is a tail call, i.e. it is in tail position (ret immediately follows call, and ret uses the value of the call or is void).
  • Option -tailcallopt is enabled.
  • Platform specific constraints are met.

    x86/x86-64 constraints:

  • No variable argument lists are used.
  • On x86-64 when generating GOT/PIC code only module-local calls (visibility = hidden or protected) are supported.

    PowerPC constraints:

  • No variable argument lists are used.
  • No byval parameters are used.
  • On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.

    Example:

    Call as llc -tailcallopt test.ll.

        declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)

        define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
          %l1 = add i32 %in1, %in2
          %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
          ret i32 %tmp
        }

    Implications of -tailcallopt:

    To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. Currently, this causes each fastcc call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack, so performance might be worse in such cases.

    On x86 and x86-64 one register is reserved for indirect tail calls (e.g. via a function pointer), so there is one less register for integer argument passing. For x86 this means 2 registers (if the inreg parameter attribute is used) and for x86-64 this means 5 registers are used.

    The X86 backend
    The LLVM Compiler Driver (llvmc)

    NOTE: This document is a work in progress!

  • Abstract
  • Introduction
      • Purpose
      • Operation
      • Phases
      • Actions
  • Configuration
      • Overview
      • Configuration Files
      • Syntax
      • Substitutions
      • Sample Config File
  • Glossary

    Written by Reid Spencer

    This document describes the requirements, design, and configuration of the LLVM compiler driver, llvmc. The compiler driver knows about LLVM's tool set and can be configured to know about a variety of compilers for source languages. It uses this knowledge to execute the tools necessary to accomplish general compilation, optimization, and linking tasks. The main purpose of llvmc is to provide a simple and consistent interface to all compilation tasks. This reduces the burden on the end user, who can just learn to use llvmc instead of the entire LLVM tool set and all the source language compilers compatible with LLVM.

    The llvmc tool is a configurable compiler driver. As such, it isn't a compiler, optimizer, or linker itself, but it drives (invokes) other software that performs those tasks. If you are familiar with the GNU Compiler Collection's gcc tool, llvmc is very similar.

    The following introductory sections will help you understand why this tool is necessary and what it does.

    llvmc was invented to make compilation of user programs with LLVM-based tools easier. To accomplish this, llvmc strives to:

  • Be the single point of access to most of the LLVM tool set.
  • Hide the complexities of the LLVM tools through a single interface.
  • Provide a consistent interface for compiling all languages.

    Additionally, llvmc makes it easier to write a compiler for use with LLVM, because it:

  • Makes integration of existing non-LLVM tools simple.
  • Extends the capabilities of minimal compiler tools by optimizing their output.
  • Reduces the number of interfaces a compiler writer must know about before a working compiler can be completed (essentially only the VMCore interfaces need to be understood).
  • Supports source language translator invocation via both dynamically loadable shared objects and invocation of an executable.

    At a high level, llvmc operation is very simple. The basic action taken by llvmc is to simply invoke some tool or set of tools to fill the user's request for compilation. Every execution of llvmc takes the following sequence of steps:

    Collect Command Line Options
        The command line options provide the marching orders to llvmc on what actions it should perform. This is the request the user is making of llvmc and it is interpreted first. See the llvmc manual page for details on the options.
    Read Configuration Files
        Based on the options and the suffixes of the filenames presented, a set of configuration files is read to configure the actions llvmc will take. Configuration files are provided by either LLVM or the compiler tools that llvmc invokes. These files determine what actions llvmc will take in response to the user's request. See the section on configuration for more details.
    Determine Phases To Execute
        Based on the command line options and configuration files, llvmc determines the compilation phases that must be executed by the user's request. This is the primary work of llvmc.
    Determine Actions To Execute
        Each phase to be executed can result in the invocation of one or more actions. An action is either a whole program or a function in a dynamically linked shared library. In this step, llvmc determines the sequence of actions that must be executed. Actions will always be executed in a deterministic order.
    Execute Actions
        The actions necessary to support the user's original request are executed sequentially and deterministically. All actions result in either the invocation of a whole program to perform the action or the loading of a dynamically linkable shared library and invocation of a standard interface function within that library.
    Termination
        If any action fails (returns a non-zero result code), llvmc also fails and returns the result code from the failing action. If everything succeeds, llvmc will return a zero result code.
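
    The "Execute Actions" and "Termination" steps can be illustrated with a minimal Python sketch. It is not llvmc's implementation: the function name run_actions is hypothetical, and only whole-program actions are modelled (actions that are functions in a shared library are left out). Each action is assumed to be an argument list for an executable.

```python
import subprocess
import sys


def run_actions(actions):
    """Run each action (an argv list) in order.

    Hypothetical helper: returns the result code of the first failing
    action, or 0 if every action succeeds. Later actions are not run
    after a failure.
    """
    for argv in actions:
        result = subprocess.run(argv)
        if result.returncode != 0:
            # Propagate the failing action's result code.
            return result.returncode
    return 0
```

    Because the actions are run in list order and the loop stops at the first non-zero result code, the outcome depends only on the list that was passed in, which mirrors the sequential, deterministic behaviour described above.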

    llvmc's operation must be simple, regular and predictable. Developers need to be able to rely on it to take a consistent approach to compilation. For example, the invocation:

        llvmc -O2 x.c y.c z.c -o xyz

    must produce exactly the same results as:

        llvmc -O2 x.c -o x.o
        llvmc -O2 y.c -o y.o
        llvmc -O2 z.c -o z.o
        llvmc -O2 x.o y.o z.o -o xyz

    To accomplish this, llvmc uses a very simple goal oriented procedure to do its work. The overall goal is to produce a functioning executable. To accomplish this, llvmc always attempts to execute a series of compilation phases in the same sequence. However, the user's options to llvmc can cause the sequence of phases to start in the middle or finish early.
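
    The idea of a fixed phase sequence that may start in the middle or finish early can be sketched in Python as follows. This is an illustration only: the phase names are taken from this document's glossary, while the function phases_to_run and its parameters are hypothetical, and how llvmc actually maps command line options and file suffixes onto the first and last phase is not modelled.

```python
# The five phases, in the order they are always attempted.
PHASES = ["preprocessing", "translation", "optimization", "assembly", "linking"]


def phases_to_run(first="preprocessing", last="linking"):
    """Return the contiguous slice of PHASES from `first` to `last`.

    Hypothetical helper: `first` models starting in the middle (for
    example, when the input is already in LLVM form) and `last` models
    finishing early (for example, when the user asks to stop after
    preprocessing).
    """
    start = PHASES.index(first)
    end = PHASES.index(last)
    if start > end:
        raise ValueError("first phase must not come after last phase")
    return PHASES[start:end + 1]
```

    For instance, phases_to_run(last="preprocessing") yields only the preprocessing phase, and phases_to_run(first="optimization") yields optimization, assembly and linking.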

    Phases

    llvmc breaks every compilation task into the following five distinct phases:

    Preprocessing
        Not all languages support preprocessing, but for those that do, this phase can be invoked. This phase is for languages that provide combining, filtering, or otherwise altering the source language input before the translator parses it. Although C and C++ are the most common users of this phase, other languages may provide their own preprocessor (whether it's the C pre-processor or not).
    Translation
        The translation phase converts the source language input into something that LLVM can interpret and use for downstream phases. The translation is essentially from "non-LLVM form" to "LLVM form".
    Optimization
        Once an LLVM Module has been obtained from the translation phase, the program enters the optimization phase. This phase attempts to optimize all of the input provided on the command line according to the options provided.
    Linking
        The inputs are combined to form a complete program.

    The following table shows the inputs, outputs, and command line options applicable to each phase.

    Phase: Preprocessing
        Inputs:  Source Language File
        Outputs: Source Language File
        Options: -E  Stops the compilation after preprocessing

    Phase: Translation
        Inputs:  Source Language File
        Outputs: LLVM Assembly, LLVM Bitcode, LLVM C++ IR
        Options: -c  Stops the compilation after translation so that optimization and linking are not done.
                 -S  Stops the compilation before object code is written so that only assembly code remains.

    Phase: Optimization
        Inputs:  LLVM Assembly, LLVM Bitcode
        Outputs: LLVM Bitcode
        Options: -Ox  This group of options controls the amount of optimization performed.

    Phase: Linking
        Inputs:  LLVM Bitcode, Native Object Code, LLVM Library, Native Library
        Outputs: LLVM Bitcode Executable, Native Executable
        Options: -L  Specifies a path for library search.
                 -l  Specifies a library to link in.
    246
    247
    248
    249
    Actions
    250
    251

    An action, with regard to llvmc is a basic operation that it takes

    252 in order to fulfill the user's request. Each phase of compilation will invoke
    253 zero or more actions in order to accomplish that phase.

    254

    Actions come in two forms:

    255
    256
  • Invokable Executables
  • 257
  • Functions in a shared library
  • 258
    259
    260
    261
    262
    263
    264
    265

    This section of the document describes the configuration files used by llvmc. Configuration information is relatively static for a given release of LLVM and a compiler tool. However, the details may change from release to release of either. Users are encouraged to simply use the various options of the llvmc command and ignore the configuration of the tool. These configuration files are for compiler writers and LLVM developers. Those wishing to simply use llvmc don't need to understand this section but it may be instructive on how the tool works.

    Overview

    llvmc is highly configurable both on the command line and in configuration files. The options it understands are generic, consistent and simple by design. Furthermore, the llvmc options apply to the compilation of any LLVM enabled programming language. To be enabled as a supported source language compiler, a compiler writer must provide a configuration file that tells llvmc how to invoke the compiler and what its capabilities are. The purpose of the configuration files then is to allow compiler writers to specify to llvmc how the compiler should be invoked. Users may, but are not advised to, alter the compiler's llvmc configuration.

    Because llvmc just invokes other programs, it must deal with the available command line options for those programs regardless of whether they were written for LLVM or not. Furthermore, not all compiler tools will have the same capabilities. Some compiler tools will simply generate LLVM assembly code, others will be able to generate fully optimized bitcode. In general, llvmc doesn't make any assumptions about the capabilities or command line options of a sub-tool. It simply uses the details found in the configuration files and leaves it to the compiler writer to specify the configuration correctly.

    This approach means that new compiler tools can be up and working very quickly. As a first cut, a tool can simply compile its source to raw (unoptimized) bitcode or LLVM assembly and llvmc can be configured to pick up the slack (translate LLVM assembly to bitcode, optimize the bitcode, generate native assembly, link, etc.). In fact, a compiler tool need not use any LLVM libraries, and it could be written in any language (instead of C++). The configuration data will allow the full range of optimization, assembly, and linking capabilities that LLVM provides to be added to these kinds of tools. Enabling the rapid development of front-ends is one of the primary goals of llvmc.

    As a compiler tool matures, it may utilize the LLVM libraries and tools to more efficiently produce optimized bitcode directly in a single compilation and optimization program. In these cases, multiple tools would not be needed and the configuration data for the compiler would change.

    Configuring llvmc to the needs and capabilities of a source language compiler is relatively straightforward. A compiler writer must provide a definition of what to do for each of the five compilation phases for each of the optimization levels. The specification consists simply of prototypical command lines into which llvmc can substitute command line arguments and file names. Note that any given phase can be completely blank if the source language's compiler combines multiple phases into a single program. For example, quite often pre-processing, translation, and optimization are combined into a single program. The specification for such a compiler would have blank entries for pre-processing and translation but a full command line for optimization.

    Each configuration file provides the details for a single source language that is to be compiled. This configuration information tells llvmc how to invoke the language's pre-processor, translator, optimizer, assembler and linker. Note that a given source language needn't provide all these tools as many of them exist in LLVM currently.

    llvmc always looks for files of a specific name. It uses the first file with the name it's looking for by searching directories in the following order:

  • Any directory specified by the -config-dir option will be checked first.
  • If the environment variable LLVM_CONFIG_DIR is set, and it contains the name of a valid directory, that directory will be searched next.
  • If the user's home directory (typically /home/user) contains a sub-directory named .llvm and that directory contains a sub-directory named etc then that directory will be tried next.
  • If the LLVM installation directory (typically /usr/local/llvm) contains a sub-directory named etc then that directory will be tried next.
  • A standard "system" directory will be searched last. This is typically /etc/llvm on UNIX™ and C:\WINNT on Microsoft Windows™.
  • If the configuration file sought still can't be found, llvmc will print an error message and exit.

    The first file found in this search will be used. Other files with the same name will be ignored even if they exist in one of the subsequent search locations.
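
    The search described above amounts to a "first match wins" lookup over an ordered list of directories. A minimal Python sketch, under stated assumptions: the function names config_search_dirs and find_config and all parameter names are hypothetical, the default paths are the "typical" values quoted in the list, and the directories are visited in the order in which the list enumerates them.

```python
import os


def config_search_dirs(config_dir=None, env=None, home=None,
                       llvm_prefix="/usr/local/llvm", system_dir="/etc/llvm"):
    """Build the ordered list of candidate directories (hypothetical helper)."""
    env = os.environ if env is None else env
    dirs = []
    if config_dir:                      # directory given via -config-dir
        dirs.append(config_dir)
    env_dir = env.get("LLVM_CONFIG_DIR")
    if env_dir and os.path.isdir(env_dir):
        dirs.append(env_dir)            # only used if it names a valid directory
    if home:
        dirs.append(os.path.join(home, ".llvm", "etc"))
    dirs.append(os.path.join(llvm_prefix, "etc"))
    dirs.append(system_dir)
    return dirs


def find_config(name, dirs):
    """Return the first existing file called `name`, or None if not found."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate            # later directories are never consulted
    return None
```

    A caller would treat a None result as the "print an error message and exit" case.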

    In the directories searched, each configuration file is given a specific name to foster faster lookup (so llvmc doesn't have to do directory searches). The name of a given language specific configuration file is simply the same as the suffix used to identify files containing source in that language. For example, a configuration file for C++ source might be named cpp, C, or cxx. For languages that support multiple file suffixes, multiple (probably identical) files (or symbolic links) will need to be provided.

    Which configuration files are read depends on the command line options and the suffixes of the file names provided on llvmc's command line. Note that the -x LANGUAGE option alters the language that llvmc uses for the subsequent files on the command line. Only the configuration files actually needed to complete llvmc's task are read. Other language specific files will be ignored.
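
    The naming rule can be sketched in Python as below. The function config_name_for is hypothetical, and the sketch assumes that the value given to -x LANGUAGE is used directly as the configuration file name, which the text above does not state explicitly.

```python
import os


def config_name_for(filename, language=None):
    """Return the configuration file name to look up for `filename`.

    Hypothetical helper. `language` models an earlier -x LANGUAGE option
    (assumed here to name the configuration file directly); otherwise the
    file's suffix, without the leading dot, is used.
    """
    if language:
        return language
    suffix = os.path.splitext(filename)[1]
    if not suffix:
        raise ValueError("cannot determine language of %r" % filename)
    return suffix[1:]
```

    With this rule, x.cpp, y.C and z.cxx map to three different configuration file names, which is why the text recommends providing multiple identical files or symbolic links.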

    Syntax

    The syntax of the configuration files is very simple and somewhat compatible with Java's property files. Here are the syntax rules:

  • The file encoding is ASCII.
  • The file is line oriented. There should be one configuration definition per line. Lines are terminated by the newline (0x0A) and/or carriage return (0x0D) characters.
  • A backslash (\) before a newline causes the newline to be ignored. This is useful for line continuation of long definitions. A backslash anywhere else is recognized as a backslash.
  • A configuration item consists of a name, an = and a value.
  • A name consists of a sequence of identifiers separated by periods.
  • An identifier consists of specific keywords made up of only lower case and upper case letters (e.g. lang and name in lang.name).
  • Values come in four flavors: booleans, integers, commands and strings.
  • Valid "false" boolean values are false False FALSE no No NO off Off and OFF.
  • Valid "true" boolean values are true True TRUE yes Yes YES on On and ON.
  • Integers are simply sequences of digits.
  • Commands start with a program name and are followed by a sequence of words that are passed to that program as command line arguments. Program arguments that begin and end with the % sign will have their value substituted. Program names beginning with / are considered to be absolute. Otherwise the PATH will be applied to find the program to execute.
  • Strings are composed of multiple sequences of characters from the character class [-A-Za-z0-9_:%+/\\|,] separated by white space.
  • White space on a line is folded. Multiple blanks or tabs will be reduced to a single blank.
  • White space before the configuration item's name is ignored.
  • White space on either side of the = is ignored.
  • White space in a string value is used to separate the individual components of the string value but otherwise ignored.
  • Comments are introduced by the # character. Everything after a # and before the end of line is ignored.
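
    Several of these rules (line continuation, comments, name = value items, white space folding, and boolean spellings) can be illustrated with a small Python sketch. It is not llvmc's parser: the names parse_config and as_bool are hypothetical, values are kept as plain strings, and integers, commands and the string character class are not validated.

```python
TRUE_WORDS = {"true", "True", "TRUE", "yes", "Yes", "YES", "on", "On", "ON"}
FALSE_WORDS = {"false", "False", "FALSE", "no", "No", "NO", "off", "Off", "OFF"}


def parse_config(text):
    """Parse configuration text into a dict of name -> folded string value."""
    # Normalise line terminators, then join continued lines: a backslash
    # immediately before a newline causes the newline to be ignored.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\\\n", "")
    items = {}
    for line in text.split("\n"):
        line = line.split("#", 1)[0]        # drop comments
        if "=" not in line:
            continue                        # blank or comment-only line
        name, value = line.split("=", 1)
        # Strip white space around the name and fold runs of blanks/tabs.
        items[name.strip()] = " ".join(value.split())
    return items


def as_bool(value):
    """Interpret one of the documented boolean spellings."""
    if value in TRUE_WORDS:
        return True
    if value in FALSE_WORDS:
        return False
    raise ValueError("not a boolean value: %r" % value)
```

    Keeping every value as a folded string and converting on demand (as as_bool does) keeps the sketch short; a real parser would also need to know each item's expected value type.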
    The table below provides definitions of the allowed configuration items that may appear in a configuration file. Every item has a default value and does not need to appear in the configuration file. Missing items will have the default value. Each identifier may appear as all lower case, first letter capitalized or all upper case.

    LLVMC ITEMS

    version
        Value Type:  string
        Description: Provides the version string for the contents of this configuration file. What is accepted as a legal configuration file will change over time and this item tells llvmc which version should be expected.
        Default:     b

    LANG ITEMS

    lang.name
        Value Type:  string
        Description: Provides the common name for a language definition. For example "C++", "Pascal", "FORTRAN", etc.
        Default:     blank

    lang.opt1
        Value Type:  string
        Description: Specifies the parameters to give the optimizer when -O1 is specified on the llvmc command line.
        Default:     -simplifycfg -instcombine -mem2reg

    lang.opt2
        Value Type:  string
        Description: Specifies the parameters to give the optimizer when -O2 is specified on the llvmc command line.
        Default:     TBD

    lang.opt3
        Value Type:  string
        Description: Specifies the parameters to give the optimizer when -O3 is specified on the llvmc command line.
        Default:     TBD

    lang.opt4
        Value Type:  string
        Description: Specifies the parameters to give the optimizer when -O4 is specified on the llvmc command line.
        Default:     TBD

    lang.opt5
        Value Type:  string
        Description: Specifies the parameters to give the optimizer when -O5 is specified on the llvmc command line.
        Default:     TBD

    PREPROCESSOR ITEMS

    preprocessor.command
        Value Type:  command
        Description: This provides the command prototype that will be used to run the preprocessor. This is generally only used with the -E option.
        Default:     <blank>

    preprocessor.required
        Value Type:  boolean
        Description: This item specifies whether the pre-processing phase is required by the language. If the value is true, then the preprocessor.command value must not be blank. With this option, llvmc will always run the preprocessor as it assumes that the translation and optimization phases don't know how to pre-process their input.
        Default:     false

    TRANSLATOR ITEMS

    translator.command
        Value Type:  command
        Description: This provides the command prototype that will be used to run the translator. Valid substitutions are %in% for the input file and %out% for the output file.
        Default:     <blank>

    translator.output
        Value Type:  bitcode or assembly
        Description: This item specifies the kind of output the language's translator generates.
        Default:     bitcode

    translator.preprocesses
        Value Type:  boolean
        Description: Indicates that the translator also preprocesses. If this is true, then llvmc will skip the pre-processing phase whenever the final phase is not pre-processing.
        Default:     false

    OPTIMIZER ITEMS

    optimizer.command
        Value Type:  command
        Description: This provides the command prototype that will be used to run the optimizer. Valid substitutions are %in% for the input file and %out% for the output file.
        Default:     <blank>

    optimizer.output
        Value Type:  bitcode or assembly
        Description: This item specifies the kind of output the language's optimizer generates. Valid values are "assembly" and "bitcode".
        Default:     bitcode

    optimizer.preprocesses
        Value Type:  boolean
        Description: Indicates that the optimizer also preprocesses. If this is true, then llvmc will skip the pre-processing phase whenever the final phase is optimization or later.
        Default:     false

    optimizer.translates
        Value Type:  boolean
        Description: Indicates that the optimizer also translates. If this is true, then llvmc will skip the translation phase whenever the final phase is optimization or later.
        Default:     false

    ASSEMBLER ITEMS

    assembler.command
        Value Type:  command
        Description: This provides the command prototype that will be used to run the assembler. Valid substitutions are %in% for the input file and %out% for the output file.
        Default:     <blank>

    On any configuration item that ends in command, you must specify substitution tokens. Substitution tokens begin and end with a percent sign (%) and are replaced by the corresponding text. Any substitution token may be given on any command line but some are more useful than others. In particular each command should have both an %in% and an %out% substitution. The table below provides definitions of each of the allowed substitution tokens.

    Substitution Token / Replacement Description

    %args%
        Replaced with all the tool-specific arguments given to llvmc via the -T set of options. This just allows you to place these arguments in the correct place on the command line. If the %args% token does not appear on your command line, then you are explicitly disallowing the -T option for your tool.
    %force%
        Replaced with the -f option if it was specified on the llvmc command line. This is intended to tell the compiler tool to force the overwrite of output files.
    %in%
        Replaced with the full path of the input file. You needn't worry about the cascading of file names. llvmc will create temporary files and ensure that the output of one phase is the input to the next phase.
    %opt%
        Replaced with the optimization options for the tool. If the tool understands the -O options then that will be passed. Otherwise, the lang.optN series of configuration items will specify which arguments are to be given.
    %out%
        Replaced with the full path of the output file. Note that this is not necessarily the output file specified with the -o option on llvmc's command line. It might be a temporary file that will be passed to a subsequent phase's input.
    %stats%
        If your command accepts the -stats option, use this substitution token. If the user requested -stats from the llvmc command line then this token will be replaced with -stats, otherwise it will be ignored.
    %target%
        Replaced with the name of the target "machine" for which code should be generated. The value used here is taken from the llvmc option -march.
    %time%
        If your command accepts the -time-passes option, use this substitution token. If the user requested -time-passes from the llvmc command line then this token will be replaced with -time-passes, otherwise it will be ignored.
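
    Token substitution can be sketched in Python as follows. This is an illustration rather than llvmc's implementation: the function substitute and its values dictionary are hypothetical, tokens without a value are simply dropped (matching the "otherwise it will be ignored" behaviour described for %stats% and %time%), and the result is split on white space, so paths containing spaces are not handled.

```python
import re


def substitute(command, values):
    """Expand %token% placeholders in a command prototype.

    Hypothetical helper: `values` maps token names (without the percent
    signs) to replacement text. Tokens missing from `values` expand to
    nothing. Returns the command as an argv-style list of words.
    """
    def replace(match):
        return values.get(match.group(1), "")

    expanded = re.sub(r"%([a-z]+)%", replace, command)
    return expanded.split()
```

    For example, expanding "opt %in% -o %out% %opt% %time% %stats%" with values for in, out and opt only produces an opt command line with no -time-passes or -stats arguments.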

    Since an example is always instructive, here's how the Stacker language configuration file looks.

        # Stacker Configuration File For llvmc

        ##########################################################
        # Language definitions
        ##########################################################
        lang.name=Stacker
        lang.opt1=-simplifycfg -instcombine -mem2reg
        lang.opt2=-simplifycfg -instcombine -mem2reg -load-vn \
            -gcse -dse -scalarrepl -sccp
        lang.opt3=-simplifycfg -instcombine -mem2reg -load-vn \
            -gcse -dse -scalarrepl -sccp -branch-combine -adce \
            -globaldce -inline -licm
        lang.opt4=-simplifycfg -instcombine -mem2reg -load-vn \
            -gcse -dse -scalarrepl -sccp -ipconstprop \
            -branch-combine -adce -globaldce -inline -licm
        lang.opt5=-simplifycfg -instcombine -mem2reg -load-vn \
            -gcse -dse -scalarrepl -sccp -ipconstprop \
            -branch-combine -adce -globaldce -inline -licm \
            -block-placement

        ##########################################################
        # Pre-processor definitions
        ##########################################################

        # Stacker doesn't have a preprocessor but the following
        # allows the -E option to be supported
        preprocessor.command=cp %in% %out%
        preprocessor.required=false

        ##########################################################
        # Translator definitions
        ##########################################################

        # To compile stacker source, we just run the stacker
        # compiler with a default stack size of 2048 entries.
        translator.command=stkrc -s 2048 %in% -o %out% %time% \
            %stats% %force% %args%

        # stkrc doesn't preprocess but we set this to true so
        # that we don't run the cp command by default.
        translator.preprocesses=true

        # The translator is required to run.
        translator.required=true

        # stkrc doesn't handle the -On options
        translator.output=bitcode

        ##########################################################
        # Optimizer definitions
        ##########################################################

        # For optimization, we use the LLVM "opt" program
        optimizer.command=opt %in% -o %out% %opt% %time% %stats% \
            %force% %args%

        optimizer.required = true

        # opt doesn't translate
        optimizer.translates = no

        # opt doesn't preprocess
        optimizer.preprocesses=no

        # opt produces bitcode
        optimizer.output = bitcode

        ##########################################################
        # Assembler definitions
        ##########################################################
        assembler.command=llc %in% -o %out% %target% %time% %stats%

    This document uses precise terms in reference to the various artifacts and

    759 concepts related to compilation. The terms used throughout this document are
    760 defined below.

    761
    762
    assembly
    763
    A compilation phase in which LLVM bitcode or
    764 LLVM assembly code is assembled to a native code format (either target
    765 specific assembly language or the platform's native object file format).
    766
    767
    768
    compiler
    769
    Refers to any program that can be invoked by llvmc to accomplish
    770 the work of one or more compilation phases.
    771
    772
    driver
    773
    Refers to llvmc itself.
    774
    775
    linking
    776
    A compilation phase in which LLVM bitcode files
    777 and (optionally) native system libraries are combined to form a complete
    778 executable program.
    779
    780
    optimization
    781
    A compilation phase in which LLVM bitcode is
    782 optimized.
    783
    784
    phase
    785
    Refers to any one of the five compilation phases that
    786 llvmc supports. The five phases are:
    787 preprocessing,
    788 translation,
    789 optimization,
    790 assembly,
    791 linking.
    792
    793
    source language
    794
    Any common programming language (e.g. C, C++, Java, Stacker, ML,
    795 FORTRAN). These languages are distinguished from any of the lower level
    796 languages (such as LLVM or native assembly), by the fact that a
    797 translation phase
    798 is required before LLVM can be applied.
    799
    800
    tool
    801
    Refers to any program in the LLVM tool set.
    802
    803
    translation
    804
    A compilation phase in which
    805 source language code is translated into
    806 either LLVM assembly language or LLVM bitcode.
    807
    808
    809
    810
    811
    815 href="mailto:rspencer@x10sys.com">Reid Spencer
    816 The LLVM Compiler Infrastructure
    817 Last modified: $Date$
    10
    11
    12
    Customizing LLVMC: Reference Manual
    13
    14
    15

    Note: This document is a work-in-progress. Additions and clarifications

    16 are welcome.

    17
    18
    19

    LLVMC is a generic compiler driver, designed to be customizable and

    20 extensible. It plays the same role for LLVM as the gcc program
    21 does for GCC - LLVMC's job is essentially to transform a set of input
    22 files into a set of targets depending on configuration rules and user
    23 options. What makes LLVMC different is that these transformation rules
    24 are completely customizable - in fact, LLVMC knows nothing about the
    25 specifics of transformation (even the command-line options are mostly
    26 not hard-coded) and regards the transformation structure as an
    27 abstract graph. This makes it possible to adapt LLVMC for other
    28 purposes - for example, as a build tool for game resources.

    29

    Because LLVMC employs TableGen [1] as its configuration language, you

    30 need to be familiar with it to customize LLVMC.

    31
    32
    33
  • Compiling with LLVMC
  • 34
  • Predefined options
  • 35
  • Customizing LLVMC: the compilation graph
  • 36
  • Writing a tool description
  • 37
  • Option list - specifying all options in a single place
  • 38
  • Using hooks and environment variables in the cmd_line property
  • 39
  • Conditional evaluation: the case expression
  • 40
  • Language map
  • 41
  • References
  • 42
    43
    44
    45
    Written by Mikhail Glushenkov
    46
    47
    48
    49

    LLVMC tries hard to be as compatible with gcc as possible,

    50 although there are some small differences. Most of the time, however,
    51 you shouldn't be able to notice them:

    52
    
                      
                    
    53 $ # This works as expected:
    54 $ llvmc2 -O3 -Wall hello.cpp
    55 $ ./a.out
    56 hello
    57
    58

    One nice feature of LLVMC is that one doesn't have to distinguish

    59 between different compilers for different languages (think g++ and
    60 gcc) - the right toolchain is chosen automatically based on input
    61 language names (which are, in turn, determined from file
    62 extensions). If you want to force files ending with ".cpp" to compile as
    63 C, use the -x option, just like you would with gcc:

    64
    
                      
                    
    65 $ llvmc2 -x c hello.cpp
    66 $ # hello.cpp is really a C file
    67 $ ./a.out
    68 hello
    69
    70

    On the other hand, when using LLVMC as a linker to combine several C++

    71 object files you should provide the --linker option since it's
    72 impossible for LLVMC to choose the right linker in that case:

    73
    
                      
                    
    74 $ llvmc2 -c hello.cpp
    75 $ llvmc2 hello.o
    76 [A lot of link-time errors skipped]
    77 $ llvmc2 --linker=c++ hello.o
    78 $ ./a.out
    79 hello
    80
    81
    82
    83
    84

    LLVMC has some built-in options that can't be overridden in the

    85 configuration files:

    86
    87
  • -o FILE - Output file name.
  • 88
  • -x LANGUAGE - Specify the language of the following input files
  • 89 until the next -x option.
    90
  • -v - Enable verbose mode, i.e. print out all executed commands.
  • 91
  • --view-graph - Show a graphical representation of the compilation
  • 92 graph. Requires that you have dot and gv commands
    93 installed. Hidden option, useful for debugging.
    94
  • --write-graph - Write a compilation-graph.dot file in the
  • 95 current directory with the compilation graph description in the
    96 Graphviz format. Hidden option, useful for debugging.
    97
  • --save-temps - Write temporary files to the current directory
  • 98 and do not delete them on exit. Hidden option, useful for debugging.
    99
  • --help, --help-hidden, --version - These options have
  • 100 their standard meaning.
    101
    102
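The scoping rule for -x described above (a language applies to the following input files until the next -x) can be sketched in C++. This is only an illustration under assumptions: assignLanguages is a hypothetical helper, not part of LLVMC, and an empty language string stands for "deduce the language from the file extension".

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not LLVMC code): pairs each input file with the
// language forced by the most recent "-x LANGUAGE" option. An empty
// language means "deduce it from the file extension".
std::vector<std::pair<std::string, std::string>>
assignLanguages(const std::vector<std::string> &args) {
  std::vector<std::pair<std::string, std::string>> result;
  std::string current; // language set by the last -x seen so far
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "-x" && i + 1 < args.size()) {
      current = args[++i]; // applies to every file until the next -x
    } else {
      result.emplace_back(args[i], current);
    }
  }
  return result;
}
```

For the argument list `a.cpp -x c b.cpp`, this sketch leaves a.cpp with no forced language and tags b.cpp as "c".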
    103
    104
    105

    At the time of writing LLVMC does not support on-the-fly reloading of

    106 configuration, so to customize LLVMC you'll have to recompile the
    107 source code (which lives under $LLVM_DIR/tools/llvmc2). The
    108 default configuration files are Common.td (contains common
    109 definitions, don't forget to include it in your configuration
    110 files), Tools.td (tool descriptions) and Graph.td (compilation
    111 graph definition).

    112

    To compile LLVMC with your own configuration file (say, MyGraph.td),

    113 run make like this:

    114
    
                      
                    
    115 $ cd $LLVM_DIR/tools/llvmc2
    116 $ make GRAPH=MyGraph.td TOOLNAME=my_llvmc
    117
    118

    This will build an executable named my_llvmc. There are also

    119 several sample configuration files in the llvmc2/examples
    120 subdirectory that should help to get you started.

    121

    Internally, LLVMC stores information about possible source

    122 transformations in the form of a graph. Nodes in this graph represent
    123 tools, and edges between two nodes represent a transformation path. A
    124 special "root" node is used to mark entry points for the
    125 transformations. LLVMC also assigns a weight to each edge (more on
    126 this later) to choose between several alternative edges.

    127

    The definition of the compilation graph (see file Graph.td) is

    128 just a list of edges:

    129
    
                      
                    
    130 def CompilationGraph : CompilationGraph<[
    131 Edge<root, llvm_gcc_c>,
    132 Edge<root, llvm_gcc_assembler>,
    133 ...
    134
    135 Edge<llvm_gcc_c, llc>,
    136 Edge<llvm_gcc_cpp, llc>,
    137 ...
    138
    139 OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
    140 OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
    141 ...
    142
    143 OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
    144 (case (input_languages_contain "c++"), (inc_weight),
    145 (or (parameter_equals "linker", "g++"),
    146 (parameter_equals "linker", "c++")), (inc_weight))>,
    147 ...
    148
    149 ]>;
    150
    151

    As you can see, the edges can be either default or optional, where

    152 optional edges are differentiated by sporting a case expression
    153 used to calculate the edge's weight.

    154

    The default edges are assigned a weight of 1, and optional edges get a

    155 weight of 0 + 2*N where N is the number of tests that evaluated to
    156 true in the case expression. It is also possible to provide an
    157 integer parameter to inc_weight and dec_weight - in this case,
    158 the weight is increased (or decreased) by the provided value instead
    159 of the default 2.

    160

    When passing an input file through the graph, LLVMC picks the edge

    161 with the maximum weight. To avoid ambiguity, there should be only one
    162 default edge between two nodes (with the exception of the root node,
    163 which gets a special treatment - there you are allowed to specify one
    164 default edge per language).
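The weighting rule above can be modeled in a few lines of C++. This is a minimal sketch, not LLVMC's implementation: EdgeSketch, edgeWeight and pickEdge are hypothetical names, explicit inc_weight/dec_weight arguments are not modeled, and breaking ties in favor of the first edge is an assumption the text does not state.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; the names are illustrative only.
struct EdgeSketch {
  std::string target; // name of the tool the edge points to
  bool optional;      // false for Edge, true for OptionalEdge
  int testsPassed;    // tests in the case expression that evaluated to true
};

// Default edges weigh 1; optional edges weigh 0 + 2*N.
int edgeWeight(const EdgeSketch &e) {
  return e.optional ? 2 * e.testsPassed : 1;
}

// Returns the target of the maximum-weight edge, or "" if there are no edges.
// Assumption: on a tie, the edge listed first wins.
std::string pickEdge(const std::vector<EdgeSketch> &edges) {
  std::string best;
  int bestWeight = -1;
  for (const EdgeSketch &e : edges) {
    int w = edgeWeight(e);
    if (w > bestWeight) {
      bestWeight = w;
      best = e.target;
    }
  }
  return best;
}
```

With a default edge to llc and an optional edge to opt, the optional edge is taken only once at least one of its tests is true (weight 2 against 1); otherwise its weight is 0 and the default edge wins.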

    165

    To get a visual representation of the compilation graph (useful for

    166 debugging), run llvmc2 --view-graph. You will need dot and
    167 gsview installed for this to work properly.

    168
    169
    170
    171

    As was said earlier, nodes in the compilation graph represent tools,

    172 which are described separately. A tool definition looks like this
    173 (taken from the Tools.td file):

    174
    
                      
                    
    175 def llvm_gcc_cpp : Tool<[
    176 (in_language "c++"),
    177 (out_language "llvm-assembler"),
    178 (output_suffix "bc"),
    179 (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
    180 (sink)
    181 ]>;
    182
    183

    This defines a new tool called llvm_gcc_cpp, which is an alias for

    184 llvm-g++. As you can see, a tool definition is just a list of
    185 properties; most of them should be self-explanatory. The sink
    186 property means that this tool should be passed all command-line
    187 options that lack explicit descriptions.

    188

    The complete list of the currently implemented tool properties follows:

    189
    190
  • Possible tool properties:
  • 191
  • in_language - input language name.
  • 192
  • out_language - output language name.
  • 193
  • output_suffix - output file suffix.
  • 194
  • cmd_line - the actual command used to run the tool. You can
  • 195 use $INFILE and $OUTFILE variables, output redirection
    196 with >, hook invocations ($CALL), environment variables
    197 (via $ENV) and the case construct (more on this below).
    198
  • join - this tool is a "join node" in the graph, i.e. it gets a
  • 199 list of input files and joins them together. Used for linkers.
    200
  • sink - all command-line options that are not handled by other
  • 201 tools are passed to this tool.
    202
    203
    204
    205

    The next tool definition is slightly more complex:

    206
    
                      
                    
    207 def llvm_gcc_linker : Tool<[
    208 (in_language "object-code"),
    209 (out_language "executable"),
    210 (output_suffix "out"),
    211 (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
    212 (join),
    213 (prefix_list_option "L", (forward),
    214 (help "add a directory to link path")),
    215 (prefix_list_option "l", (forward),
    216 (help "search a library when linking")),
    217 (prefix_list_option "Wl", (unpack_values),
    218 (help "pass options to linker"))
    219 ]>;
    220
    221

    This tool has a "join" property, which means that it behaves like a

    222 linker. This tool also defines several command-line options: -l,
    223 -L and -Wl which have their usual meaning. An option has two
    224 attributes: a name and a (possibly empty) list of properties. All
    225 currently implemented option types and properties are described below:

    226
    227
  • Possible option types:

  • 228
    229
    230
  • switch_option - a simple boolean switch, for example -time.
  • 231
  • parameter_option - option that takes an argument, for example
  • 232 -std=c99;
    233
  • parameter_list_option - same as the above, but more than one
  • 234 occurrence of the option is allowed.
    235
  • prefix_option - same as the parameter_option, but the option name
  • 236 and parameter value are not separated.
    237
  • prefix_list_option - same as the above, but more than one
  • 238 occurrence of the option is allowed; example: -lm -lpthread.
    239
  • alias_option - a special option type for creating
  • 240 aliases. Unlike other option types, aliases are not allowed to
    241 have any properties besides the aliased option name. Usage
    242 example: (alias_option "preprocess", "E")
    243
    244
    245
    246
  • Possible option properties:

  • 247
    248
    249
  • append_cmd - append a string to the tool invocation command.
  • 250
  • forward - forward this option unchanged.
  • 251
  • output_suffix - modify the output suffix of this
  • 252 tool. Example: (switch "E", (output_suffix "i")).
    253
  • stop_compilation - stop compilation after this phase.
  • 254
  • unpack_values - used for splitting and forwarding
  • 255 comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is
    256 converted to -foo=bar -baz and appended to the tool invocation
    257 command.
    258
  • help - help string associated with this option. Used for
  • 259 --help output.
    260
  • required - this option is obligatory.
  • 261
    262
    263
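The effect of unpack_values described in the list above can be sketched as a small C++ function. This is an illustration only: unpackValues is a hypothetical helper rather than LLVMC code, and dropping empty fields is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not LLVMC code): splits "-Wa,-foo=bar,-baz" into
// {"-foo=bar", "-baz"}. The first comma-separated field, which is the
// option name itself, is dropped. Empty fields are skipped (an assumption).
std::vector<std::string> unpackValues(const std::string &arg) {
  std::vector<std::string> values;
  std::string::size_type pos = arg.find(',');
  while (pos != std::string::npos) {
    std::string::size_type next = arg.find(',', pos + 1);
    std::string::size_type len =
        next == std::string::npos ? std::string::npos : next - pos - 1;
    std::string piece = arg.substr(pos + 1, len);
    if (!piece.empty())
      values.push_back(piece);
    pos = next;
  }
  return values;
}
```

The resulting values would then be appended, in order, to the tool invocation command.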
    264
    265
    266
    267
    268

    It can be handy to have all information about options gathered in a

    269 single place to provide an overview. This can be achieved by using a
    270 so-called OptionList:

    271
    
                      
                    
    272 def Options : OptionList<[
    273 (switch_option "E", (help "Help string")),
    274 (alias_option "quiet", "q")
    275 ...
    276 ]>;
    277
    278

    OptionList is also a good place to specify option aliases.

    279

    Tool-specific option properties like append_cmd have (obviously)

    280 no meaning in the context of OptionList, so the only properties
    281 allowed there are help and required.

    282

    Option lists are used at the file scope. See file

    283 examples/Clang.td for an example of OptionList usage.

    284
    285
    286
    287

    Normally, LLVMC executes programs from the system PATH. Sometimes,

    288 this is not sufficient: for example, we may want to specify tool names
    289 in the configuration file. This can be achieved via the mechanism of
    290 hooks - to compile LLVMC with your hooks, just drop a .cpp file into the
    291 tools/llvmc2 directory. Hooks should live in the hooks
    292 namespace and have the signature std::string hooks::MyHookName
    293 (void). They can be used from the cmd_line tool property:

    294
    
                      
                    
    295 (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
    296
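A hook matching the signature given above might look like the following. This is a sketch under assumptions: the MY_TOOLCHAIN_DIR environment variable and the /usr/local fallback are made up for illustration, and only the hooks namespace and the std::string return type come from the text.

```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory that holds the toolchain.
// MY_TOOLCHAIN_DIR and the fallback path are illustrative assumptions.
std::string MyHook() {
  const char *dir = std::getenv("MY_TOOLCHAIN_DIR");
  return (dir && *dir) ? std::string(dir) : std::string("/usr/local");
}

} // namespace hooks
```

The returned string is substituted for $CALL(MyHook) in the cmd_line property, so the hook should return text that is valid at that position in the command.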
    297

    It is also possible to use environment variables in the same manner:

    298
    
                      
                    
    299 (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
    300
    301

    To change the command line string based on user-provided options use

    302 the case expression (documented below):

    303
    
                      
                    
    304 (cmd_line
    305 (case
    306 (switch_on "E"),
    307 "llvm-g++ -E -x c $INFILE -o $OUTFILE",
    308 (default),
    309 "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
    310
    311
    312
    313
    314

    The 'case' construct can be used to calculate weights of the optional

    315 edges and to choose between several alternative command line strings
    316 in the cmd_line tool property. It is designed after the
    317 similarly-named construct in functional languages and takes the form
    318 (case (test_1), statement_1, (test_2), statement_2, ... (test_N),
    319 statement_N). The statements are evaluated only if the corresponding
    320 tests evaluate to true.

    321

    Examples:

    322
    
                      
                    
    323 // Increases edge weight by 5 if "-A" is provided on the
    324 // command-line, and by 5 more if "-B" is also provided.
    325 (case
    326 (switch_on "A"), (inc_weight 5),
    327 (switch_on "B"), (inc_weight 5))
    328
    329 // Evaluates to "cmdline1" if option "-A" is provided on the
    330 // command line; otherwise to "cmdline2" if "-B" is provided, and to "cmdline3" if neither is.
    331 (case
    332 (switch_on "A"), "cmdline1",
    333 (switch_on "B"), "cmdline2",
    334 (default), "cmdline3")
    335
    336

    Note the slight difference in 'case' expression handling in contexts

    337 of edge weights and command line specification - in the second example
    338 the value of the "B" switch is never checked when switch "A" is
    339 enabled, and the whole expression always evaluates to "cmdline1" in
    340 that case.
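The difference between the two contexts can be made concrete with a small C++ model. This is a sketch of the semantics as described here, not LLVMC's code: Test, evalCmdLineCase and evalWeightCase are hypothetical names, and the weight model only covers inc_weight with an explicit value.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A test is modeled as a callable so that later tests are not evaluated
// once an earlier one has matched.
using Test = std::function<bool()>;

// cmd_line context: the first test that is true selects the result.
std::string
evalCmdLineCase(const std::vector<std::pair<Test, std::string>> &clauses) {
  for (const auto &clause : clauses)
    if (clause.first())
      return clause.second;
  return std::string(); // no test matched and no (default) clause was given
}

// Edge-weight context: every test that is true contributes its increment.
int evalWeightCase(const std::vector<std::pair<Test, int>> &clauses) {
  int weight = 0;
  for (const auto &clause : clauses)
    if (clause.first())
      weight += clause.second;
  return weight;
}
```

With both "A" and "B" switched on, the command-line form stops at the first clause, while the weight form adds both increments.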

    341

    Case expressions can also be nested, i.e. the following is legal:

    342
    
                      
                    
    343 (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
    344 (default), ...)
    345
    346

    You should, however, try to avoid doing that because it hurts

    347 readability. It is usually better to split tool descriptions and/or
    348 use TableGen inheritance instead.

    349
    350
  • Possible tests are:
  • 351
  • switch_on - Returns true if a given command-line option is
  • 352 provided by the user. Example: (switch_on "opt"). Note that
    353 you have to define all possible command-line options separately in
    354 the tool descriptions. See the section on writing a tool description for the discussion of
    355 different kinds of command-line options.
    356
  • parameter_equals - Returns true if a command-line parameter equals
  • 357 a given value. Example: (parameter_equals "W", "all").
    358
  • element_in_list - Returns true if a command-line parameter list
  • 359 includes a given value. Example: (element_in_list "l", "pthread").
    360
  • input_languages_contain - Returns true if a given language
  • 361 belongs to the current input language set. Example:
    362 (input_languages_contain "c++").
    363
  • in_language - Evaluates to true if the language of the input
  • 364 file equals the argument. Valid only when using case
    365 expression in a cmd_line tool property. Example:
    366 (in_language "c++").
    367
  • not_empty - Returns true if a given option (which should be
  • 368 either a parameter or a parameter list) is set by the
    369 user. Example: (not_empty "o").
    370
  • default - Always evaluates to true. Should always be the last
  • 371 test in the case expression.
    372
  • and - A standard logical combinator that returns true iff all
  • 373 of its arguments return true. Used like this: (and (test1),
    374 (test2), ... (testN)). Nesting of and and or is allowed,
    375 but not encouraged.
    376
  • or - Another logical combinator that returns true iff at least
  • 377 one of its arguments returns true. Example: (or (test1),
    378 (test2), ... (testN)).
    379
    380
    381
    382
    383
    384
    385

    One last thing that you will need to modify when adding support for a

    386 new language to LLVMC is the language map, which defines mappings from
    387 file extensions to language names. It is used to choose the proper
    388 toolchain(s) for a given input file set. Language map definition is
    389 located in the file Tools.td and looks like this:

    390
    
                      
                    
    391 def LanguageMap : LanguageMap<
    392 [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
    393 LangToSuffixes<"c", ["c"]>,
    394 ...
    395 ]>;
    396
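The lookup that the language map enables can be sketched in C++. This is an illustration only: suffixToLanguage and languageOf are hypothetical names, and the table mirrors just the two LangToSuffixes entries shown above, not the full map in Tools.td.

```cpp
#include <map>
#include <string>

// Hypothetical suffix table mirroring the two LangToSuffixes entries above.
const std::map<std::string, std::string> &suffixToLanguage() {
  static const std::map<std::string, std::string> table = {
      {"cc", "c++"},  {"cp", "c++"},  {"cxx", "c++"}, {"cpp", "c++"},
      {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},   {"c", "c"}};
  return table;
}

// Returns the language for a file name, or "" when the suffix is unknown.
std::string languageOf(const std::string &fileName) {
  std::string::size_type dot = fileName.rfind('.');
  if (dot == std::string::npos)
    return std::string();
  auto it = suffixToLanguage().find(fileName.substr(dot + 1));
  return it == suffixToLanguage().end() ? std::string() : it->second;
}
```

Note that the lookup is case-sensitive, which is what lets "C" map to C++ while "c" maps to C.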
    397
    398
    399
    400
    401
    402
    403
    [1]TableGen Fundamentals
    404 http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html
    405
    406
    407
    408
    409
    410
    411
    415 The LLVM Compiler Infrastructure
    416 Last modified: $Date$
    818417
    819
    821418
    822419
    577577 (e.g. by passing things in registers). This calling convention allows the
    578578 target to use whatever tricks it wants to produce fast code for the target,
    579579 without having to conform to an externally specified ABI. Implementations of
    580 this convention should allow arbitrary tail call optimization to be supported.
    581 This calling convention does not support varargs and requires the prototype of
    582 all callees to exactly match the prototype of the function definition.
    580 this convention should allow arbitrary
    581 tail call optimization to be
    582 supported. This calling convention does not support varargs and requires the
    583 prototype of all callees to exactly match the prototype of the function
    584 definition.
    583585
    584586
    585587
    "coldcc" - The cold calling convention: