llvm.org GIT mirror llvm / 808ce5f
Remove the 'simple jit' tutorial as it wasn't really being maintained and its material is covered by the Kaleidoscope tutorial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90111 91177308-0d34-0410-b5e6-96231b3b80d8
Nick Lewycky, 9 years ago
4 changed file(s) with 0 addition(s) and 417 deletion(s).
docs/tutorial/JITTutorial1.html (+0, -207)
LLVM Tutorial 1: A First Function

Written by Owen Anderson

For starters, let's consider a relatively straightforward function that takes three integer parameters and returns an arithmetic combination of them. This is nice and simple, especially since it involves no control flow:

    int mul_add(int x, int y, int z) {
      return x * y + z;
    }

As a preview, the LLVM IR we’re going to end up generating for this function will look like:

    define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
    entry:
      %tmp = mul i32 %x, %y
      %tmp2 = add i32 %tmp, %z
      ret i32 %tmp2
    }

If you're unsure what the above code says, skim through the LLVM Language Reference Manual and convince yourself that the above LLVM IR is actually equivalent to the original function. Once you’re satisfied with that, let's move on to actually generating it programmatically!
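
One way to check the equivalence is to transcribe the IR back into C++ one instruction at a time. The sketch below is illustrative only: the name mul_add_ir is made up for this example, it needs C++14 or later, and it assumes inputs small enough that the arithmetic does not overflow (i32 mul and add wrap in LLVM IR, whereas signed overflow is undefined in C++).

```cpp
// Reference version from the tutorial text.
constexpr int mul_add(int x, int y, int z) {
  return x * y + z;
}

// Hypothetical transcription of the IR, one statement per instruction.
constexpr int mul_add_ir(int x, int y, int z) {
  const int tmp = x * y;     // %tmp  = mul i32 %x, %y
  const int tmp2 = tmp + z;  // %tmp2 = add i32 %tmp, %z
  return tmp2;               // ret i32 %tmp2
}

// Compile-time spot checks that the two forms agree.
static_assert(mul_add(2, 3, 4) == 10, "reference result");
static_assert(mul_add_ir(2, 3, 4) == mul_add(2, 3, 4), "IR transcription agrees");
static_assert(mul_add_ir(-5, 7, 1) == mul_add(-5, 7, 1), "agrees for negative inputs");
```

Each named temporary corresponds to one virtual register in the IR, which is why the IR needs two arithmetic instructions where the C source has a single expression.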

Of course, before we can start, we need to #include the appropriate LLVM header files:

    #include "llvm/Module.h"
    #include "llvm/Function.h"
    #include "llvm/PassManager.h"
    #include "llvm/CallingConv.h"
    #include "llvm/Analysis/Verifier.h"
    #include "llvm/Assembly/PrintModulePass.h"
    #include "llvm/Support/IRBuilder.h"
    #include "llvm/Support/raw_ostream.h"

Now, let's get started on our real program. Here's what our basic main() will look like:

    using namespace llvm;

    Module* makeLLVMModule();

    int main(int argc, char**argv) {
      Module* Mod = makeLLVMModule();

      verifyModule(*Mod, PrintMessageAction);

      PassManager PM;
      PM.add(createPrintModulePass(&outs()));
      PM.run(*Mod);

      delete Mod;
      return 0;
    }

The first segment is pretty simple: it creates an LLVM “module.” In LLVM, a module represents a single unit of code that is to be processed together. A module contains things like global variables, function declarations, and implementations. Here we’ve declared a makeLLVMModule() function to do the real work of creating the module. Don’t worry, we’ll be looking at that one next!

The second segment runs the LLVM module verifier on our newly created module. While this probably isn’t really necessary for a simple module like this one, it's always a good idea, especially if you’re generating LLVM IR based on some input. The verifier will print an error message if your LLVM module is malformed in any way.

Finally, we instantiate an LLVM PassManager and run the PrintModulePass on our module. LLVM uses an explicit pass infrastructure to manage optimizations and various other things. A PassManager, as should be obvious from its name, manages passes: it is responsible for scheduling them, invoking them, and ensuring the proper disposal after we’re done with them. For this example, we’re just using a trivial pass that prints out our module in textual form.

Now onto the interesting part: creating and populating a module. Here's the first chunk of our makeLLVMModule():

    Module* makeLLVMModule() {
      // Module Construction
      Module* mod = new Module("test", getGlobalContext());

Exciting, isn’t it!? All we’re doing here is instantiating a module and giving it a name. The name isn’t particularly important unless you’re going to be dealing with multiple modules at once.

      Constant* c = mod->getOrInsertFunction("mul_add",
      /*ret type*/                           IntegerType::get(32),
      /*args*/                               IntegerType::get(32),
                                             IntegerType::get(32),
                                             IntegerType::get(32),
      /*varargs terminated with null*/       NULL);

      Function* mul_add = cast<Function>(c);
      mul_add->setCallingConv(CallingConv::C);

We construct our Function by calling getOrInsertFunction() on our module, passing in the name, return type, and argument types of the function. In the case of our mul_add function, that means one 32-bit integer for the return value and three 32-bit integers for the arguments.

You'll notice that getOrInsertFunction() doesn't actually return a Function*. This is because getOrInsertFunction() will return a cast of the existing function if the function already existed with a different prototype. Since we know that there's not already a mul_add function, we can safely just cast c to a Function*.

In addition, we set the calling convention for our new function to be the C calling convention. This isn’t strictly necessary, but it ensures that our new function will interoperate properly with C code, which is a good thing.

      Function::arg_iterator args = mul_add->arg_begin();
      Value* x = args++;
      x->setName("x");
      Value* y = args++;
      y->setName("y");
      Value* z = args++;
      z->setName("z");

While we’re setting up our function, let's also give names to the parameters. This also isn’t strictly necessary (LLVM will generate names for them if you don’t specify them), but it’ll make looking at our output somewhat more pleasant. To name the parameters, we iterate over the arguments of our function and call setName() on them. We’ll also keep the pointer to x, y, and z around, since we’ll need them when we get around to creating instructions.

Great! We have a function now. But what good is a function if it has no body? Before we start working on a body for our new function, we need to recall some details of the LLVM IR. The IR, being an abstract assembly language, represents control flow using jumps (we call them branches), both conditional and unconditional. The straight-line sequences of code between branches are called basic blocks, or just blocks. To create a body for our function, we fill it with blocks:

      BasicBlock* block = BasicBlock::Create(getGlobalContext(), "entry", mul_add);
      IRBuilder<> builder(block);

We create a new basic block by calling the static BasicBlock::Create() method. We pass it the context, the block's name, and the function to which it belongs. In addition, we’re creating an IRBuilder object, which is a convenience interface for creating instructions and appending them to the end of a block. Instructions can be created through their constructors as well, but some of their interfaces are quite complicated. Unless you need a lot of control, using IRBuilder will make your life simpler.

      Value* tmp = builder.CreateBinOp(Instruction::Mul,
                                       x, y, "tmp");
      Value* tmp2 = builder.CreateBinOp(Instruction::Add,
                                        tmp, z, "tmp2");

      builder.CreateRet(tmp2);

      return mod;
    }

The final step in creating our function is to create the instructions that make it up. Our mul_add function is composed of just three instructions: a multiply, an add, and a return. IRBuilder gives us a simple interface for constructing these instructions and appending them to the “entry” block. Each of the calls to IRBuilder returns a Value* that represents the value yielded by the instruction. You’ll also notice that, above, x, y, and z are also Value*'s, so it's clear that instructions operate on Value*'s.

And that's it! Now you can compile and run your code, and get a wonderful textual print out of the LLVM IR we saw at the beginning. To compile, use the following command line as a guide:

    # c++ -g tut1.cpp `llvm-config --cxxflags --ldflags --libs core` -o tut1
    # ./tut1

The llvm-config utility is used to obtain the necessary GCC-compatible compiler flags for linking with LLVM. For this example, we only need the 'core' library. We'll use others once we start adding optimizers and the JIT engine.

Next: A More Complicated Function

Owen Anderson
The LLVM Compiler Infrastructure
Last modified: $Date: 2009-07-21 11:05:13 -0700 (Tue, 21 Jul 2009) $
docs/tutorial/JITTutorial2-1.png
Binary diff not shown
docs/tutorial/JITTutorial2.html (+0, -200)
LLVM Tutorial 2: A More Complicated Function

Written by Owen Anderson

Now that we understand the basics of creating functions in LLVM, let's move on to a more complicated example: something with control flow. As an example, let's consider Euclid's Greatest Common Divisor (GCD) algorithm:

    unsigned gcd(unsigned x, unsigned y) {
      if(x == y) {
        return x;
      } else if(x < y) {
        return gcd(x, y - x);
      } else {
        return gcd(x - y, y);
      }
    }

With this example, we'll learn how to create functions with multiple blocks and control flow, and how to make function calls within your LLVM code. For starters, consider the diagram below.

[Figure: GCD CFG]

This is a graphical representation of a program in LLVM IR. It places each basic block on a node of a graph and uses directed edges to indicate flow control. These blocks will be serialized when written to a text or bitcode file, but it is often useful conceptually to think of them as a graph. Again, if you are unsure about the code in the diagram, you should skim through the LLVM Language Reference Manual and convince yourself that it is, in fact, the GCD algorithm.
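
To make the mapping from source to basic blocks concrete, here is a hedged sketch of the same algorithm with each region annotated by the block it would become. The function name gcd_blocks is made up for this example, the block names are the ones this tutorial uses later when it creates the blocks, and the annotations are a reading aid rather than compiler output. It needs C++14 or later and, like the original, assumes both inputs are positive.

```cpp
// Each commented region stands for one basic block of the CFG.
constexpr unsigned gcd_blocks(unsigned x, unsigned y) {
  // entry: compare x and y, then branch to "return" or "cond_false".
  if (x == y) {
    // return: reached only when x == y.
    return x;
  }
  // cond_false: x != y, so find out which one is larger.
  if (x < y) {
    // cond_true: y is larger, recurse on (x, y - x).
    return gcd_blocks(x, y - x);
  }
  // cond_false (second block of that name): x is larger.
  return gcd_blocks(x - y, y);
}

// Compile-time spot checks.
static_assert(gcd_blocks(48, 18) == 6, "gcd(48, 18)");
static_assert(gcd_blocks(7, 7) == 7, "equal inputs return immediately");
static_assert(gcd_blocks(17, 5) == 1, "coprime inputs");
```

Every region ends in either a return or a decision about where to go next, which is the property that makes it a basic block.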

The first part of our code is practically the same as from the first tutorial. The same basic setup is required: creating a module, verifying it, and running the PrintModulePass on it. Even the first segment of makeLLVMModule() looks essentially the same, except that gcd takes one fewer parameter than mul_add.

    #include "llvm/Module.h"
    #include "llvm/Function.h"
    #include "llvm/PassManager.h"
    #include "llvm/Analysis/Verifier.h"
    #include "llvm/Assembly/PrintModulePass.h"
    #include "llvm/Support/IRBuilder.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    Module* makeLLVMModule();

    int main(int argc, char**argv) {
      Module* Mod = makeLLVMModule();

      verifyModule(*Mod, PrintMessageAction);

      PassManager PM;
      PM.add(createPrintModulePass(&outs()));
      PM.run(*Mod);

      delete Mod;
      return 0;
    }

    Module* makeLLVMModule() {
      Module* mod = new Module("tut2", getGlobalContext());

      Constant* c = mod->getOrInsertFunction("gcd",
                                             IntegerType::get(32),
                                             IntegerType::get(32),
                                             IntegerType::get(32),
                                             NULL);
      Function* gcd = cast<Function>(c);

      Function::arg_iterator args = gcd->arg_begin();
      Value* x = args++;
      x->setName("x");
      Value* y = args++;
      y->setName("y");

Here, however, is where our code begins to diverge from the first tutorial. Because gcd has control flow, it is composed of multiple blocks interconnected by branching (br) instructions. For those familiar with assembly language, a block is similar to a labeled set of instructions. For those not familiar with assembly language, a block is basically a set of instructions that can be branched to and is executed linearly until the block is terminated by one of a small number of control flow instructions, such as br or ret.

Blocks correspond to the nodes in the diagram we looked at in the beginning of this tutorial. From the diagram, we can see that this function contains five blocks, so we'll go ahead and create them. Note that we're making use of LLVM's automatic name uniquing in this code sample, since we're giving two blocks the same name.

      BasicBlock* entry = BasicBlock::Create(getGlobalContext(), "entry", gcd);
      BasicBlock* ret = BasicBlock::Create(getGlobalContext(), "return", gcd);
      BasicBlock* cond_false = BasicBlock::Create(getGlobalContext(), "cond_false", gcd);
      BasicBlock* cond_true = BasicBlock::Create(getGlobalContext(), "cond_true", gcd);
      BasicBlock* cond_false_2 = BasicBlock::Create(getGlobalContext(), "cond_false", gcd);

Now we're ready to begin generating code! We'll start with the entry block. This block corresponds to the top-level if-statement in the original C code, so we need to compare x and y. To achieve this, we perform an explicit comparison using ICmpEQ. ICmpEQ stands for an integer comparison for equality and returns a 1-bit integer result. This 1-bit result is then used as the input to a conditional branch, with ret as the true and cond_false as the false case.

      IRBuilder<> builder(entry);
      Value* xEqualsY = builder.CreateICmpEQ(x, y, "tmp");
      builder.CreateCondBr(xEqualsY, ret, cond_false);

Our next block, ret, is pretty simple: it just returns the value of x. Recall that this block is only reached if x == y, so this is the correct behavior. Notice that instead of creating a new IRBuilder for each block, we can use SetInsertPoint to retarget our existing one. This saves on construction and memory allocation costs.

      builder.SetInsertPoint(ret);
      builder.CreateRet(x);

cond_false is a more interesting block: we now know that x != y, so we must branch again to determine which of x and y is larger. This is achieved using the ICmpULT instruction, which stands for integer comparison for unsigned less-than. In LLVM, integer types do not carry sign; a 32-bit integer pseudo-register can be interpreted as signed or unsigned without casting. Whether a signed or unsigned interpretation is desired is specified in the instruction. This is why several instructions in the LLVM IR, such as integer less-than, include a specifier for signed or unsigned.
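
The point that signedness lives in the instruction rather than in the type can be illustrated without LLVM at all. In the sketch below, both comparisons take the same raw 32-bit patterns and differ only in how they interpret them. The helper names icmp_ult and icmp_slt are made up for this example (modeled on the IR's ult and slt predicates) and are not LLVM APIs.

```cpp
#include <cstdint>

// Unsigned less-than: compare the raw bit patterns directly.
constexpr bool icmp_ult(std::uint32_t a, std::uint32_t b) {
  return a < b;
}

// Signed less-than on the same raw bits: flipping the sign bit maps
// two's-complement order onto unsigned order.
constexpr bool icmp_slt(std::uint32_t a, std::uint32_t b) {
  return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

// 0xFFFFFFFF is 4294967295 when read as unsigned and -1 when read as signed.
static_assert(!icmp_ult(0xFFFFFFFFu, 1u), "4294967295 is not below 1");
static_assert(icmp_slt(0xFFFFFFFFu, 1u), "-1 is below 1");
static_assert(icmp_ult(1u, 2u) && icmp_slt(1u, 2u), "small values agree");
```

Since gcd is declared with unsigned parameters in the C source, the unsigned predicate is the one that matches its semantics.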

Also note that we're again making use of LLVM's automatic name uniquing, this time at a register level. We've deliberately chosen to name every instruction "tmp" to illustrate that LLVM will give them all unique names without getting confused.

      builder.SetInsertPoint(cond_false);
      Value* xLessThanY = builder.CreateICmpULT(x, y, "tmp");
      builder.CreateCondBr(xLessThanY, cond_true, cond_false_2);

Our last two blocks are quite similar; they're both recursive calls to gcd with different parameters. To create a call instruction, we have to create a vector (or any other container with InputIterators) to hold the arguments. We then pass in the beginning and ending iterators for this vector.

      builder.SetInsertPoint(cond_true);
      Value* yMinusX = builder.CreateSub(y, x, "tmp");
      std::vector<Value*> args1;
      args1.push_back(x);
      args1.push_back(yMinusX);
      Value* recur_1 = builder.CreateCall(gcd, args1.begin(), args1.end(), "tmp");
      builder.CreateRet(recur_1);

      builder.SetInsertPoint(cond_false_2);
      Value* xMinusY = builder.CreateSub(x, y, "tmp");
      std::vector<Value*> args2;
      args2.push_back(xMinusY);
      args2.push_back(y);
      Value* recur_2 = builder.CreateCall(gcd, args2.begin(), args2.end(), "tmp");
      builder.CreateRet(recur_2);

      return mod;
    }

And that's it! You can compile and execute your code in the same way as before, by doing:

    # c++ -g tut2.cpp `llvm-config --cxxflags --ldflags --libs core` -o tut2
    # ./tut2

Owen Anderson
The LLVM Compiler Infrastructure
Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
 LLVM Tutorial: Table of Contents

-  • An Introduction to LLVM: Basic Concepts and Design
-  • Simple JIT Tutorials
-      • A First Function
-      • A More Complicated Function
-      • Running Optimizations
-      • Reading and Writing Bitcode
-      • Invoking the JIT
   • Kaleidoscope: Implementing a Language with LLVM
       • Tutorial Introduction and the Lexer