llvm.org GIT mirror llvm / 23cf05a
Add Chapter 8 to the Kaleidoscope tutorial. This chapter adds a description of how to add debug information using DWARF and DIBuilder to the language. Thanks to David Blaikie for his assistance with this tutorial. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223671 91177308-0d34-0410-b5e6-96231b3b80d8 Eric Christopher 4 years ago
6 changed file(s) with 2215 addition(s) and 264 deletion(s).
0 ======================================================
1 Kaleidoscope: Conclusion and other useful LLVM tidbits
2 ======================================================
0 =======================================================
1 Kaleidoscope: Extending the Language: Debug Information
2 =======================================================
3
4 .. contents::
5 :local:
6
7 Tutorial Conclusion
8 ===================
9
10 Welcome to the final chapter of the "`Implementing a language with
11 LLVM `_" tutorial. In the course of this tutorial, we have
12 grown our little Kaleidoscope language from being a useless toy, to
13 being a semi-interesting (but probably still useless) toy. :)
14
15 It is interesting to see how far we've come, and how little code it has
16 taken. We built the entire lexer, parser, AST, code generator, and an
17 interactive run-loop (with a JIT!) by-hand in under 700 lines of
18 (non-comment/non-blank) code.
19
20 Our little language supports a couple of interesting features: it
21 supports user defined binary and unary operators, it uses JIT
22 compilation for immediate evaluation, and it supports a few control flow
23 constructs with SSA construction.
24
25 Part of the idea of this tutorial was to show you how easy and fun it
26 can be to define, build, and play with languages. Building a compiler
27 need not be a scary or mystical process! Now that you've seen some of
28 the basics, I strongly encourage you to take the code and hack on it.
29 For example, try adding:
30
31 - **global variables** - While global variables have questionable value
32 in modern software engineering, they are often useful when putting
33 together quick little hacks like the Kaleidoscope compiler itself.
34 Fortunately, our current setup makes it very easy to add global
35 variables: just have value lookup check to see if an unresolved
36 variable is in the global variable symbol table before rejecting it.
37 To create a new global variable, make an instance of the LLVM
38 ``GlobalVariable`` class.
39 - **typed variables** - Kaleidoscope currently only supports variables
40 of type double. This gives the language a very nice elegance, because
41 only supporting one type means that you never have to specify types.
42 Different languages have different ways of handling this. The easiest
43 way is to require the user to specify types for every variable
44 definition, and record the type of the variable in the symbol table
45 along with its Value\*.
46 - **arrays, structs, vectors, etc** - Once you add types, you can start
47 extending the type system in all sorts of interesting ways. Simple
48 arrays are very easy and are quite useful for many different
49 applications. Adding them is mostly an exercise in learning how the
50 LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
51 works: it is so nifty/unconventional, it `has its own
52 FAQ <../GetElementPtr.html>`_! If you add support for recursive types
53 (e.g. linked lists), make sure to read the `section in the LLVM
54 Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
55 describes how to construct them.
56 - **standard runtime** - Our current language allows the user to access
57 arbitrary external functions, and we use it for things like "printd"
58 and "putchard". As you extend the language to add higher-level
59 constructs, often these constructs make the most sense if they are
60 lowered to calls into a language-supplied runtime. For example, if
61 you add hash tables to the language, it would probably make sense to
62 add the routines to a runtime, instead of inlining them all the way.
63 - **memory management** - Currently we can only access the stack in
64 Kaleidoscope. It would also be useful to be able to allocate heap
65 memory, either with calls to the standard libc malloc/free interface
66 or with a garbage collector. If you would like to use garbage
67 collection, note that LLVM fully supports `Accurate Garbage
68 Collection <../GarbageCollection.html>`_ including algorithms that
69 move objects and need to scan/update the stack.
70 - **debugger support** - LLVM supports generation of `DWARF Debug
71 info <../SourceLevelDebugging.html>`_ which is understood by common
72 debuggers like GDB. Adding support for debug info is fairly
73 straightforward. The best way to understand it is to compile some
74 C/C++ code with "``clang -g -O0``" and take a look at what it
75 produces.
76 - **exception handling support** - LLVM supports generation of `zero
77 cost exceptions <../ExceptionHandling.html>`_ which interoperate with
78 code compiled in other languages. You could also generate code by
79 implicitly making every function return an error value and checking
80 it. You could also make explicit use of setjmp/longjmp. There are
81 many different ways to go here.
82 - **object orientation, generics, database access, complex numbers,
83 geometric programming, ...** - Really, there is no end of crazy
84 features that you can add to the language.
85 - **unusual domains** - We've been talking about applying LLVM to a
86 domain that many people are interested in: building a compiler for a
87 specific language. However, there are many other domains that can use
88 compiler technology that are not typically considered. For example,
89 LLVM has been used to implement OpenGL graphics acceleration,
90 translate C++ code to ActionScript, and many other cute and clever
91 things. Maybe you will be the first to JIT compile a regular
92 expression interpreter into native code with LLVM?
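
As one illustration of the first suggestion in the list above (global variables), here is a minimal sketch of a lookup that falls back to a global table before giving up. It is plain C++ rather than LLVM API code: ``GlobalValues`` and ``lookupVariable`` are hypothetical names, and both tables hold plain doubles where the tutorial's code generator would hold LLVM values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-ins for the tutorial's symbol tables. The real code generator maps
// names to LLVM values; plain doubles keep this sketch self-contained.
static std::map<std::string, double> NamedValues;  // variables in scope
static std::map<std::string, double> GlobalValues; // hypothetical globals

// Looks Name up among the local variables first and falls back to the
// global table. Returns false only when neither table knows the name.
static bool lookupVariable(const std::string &Name, double &Result) {
  auto Local = NamedValues.find(Name);
  if (Local != NamedValues.end()) {
    Result = Local->second;
    return true;
  }
  auto Global = GlobalValues.find(Name);
  if (Global != GlobalValues.end()) {
    Result = Global->second;
    return true;
  }
  return false; // genuinely unknown: this is where the error is reported
}
```

Checking the local table first means a local variable shadows a global of the same name, which is one reasonable choice among several.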
93
94 Have fun - try doing something crazy and unusual. Building a language
95 the way everyone else always has is much less fun than trying something a
96 little crazy or off the wall and seeing how it turns out. If you get
97 stuck or want to talk about it, feel free to email the `llvmdev mailing
98 list `_: it has lots
99 of people who are interested in languages and are often willing to help
100 out.
101
102 Before we end this tutorial, I want to talk about some "tips and tricks"
103 for generating LLVM IR. These are some of the more subtle things that
104 may not be obvious, but are very useful if you want to take advantage of
105 LLVM's capabilities.
106
107 Properties of the LLVM IR
108 =========================
109
110 We have a couple of common questions about code in the LLVM IR form - let's
111 just get these out of the way right now, shall we?
112
113 Target Independence
114 -------------------
115
116 Kaleidoscope is an example of a "portable language": any program written
117 in Kaleidoscope will work the same way on any target that it runs on.
118 Many other languages have this property, e.g. Lisp, Java, Haskell,
119 JavaScript, Python, etc. (note that while these languages are portable,
120 not all their libraries are).
121
122 One nice aspect of LLVM is that it is often capable of preserving target
123 independence in the IR: you can take the LLVM IR for a
124 Kaleidoscope-compiled program and run it on any target that LLVM
125 supports, even emitting C code and compiling that on targets that LLVM
126 doesn't support natively. You can trivially tell that the Kaleidoscope
127 compiler generates target-independent code because it never queries for
128 any target-specific information when generating code.
129
130 The fact that LLVM provides a compact, target-independent,
131 representation for code gets a lot of people excited. Unfortunately,
132 these people are usually thinking about C or a language from the C
133 family when they are asking questions about language portability. I say
134 "unfortunately", because there is really no way to make (fully general)
135 C code portable, other than shipping the source code around (and of
136 course, C source code is not actually portable in general either - ever
137 port a really old application from 32- to 64-bits?).
138
139 The problem with C (again, in its full generality) is that it is heavily
140 laden with target specific assumptions. As one simple example, the
141 preprocessor often destructively removes target-independence from the
142 code when it processes the input text:
143
144 .. code-block:: c
145
146 #ifdef __i386__
147 int X = 1;
148 #else
149 int X = 42;
150 #endif
151
152 While it is possible to engineer more and more complex solutions to
153 problems like this, it cannot be solved in full generality in a way that
154 is better than shipping the actual source code.
155
156 That said, there are interesting subsets of C that can be made portable.
157 If you are willing to fix primitive types to a fixed size (say int =
158 32-bits, and long = 64-bits), don't care about ABI compatibility with
159 existing binaries, and are willing to give up some other minor features,
160 you can have portable code. This can make sense for specialized domains
161 such as an in-kernel language.
162
163 Safety Guarantees
164 -----------------
165
166 Many of the languages above are also "safe" languages: it is impossible
167 for a program written in Java to corrupt its address space and crash the
168 process (assuming the JVM has no bugs). Safety is an interesting
169 property that requires a combination of language design, runtime
170 support, and often operating system support.
171
172 It is certainly possible to implement a safe language in LLVM, but LLVM
173 IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
174 casts, use after free bugs, buffer over-runs, and a variety of other
175 problems. Safety needs to be implemented as a layer on top of LLVM and,
176 conveniently, several groups have investigated this. Ask on the `llvmdev
177 mailing list `_ if
178 you are interested in more details.
179
180 Language-Specific Optimizations
181 -------------------------------
182
183 One thing about LLVM that turns off many people is that it does not
184 solve all the world's problems in one system (sorry 'world hunger',
185 someone else will have to solve you some other day). One specific
186 complaint is that people perceive LLVM as being incapable of performing
187 high-level language-specific optimization: LLVM "loses too much
188 information".
189
190 Unfortunately, this is really not the place to give you a full and
191 unified version of "Chris Lattner's theory of compiler design". Instead,
192 I'll make a few observations:
193
194 First, you're right that LLVM does lose information. For example, as of
195 this writing, there is no way to distinguish in the LLVM IR whether an
196 SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
197 than debug info). Both get compiled down to an 'i32' value and the
198 information about what it came from is lost. The more general issue
199 here is that the LLVM type system uses "structural equivalence" instead
200 of "name equivalence". Another place this surprises people is if you
201 have two types in a high-level language that have the same structure
202 (e.g. two different structs that have a single int field): these types
203 will compile down into a single LLVM type and it will be impossible to
204 tell what it came from.
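
To make the structural-equivalence point concrete, here is a small sketch of a type table that identifies struct types by their field layout alone. It models the behaviour described in the paragraph above and is not taken from the LLVM implementation: ``TypeTable``, ``getStructType`` and the string field names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A struct type is identified only by its ordered list of field types.
using FieldList = std::vector<std::string>;

// Hypothetical miniature of a structurally uniqued type table.
class TypeTable {
  std::map<FieldList, int> Ids; // field layout -> unique type id

public:
  // Returns the same id for any two structs with identical fields,
  // regardless of what the front end called them.
  int getStructType(const std::string &SourceName, const FieldList &Fields) {
    (void)SourceName; // the source-level name plays no part in identity
    auto It = Ids.find(Fields);
    if (It != Ids.end())
      return It->second;
    int Id = static_cast<int>(Ids.size());
    Ids.emplace(Fields, Id);
    return Id;
  }
};
```

Two source-level structs with different names but a single ``i32`` field each get the same id here, so nothing downstream of the table can tell them apart.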
205
206 Second, while LLVM does lose information, LLVM is not a fixed target: we
207 continue to enhance and improve it in many different ways. In addition
208 to adding new features (LLVM did not always support exceptions or debug
209 info), we also extend the IR to capture important information for
210 optimization (e.g. whether an argument is sign or zero extended,
211 information about pointers aliasing, etc). Many of the enhancements are
212 user-driven: people want LLVM to include some specific feature, so they
213 go ahead and extend it.
214
215 Third, it is *possible and easy* to add language-specific optimizations,
216 and you have a number of choices in how to do it. As one trivial
217 example, it is easy to add language-specific optimization passes that
218 "know" things about code compiled for a language. In the case of the C
219 family, there is an optimization pass that "knows" about the standard C
220 library functions. If you call "exit(0)" in main(), it knows that it is
221 safe to optimize that into "return 0;" because C specifies what the
222 'exit' function does.
223
224 In addition to simple library knowledge, it is possible to embed a
225 variety of other language-specific information into the LLVM IR. If you
226 have a specific need and run into a wall, please bring the topic up on
227 the llvmdev list. At the very worst, you can always treat LLVM as if it
228 were a "dumb code generator" and implement the high-level optimizations
229 you desire in your front-end, on the language-specific AST.
230
231 Tips and Tricks
232 ===============
233
234 There is a variety of useful tips and tricks that you come to know after
235 working on/with LLVM that aren't obvious at first glance. Instead of
236 letting everyone rediscover them, this section talks about some of these
237 issues.
238
239 Implementing portable offsetof/sizeof
240 -------------------------------------
241
242 One interesting thing that comes up, if you are trying to keep the code
243 generated by your compiler "target independent", is that you often need
244 to know the size of some LLVM type or the offset of some field in an
245 llvm structure. For example, you might need to pass the size of a type
246 into a function that allocates memory.
247
248 Unfortunately, this can vary widely across targets: for example the
249 width of a pointer is trivially target-specific. However, there is a
250 `clever way to use the getelementptr
251 instruction `_
252 that allows you to compute this in a portable way.
253
254 Garbage Collected Stack Frames
255 ------------------------------
256
257 Some languages want to explicitly manage their stack frames, often so
258 that they are garbage collected or to allow easy implementation of
259 closures. There are often better ways to implement these features than
260 explicit stack frames, but `LLVM does support
261 them, `_
262 if you want. It requires your front-end to convert the code into
263 `Continuation Passing
264 Style `_ and
265 the use of tail calls (which LLVM also supports).
266
7 Chapter 8 Introduction
8 ======================
9
10 Welcome to Chapter 8 of the "`Implementing a language with
11 LLVM `_" tutorial. In chapters 1 through 7, we've built a
12 decent little programming language with functions and variables.
13 But what happens if something goes wrong? How do you debug your
14 program?
15
16 Source level debugging uses formatted data that helps a debugger
17 translate from binary and the state of the machine back to the
18 source that the programmer wrote. In LLVM we generally use a format
19 called `DWARF `_. DWARF is a compact encoding
20 that represents types, source locations, and variable locations.
21
22 The short summary of this chapter is that we'll go through the
23 various things you have to add to a programming language to
24 support debug info, and how you translate that into DWARF.
25
26 Caveat: For now we can't debug via the JIT, so we'll need to compile
27 our program down to something small and standalone. As part of this
28 we'll make a few modifications to the running of the language and
29 how programs are compiled. This means that we'll have a source file
30 with a simple program written in Kaleidoscope rather than the
31 interactive JIT. To reduce the number of changes necessary, we also
32 accept a limitation: there can be only one "top level" command at a
33 time.
34
35 Here's the sample program we'll be compiling:
36
37 .. code-block:: python
38
39 def fib(x)
40 if x < 3 then
41 1
42 else
43 fib(x-1)+fib(x-2);
44
45 fib(10)
46
47
48 Why is this a hard problem?
49 ===========================
50
51 Debug information is a hard problem for a few different reasons - mostly
52 centered around optimized code. First, optimization makes keeping source
53 locations more difficult. In LLVM IR we keep the original source location
54 for each IR level instruction on the instruction. Optimization passes
55 should keep the source locations for newly created instructions, but merged
56 instructions only get to keep a single location - this can cause jumping
57 around when stepping through optimized programs. Second, optimization
58 can move variables around: they may be optimized out, share memory
59 with other variables, or become difficult to track. For the purposes of this
60 tutorial we're going to avoid optimization (as you'll see with one of the
61 next sets of patches).
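
The point about merged instructions can be shown with a toy model. The sketch below is plain C++, not an LLVM pass: ``Instr`` and ``mergeDuplicates`` are hypothetical, and keeping the first occurrence is just one possible policy, chosen here for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction: its text and the source line it came from.
struct Instr {
  std::string Text;
  int Line;
};

// A toy "common subexpression elimination": keeps the first occurrence of
// each distinct instruction text and drops later duplicates. The surviving
// instruction can only carry one line, so the other lines are lost.
std::vector<Instr> mergeDuplicates(const std::vector<Instr> &In) {
  std::vector<Instr> Out;
  for (const Instr &I : In) {
    bool Seen = false;
    for (const Instr &O : Out)
      if (O.Text == I.Text)
        Seen = true;
    if (!Seen)
      Out.push_back(I);
  }
  return Out;
}
```

If the same computation appeared on lines 3 and 7, only the line 3 copy survives, so a debugger stepping through line 7 has nothing to stop on for that computation and appears to jump.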
62
63 Ahead-of-Time Compilation Mode
64 ==============================
65
66 To highlight only the aspects of adding debug information to a source
67 language without needing to worry about the complexities of JIT debugging
68 we're going to make a few changes to Kaleidoscope to support compiling
69 the IR emitted by the front end into a simple standalone program that
70 you can execute, debug, and see results.
71
72 First we make our anonymous function that contains our top level
73 statement be our "main":
74
75 .. code-block:: udiff
76
77 - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
78 + PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>());
79
80 This is just the simple change of giving it a name.
81
82 Then we're going to remove the command line code wherever it exists:
83
84 .. code-block:: udiff
85
86 @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
87 /// top ::= definition | external | expression | ';'
88 static void MainLoop() {
89 while (1) {
90 - fprintf(stderr, "ready> ");
91 switch (CurTok) {
92 case tok_eof:
93 return;
94 @@ -1184,7 +1183,6 @@ int main() {
95 BinopPrecedence['*'] = 40; // highest.
96
97 // Prime the first token.
98 - fprintf(stderr, "ready> ");
99 getNextToken();
100
101 Lastly we're going to disable all of the optimization passes and the JIT so
102 that the only thing that happens after we're done parsing and generating
103 code is that the LLVM IR goes to standard error:
104
105 .. code-block:: udiff
106
107 @@ -1108,17 +1108,8 @@ static void HandleExtern() {
108 static void HandleTopLevelExpression() {
109 // Evaluate a top-level expression into an anonymous function.
110 if (FunctionAST *F = ParseTopLevelExpr()) {
111 - if (Function *LF = F->Codegen()) {
112 - // We're just doing this to make sure it executes.
113 - TheExecutionEngine->finalizeObject();
114 - // JIT the function, returning a function pointer.
115 - void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
116 -
117 - // Cast it to the right type (takes no arguments, returns a double) so we
118 - // can call it as a native function.
119 - double (*FP)() = (double (*)())(intptr_t)FPtr;
120 - // Ignore the return value for this.
121 - (void)FP;
122 + if (!F->Codegen()) {
123 + fprintf(stderr, "Error generating code for top level expr");
124 }
125 } else {
126 // Skip token for error recovery.
127 @@ -1439,11 +1459,11 @@ int main() {
128 // target lays out data structures.
129 TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
130 OurFPM.add(new DataLayoutPass());
131 +#if 0
132 OurFPM.add(createBasicAliasAnalysisPass());
133 // Promote allocas to registers.
134 OurFPM.add(createPromoteMemoryToRegisterPass());
135 @@ -1218,7 +1210,7 @@ int main() {
136 OurFPM.add(createGVNPass());
137 // Simplify the control flow graph (deleting unreachable blocks, etc).
138 OurFPM.add(createCFGSimplificationPass());
139 -
140 + #endif
141 OurFPM.doInitialization();
142
143 // Set the global so the code gen can use this.
144
145 This relatively small set of changes gets us to the point that we can compile
146 our piece of Kaleidoscope language down to an executable program via this
147 command line:
148
149 .. code-block:: bash
150
151 Kaleidoscope-Ch8 < fib.ks |& clang -x ir -
152
153 which gives an a.out/a.exe in the current working directory.
154
155 Compile Unit
156 ============
157
158 The top level container for a section of code in DWARF is a compile unit.
159 This contains the type and function data for an individual translation unit
160 (read: one file of source code). So the first thing we need to do is
161 construct one for our fib.ks file.
162
163 DWARF Emission Setup
164 ====================
165
166 Similar to the ``IRBuilder`` class we have a
167 ``DIBuilder`` class
168 that helps in constructing debug metadata for an LLVM IR file. It
169 corresponds 1:1 to debug metadata, much as ``IRBuilder`` corresponds to LLVM IR, but with nicer names.
170 Using it does require that you be more familiar with DWARF terminology than
171 you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
172 read through the general documentation on the
173 ``Metadata Format`` it
174 should be a little clearer. We'll be using this class to construct all
175 of our IR level descriptions. Its constructor takes a module, so we
176 need to construct it shortly after we construct our module. We've left it
177 as a global static variable to make it a bit easier to use.
178
179 Next we're going to create a small container to cache some of our frequently
180 used data. The first will be our compile unit, but we'll also write a bit of
181 code for our one type; since Kaleidoscope only has doubles, we won't have to
182 worry about multiple typed expressions:
183
184 .. code-block:: c++
185
186 static DIBuilder *DBuilder;
187
188 struct DebugInfo {
189 DICompileUnit TheCU;
190 DIType DblTy;
191
192 DIType getDoubleTy();
193 } KSDbgInfo;
194
195 DIType DebugInfo::getDoubleTy() {
196 if (DblTy.isValid())
197 return DblTy;
198
199 DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
200 return DblTy;
201 }
202
203 And then later on in ``main`` when we're constructing our module:
204
205 .. code-block:: c++
206
207 DBuilder = new DIBuilder(*TheModule);
208
209 KSDbgInfo.TheCU = DBuilder->createCompileUnit(
210 dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
211
212 There are a couple of things to note here. First, while we're producing a
213 compile unit for a language called Kaleidoscope we used the language
214 constant for C. This is because a debugger wouldn't necessarily understand
215 the calling conventions or default ABI for a language it doesn't recognize
216 and we follow the C ABI in our LLVM code generation so it's the closest
217 thing to accurate. This ensures we can actually call functions from the
218 debugger and have them execute. Secondly, you'll see the "fib.ks" in the
219 call to ``createCompileUnit``. This is a default hard coded value since
220 we're using shell redirection to put our source into the Kaleidoscope
221 compiler. In a usual front end you'd have an input file name and it would
222 go there.
223
224 One last thing as part of emitting debug information via DIBuilder is that
225 we need to "finalize" the debug information. The reasons are part of the
226 underlying API for DIBuilder, but make sure you do this near the end of
227 main:
228
229 .. code-block:: c++
230
231 DBuilder->finalize();
232
233 before you dump out the module.
234
235 Functions
236 =========
237
238 Now that we have our ``Compile Unit`` and our source locations, we can add
239 function definitions to the debug info. So in ``PrototypeAST::Codegen`` we
240 add a few lines of code to describe a context for our subprogram, in this
241 case the "File", and the actual definition of the function itself.
242
243 So the context:
244
245 .. code-block:: c++
246
247 DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
248 KSDbgInfo.TheCU.getDirectory());
249
250 giving us a DIFile and asking the ``Compile Unit`` we created above for the
251 directory and filename where we are currently. Then, for now, we use some
252 source locations of 0 (since our AST doesn't currently have source location
253 information) and construct our function definition:
254
255 .. code-block:: c++
256
257 DIDescriptor FContext(Unit);
258 unsigned LineNo = 0;
259 unsigned ScopeLine = 0;
260 DISubprogram SP = DBuilder->createFunction(
261 FContext, Name, StringRef(), Unit, LineNo,
262 CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
263 true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F);
264
265 and we now have a DISubprogram that contains a reference to all of our metadata
266 for the function.
267
268 Source Locations
269 ================
270
271 The most important thing for debug information is accurate source location -
272 this makes it possible to map the generated code back to your source. We have a problem though:
273 Kaleidoscope really doesn't have any source location information in the lexer
274 or parser, so we'll need to add it.
275
276 .. code-block:: c++
277
278 struct SourceLocation {
279 int Line;
280 int Col;
281 };
282 static SourceLocation CurLoc;
283 static SourceLocation LexLoc = {1, 0};
284
285 static int advance() {
286 int LastChar = getchar();
287
288 if (LastChar == '\n' || LastChar == '\r') {
289 LexLoc.Line++;
290 LexLoc.Col = 0;
291 } else
292 LexLoc.Col++;
293 return LastChar;
294 }
295
296 In this set of code we've added some functionality to keep track of the
297 line and column of the "source file". As we lex every token we set our
298 current "lexical location" to the associated line and column for the beginning
299 of the token. We do this by replacing all of the previous calls to
300 ``getchar()`` with our new ``advance()`` that keeps track of the information,
301 and then we have added to all of our AST classes a source location:
302
303 .. code-block:: c++
304
305 class ExprAST {
306 SourceLocation Loc;
307
308 public:
309 int getLine() const { return Loc.Line; }
310 int getCol() const { return Loc.Col; }
311 ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
312 virtual std::ostream &dump(std::ostream &out, int ind) {
313 return out << ':' << getLine() << ':' << getCol() << '\n';
314 }
315
316 that we pass down through when we create a new expression:
317
318 .. code-block:: c++
319
320 LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);
321
322 giving us locations for each of our expressions and variables.
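
The line and column bookkeeping can be tried out in isolation. The sketch below applies the same rule as ``advance()`` to an in-memory string instead of standard input, so it runs without the rest of the lexer. ``LocTracker`` is a hypothetical helper and is not part of the tutorial code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct SourceLocation {
  int Line;
  int Col;
};

// Hypothetical helper: walks an in-memory buffer the way advance() walks
// standard input, updating the location after every character.
class LocTracker {
  std::string Text;
  std::size_t Pos = 0;
  SourceLocation Loc = {1, 0};

public:
  explicit LocTracker(const std::string &T) : Text(T) {}

  // Returns the next character, or -1 at end of input.
  int advance() {
    if (Pos >= Text.size())
      return -1;
    int C = static_cast<unsigned char>(Text[Pos++]);
    if (C == '\n' || C == '\r') {
      Loc.Line++; // a line break starts a new line at column 0
      Loc.Col = 0;
    } else {
      Loc.Col++;
    }
    return C;
  }

  SourceLocation loc() const { return Loc; }
};
```

Because both ``'\n'`` and ``'\r'`` bump the line counter, as in the original ``advance()``, a file with ``\r\n`` line endings would count each break twice. A front end that needs to read such files would have to treat that pair as one break.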
323
324 From this we can make sure to tell ``DIBuilder`` when we're at a new source
325 location so it can use that when we generate the rest of our code and make
326 sure that each instruction has source location information. We do this
327 by constructing another small function:
328
329 .. code-block:: c++
330
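331 void DebugInfo::emitLocation(ExprAST *AST) {
332 // A null AST clears the current location (used for the function prologue).
333 if (!AST)
334 return Builder.SetCurrentDebugLocation(DebugLoc());
335 // Otherwise use the innermost open scope, or the compile unit if none.
336 DIScope *Scope = LexicalBlocks.empty() ? &TheCU : LexicalBlocks.back();
337 Builder.SetCurrentDebugLocation(
338 DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope)));
339 }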
340
341 that tells the main ``IRBuilder`` both where we are and what scope
342 we're in. Since we've just created a function above we can either be in
343 the main file scope (like when we created our function), or now we can be
344 in the function scope we just created. To represent this we create a stack
345 of scopes:
346
347 .. code-block:: c++
348
349 std::vector<DIScope *> LexicalBlocks;
350 std::map<const PrototypeAST *, DIScope> FnScopeMap;
351
352 and keep a map of each function to the scope that it represents (a DISubprogram
353 is also a DIScope).
354
355 Then we make sure to:
356
357 .. code-block:: c++
358
359 KSDbgInfo.emitLocation(this);
360
361 emit the location every time we start to generate code for a new AST, and
362 also:
363
364 .. code-block:: c++
365
366 KSDbgInfo.FnScopeMap[this] = SP;
367
368 store the scope (function) when we create it and use it:
369
370 KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]);
371
372 when we start generating the code for each function.
373
374 One interesting thing to note at this point is that various debuggers have
375 assumptions based on how code and debug information was generated for them
376 in the past. In this case we need to do a little bit of a hack to avoid
377 generating line information for the function prologue so that the debugger
378 knows to skip over those instructions when setting a breakpoint. So in
379 ``FunctionAST::Codegen`` we add a couple of lines:
380
381 .. code-block:: c++
382
383 // Unset the location for the prologue emission (leading instructions with no
384 // location in a function are considered part of the prologue and the debugger
385 // will run past them when breaking on a function)
386 KSDbgInfo.emitLocation(nullptr);
387
388 and then emit a new location when we actually start generating code for the
389 body of the function:
390
391 .. code-block:: c++
392
393 KSDbgInfo.emitLocation(Body);
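
The effect of leaving the prologue without a location can be modelled in a few lines. This is a simplified sketch of the convention described in the code comment above, not a description of how any particular debugger is implemented. ``Instr`` and ``breakpointAddress`` are hypothetical, and a line of 0 stands for "no location attached".

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one instruction in a function.
// Line == 0 means "no source location attached".
struct Instr {
  unsigned Address;
  unsigned Line;
};

// Returns the address a tool following this convention would break at:
// the first instruction that carries a line, i.e. just past the prologue.
// Falls back to the function's first address if nothing has a location.
unsigned breakpointAddress(const std::vector<Instr> &Fn) {
  for (const Instr &I : Fn)
    if (I.Line != 0)
      return I.Address;
  return Fn.empty() ? 0 : Fn.front().Address;
}
```

With two location-free instructions followed by one on line 3, the breakpoint lands on the third instruction, which is the behaviour we want when breaking on a function.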
394
395 Also, don't forget to pop the scope back off of your scope stack at the
396 end of the code generation for the function:
397
398 .. code-block:: c++
399
400 // Pop off the lexical block for the function since we added it
401 // unconditionally.
402 KSDbgInfo.LexicalBlocks.pop_back();
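
Since every push onto the scope stack needs a matching pop, one option is to tie the pair to a C++ object's lifetime. The sketch below is an alternative to the explicit ``pop_back()`` call, not what the tutorial code does. ``ScopeStack`` and ``ScopeGuard`` are hypothetical names, and scopes are plain strings standing in for ``DIScope`` objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the tutorial's DebugInfo; scopes are plain strings here
// instead of DIScope objects.
struct ScopeStack {
  std::string CompileUnit = "fib.ks";
  std::vector<std::string> LexicalBlocks;

  // Mirrors the choice made in emitLocation(): the innermost open scope,
  // or the compile unit when no function is being generated.
  const std::string &current() const {
    return LexicalBlocks.empty() ? CompileUnit : LexicalBlocks.back();
  }
};

// Hypothetical RAII helper: pushes on construction and pops on destruction,
// so an early return from code generation cannot leave a stale scope behind.
class ScopeGuard {
  ScopeStack &Stack;

public:
  ScopeGuard(ScopeStack &S, const std::string &Scope) : Stack(S) {
    Stack.LexicalBlocks.push_back(Scope);
  }
  ~ScopeGuard() { Stack.LexicalBlocks.pop_back(); }
  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
};
```

A guard declared at the top of the function's code generation would cover every exit path, including the error paths that return early.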
403
404
405 Full Code Listing
406 =================
407
408 Here is the complete code listing for our running example, enhanced with
409 debug information. To build this example, use:
410
411 .. code-block:: bash
412
413 # Compile
414 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy
415 # Run
416 ./toy
417
418 Here is the code:
419
420 .. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
421 :language: c++
422
423 `Next: Conclusion and other useful LLVM tidbits `_
424
74 C/C++ code with "``clang -g -O0``" and take a look at what it
75 produces.
76 - **exception handling support** - LLVM supports generation of `zero
77 cost exceptions <../ExceptionHandling.html>`_ which interoperate with
78 code compiled in other languages. You could also generate code by
79 implicitly making every function return an error value and checking
80 it. You could also make explicit use of setjmp/longjmp. There are
81 many different ways to go here.
82 - **object orientation, generics, database access, complex numbers,
83 geometric programming, ...** - Really, there is no end of crazy
84 features that you can add to the language.
85 - **unusual domains** - We've been talking about applying LLVM to a
86 domain that many people are interested in: building a compiler for a
87 specific language. However, there are many other domains that can use
88 compiler technology that are not typically considered. For example,
89 LLVM has been used to implement OpenGL graphics acceleration,
90 translate C++ code to ActionScript, and many other cute and clever
91 things. Maybe you will be the first to JIT compile a regular
92 expression interpreter into native code with LLVM?
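
As a rough illustration of the **global variables** suggestion above, here is
a minimal sketch of a lookup that consults a global table before rejecting a
name. The names ``SymbolTable`` and ``lookupVariable`` are hypothetical, and
plain ``double`` values stand in for the ``Value*`` handles the real compiler
would store, so this is not the tutorial's actual code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the compiler's symbol tables. The real tables
// would map names to LLVM values; doubles keep the sketch self-contained.
typedef std::map<std::string, double> SymbolTable;

// Check the local scope first, then fall back to the globals. A null result
// means both tables missed, which is where the compiler would report an
// unknown variable name.
const double *lookupVariable(const SymbolTable &Locals,
                             const SymbolTable &Globals,
                             const std::string &Name) {
  SymbolTable::const_iterator It = Locals.find(Name);
  if (It != Locals.end())
    return &It->second;
  It = Globals.find(Name);
  if (It != Globals.end())
    return &It->second;
  return nullptr;
}
```

Checking locals first means a local definition shadows a global of the same
name, which is the usual convention but still a design choice you are free to
change.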
93
94 Have fun - try doing something crazy and unusual. Building a language
95 like everyone else always has is much less fun than trying something a
96 little crazy or off the wall and seeing how it turns out. If you get
97 stuck or want to talk about it, feel free to email the `llvmdev mailing
98 list `_: it has lots
99 of people who are interested in languages and are often willing to help
100 out.
101
102 Before we end this tutorial, I want to talk about some "tips and tricks"
103 for generating LLVM IR. These are some of the more subtle things that
104 may not be obvious, but are very useful if you want to take advantage of
105 LLVM's capabilities.
106
107 Properties of the LLVM IR
108 =========================
109
110 We have a couple of common questions about code in the LLVM IR form - let's
111 just get these out of the way right now, shall we?
112
113 Target Independence
114 -------------------
115
116 Kaleidoscope is an example of a "portable language": any program written
117 in Kaleidoscope will work the same way on any target that it runs on.
118 Many other languages have this property, e.g. lisp, java, haskell,
119 javascript, python, etc (note that while these languages are portable,
120 not all their libraries are).
121
122 One nice aspect of LLVM is that it is often capable of preserving target
123 independence in the IR: you can take the LLVM IR for a
124 Kaleidoscope-compiled program and run it on any target that LLVM
125 supports, even emitting C code and compiling that on targets that LLVM
126 doesn't support natively. You can trivially tell that the Kaleidoscope
127 compiler generates target-independent code because it never queries for
128 any target-specific information when generating code.
129
130 The fact that LLVM provides a compact, target-independent,
131 representation for code gets a lot of people excited. Unfortunately,
132 these people are usually thinking about C or a language from the C
133 family when they are asking questions about language portability. I say
134 "unfortunately", because there is really no way to make (fully general)
135 C code portable, other than shipping the source code around (and of
136 course, C source code is not actually portable in general either - ever
137 port a really old application from 32- to 64-bits?).
138
139 The problem with C (again, in its full generality) is that it is heavily
140 laden with target-specific assumptions. As one simple example, the
141 preprocessor often destructively removes target-independence from the
142 code when it processes the input text:
143
144 .. code-block:: c
145
146 #ifdef __i386__
147 int X = 1;
148 #else
149 int X = 42;
150 #endif
151
152 While it is possible to engineer more and more complex solutions to
153 problems like this, it cannot be solved in full generality in a way that
154 is better than shipping the actual source code.
155
156 That said, there are interesting subsets of C that can be made portable.
157 If you are willing to fix primitive types to a fixed size (say int =
158 32-bits, and long = 64-bits), don't care about ABI compatibility with
159 existing binaries, and are willing to give up some other minor features,
160 you can have portable code. This can make sense for specialized domains
161 such as an in-kernel language.
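
To make the "fixed size" idea concrete, here is a small sketch of what pinning
primitive widths could look like. The aliases ``PortableInt`` and
``PortableLong`` and the helper ``widenMul`` are hypothetical names chosen for
this example only:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical aliases for a portable C-like subset: each primitive has the
// same width on every target, whatever the host's 'int' and 'long' are.
typedef std::int32_t PortableInt;
typedef std::int64_t PortableLong;

static_assert(sizeof(PortableInt) * CHAR_BIT == 32, "int must be 32 bits");
static_assert(sizeof(PortableLong) * CHAR_BIT == 64, "long must be 64 bits");

// A widening multiply whose result does not depend on the host's 'long'.
PortableLong widenMul(PortableInt A, PortableInt B) {
  return static_cast<PortableLong>(A) * static_cast<PortableLong>(B);
}
```

The ``static_assert`` lines turn the width assumption into a compile-time
check, so a target that cannot honor it fails to build instead of silently
computing something different.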
162
163 Safety Guarantees
164 -----------------
165
166 Many of the languages above are also "safe" languages: it is impossible
167 for a program written in Java to corrupt its address space and crash the
168 process (assuming the JVM has no bugs). Safety is an interesting
169 property that requires a combination of language design, runtime
170 support, and often operating system support.
171
172 It is certainly possible to implement a safe language in LLVM, but LLVM
173 IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
174 casts, use-after-free bugs, buffer overruns, and a variety of other
175 problems. Safety needs to be implemented as a layer on top of LLVM and,
176 conveniently, several groups have investigated this. Ask on the `llvmdev
177 mailing list `_ if
178 you are interested in more details.
179
180 Language-Specific Optimizations
181 -------------------------------
182
183 One thing about LLVM that turns off many people is that it does not
184 solve all the world's problems in one system (sorry 'world hunger',
185 someone else will have to solve you some other day). One specific
186 complaint is that people perceive LLVM as being incapable of performing
187 high-level language-specific optimization: LLVM "loses too much
188 information".
189
190 Unfortunately, this is really not the place to give you a full and
191 unified version of "Chris Lattner's theory of compiler design". Instead,
192 I'll make a few observations:
193
194 First, you're right that LLVM does lose information. For example, as of
195 this writing, there is no way to distinguish in the LLVM IR whether an
196 SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
197 than debug info). Both get compiled down to an 'i32' value and the
198 information about what it came from is lost. The more general issue
199 here is that the LLVM type system uses "structural equivalence" instead
200 of "name equivalence". Another place this surprises people is if you
201 have two types in a high-level language that have the same structure
202 (e.g. two different structs that have a single int field): these types
203 will compile down into a single LLVM type and it will be impossible to
204 tell what it came from.
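
The difference between the two notions of equivalence can be seen from the
source-language side. The sketch below uses two hypothetical structs,
``Meters`` and ``Seconds``, that C++ keeps apart by name even though they have
the same structure; it illustrates the idea and makes no claim about what any
particular front-end emits:

```cpp
#include <cassert>
#include <type_traits>

// Two hypothetical source-level types with identical structure.
struct Meters { int Value; };
struct Seconds { int Value; };

// Name equivalence: the source language treats these as different types.
static_assert(!std::is_same<Meters, Seconds>::value, "distinct by name");

// Structural view: same size and alignment, one int field each, so nothing
// in the layout says which source type a value came from.
constexpr bool sameLayout() {
  return sizeof(Meters) == sizeof(Seconds) &&
         alignof(Meters) == alignof(Seconds);
}
```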
205
206 Second, while LLVM does lose information, LLVM is not a fixed target: we
207 continue to enhance and improve it in many different ways. In addition
208 to adding new features (LLVM did not always support exceptions or debug
209 info), we also extend the IR to capture important information for
210 optimization (e.g. whether an argument is sign or zero extended,
211 information about pointers aliasing, etc). Many of the enhancements are
212 user-driven: people want LLVM to include some specific feature, so they
213 go ahead and extend it.
214
215 Third, it is *possible and easy* to add language-specific optimizations,
216 and you have a number of choices in how to do it. As one trivial
217 example, it is easy to add language-specific optimization passes that
218 "know" things about code compiled for a language. In the case of the C
219 family, there is an optimization pass that "knows" about the standard C
220 library functions. If you call "exit(0)" in main(), it knows that it is
221 safe to optimize that into "return 0;" because C specifies what the
222 'exit' function does.
223
224 In addition to simple library knowledge, it is possible to embed a
225 variety of other language-specific information into the LLVM IR. If you
226 have a specific need and run into a wall, please bring the topic up on
227 the llvmdev list. At the very worst, you can always treat LLVM as if it
228 were a "dumb code generator" and implement the high-level optimizations
229 you desire in your front-end, on the language-specific AST.
230
231 Tips and Tricks
232 ===============
233
234 There is a variety of useful tips and tricks that you come to know after
235 working on/with LLVM that aren't obvious at first glance. Instead of
236 letting everyone rediscover them, this section talks about some of these
237 issues.
238
239 Implementing portable offsetof/sizeof
240 -------------------------------------
241
242 One interesting thing that comes up, if you are trying to keep the code
243 generated by your compiler "target independent", is that you often need
244 to know the size of some LLVM type or the offset of some field in an
245 LLVM structure. For example, you might need to pass the size of a type
246 into a function that allocates memory.
247
248 Unfortunately, this can vary widely across targets: for example the
249 width of a pointer is trivially target-specific. However, there is a
250 `clever way to use the getelementptr
251 instruction `_
252 that allows you to compute this in a portable way.
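
The intuition behind that trick can be shown in plain C++: index one element
past a base address and measure the distance in bytes, instead of hard-coding
a number. This is only an analogy under stated assumptions. The ``Record``
type and both helper functions are hypothetical, and the sketch measures real
objects rather than reproducing the IR-level ``getelementptr`` idiom itself:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type used only for illustration.
struct Record {
  char Tag;
  double Payload;
};

// Size: the distance in bytes from element 0 to element 1 of an array.
template <typename T> std::size_t sizeByIndexing() {
  T Elems[2] = {};
  return static_cast<std::size_t>(
      reinterpret_cast<const char *>(&Elems[1]) -
      reinterpret_cast<const char *>(&Elems[0]));
}

// Offset: the distance from the start of the object to the chosen field.
std::size_t payloadOffsetByAddress() {
  Record R = {};
  return static_cast<std::size_t>(
      reinterpret_cast<const char *>(&R.Payload) -
      reinterpret_cast<const char *>(&R));
}
```

Because the answer comes from address arithmetic, the code never has to name
a target-specific constant; whoever finally lays out the type supplies the
number.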
253
254 Garbage Collected Stack Frames
255 ------------------------------
256
257 Some languages want to explicitly manage their stack frames, often so
258 that they are garbage collected or to allow easy implementation of
259 closures. There are often better ways to implement these features than
260 explicit stack frames, but `LLVM does support
261 them, `_
262 if you want. It requires your front-end to convert the code into
263 `Continuation Passing
264 Style `_ and
265 the use of tail calls (which LLVM also supports).
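
For readers who have not met continuation passing style before, here is a
minimal sketch in C++. The names ``Cont`` and ``factCPS`` are hypothetical;
the point is only that each function receives "the rest of the computation"
as an argument and finishes by calling it, so every call sits in tail
position:

```cpp
#include <cassert>
#include <functional>

// A continuation: what to do with a result once it is available.
typedef std::function<long(long)> Cont;

// Factorial in continuation passing style. Instead of returning N * fact(N-1)
// and multiplying after the recursive call returns, the pending multiply is
// packaged into a new continuation and handed to the recursive call.
long factCPS(long N, const Cont &K) {
  if (N <= 1)
    return K(1);
  return factCPS(N - 1, [N, &K](long Partial) { return K(N * Partial); });
}
```

Calling ``factCPS(5, [](long V) { return V; })`` passes the identity function
as the final continuation. Note that C++ does not guarantee tail-call
elimination, so this sketch still grows the native stack; it shows the shape
of the transformation, not the stack-frame management itself.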
266
0 set(LLVM_LINK_COMPONENTS
1 Analysis
2 Core
3 ExecutionEngine
4 InstCombine
5 MC
6 ScalarOpts
7 Support
8 TransformUtils
9 nativecodegen
10 )
11
12 set(LLVM_REQUIRES_RTTI 1)
13
14 add_llvm_example(Kaleidoscope-Ch8
15 toy.cpp
16 )
0 ##===- examples/Kaleidoscope/Chapter8/Makefile -------------*- Makefile -*-===##
1 #
2 # The LLVM Compiler Infrastructure
3 #
4 # This file is distributed under the University of Illinois Open Source
5 # License. See LICENSE.TXT for details.
6 #
7 ##===----------------------------------------------------------------------===##
8 LEVEL = ../../..
9 TOOLNAME = Kaleidoscope-Ch8
10 EXAMPLE_TOOL = 1
11 REQUIRES_RTTI := 1
12
13 LINK_COMPONENTS := core mcjit native
14
15 include $(LEVEL)/Makefile.common
0 #include "llvm/ADT/Triple.h"
1 #include "llvm/Analysis/Passes.h"
2 #include "llvm/ExecutionEngine/ExecutionEngine.h"
3 #include "llvm/ExecutionEngine/MCJIT.h"
4 #include "llvm/ExecutionEngine/SectionMemoryManager.h"
5 #include "llvm/IR/DataLayout.h"
6 #include "llvm/IR/DerivedTypes.h"
7 #include "llvm/IR/DIBuilder.h"
8 #include "llvm/IR/IRBuilder.h"
9 #include "llvm/IR/LLVMContext.h"
10 #include "llvm/IR/Module.h"
11 #include "llvm/IR/Verifier.h"
12 #include "llvm/PassManager.h"
13 #include "llvm/Support/Host.h"
14 #include "llvm/Support/TargetSelect.h"
15 #include "llvm/Transforms/Scalar.h"
16 #include <cctype>
17 #include <cstdio>
18 #include <iostream>
19 #include <map>
20 #include <string>
21 #include <vector>
22 using namespace llvm;
23
24 //===----------------------------------------------------------------------===//
25 // Lexer
26 //===----------------------------------------------------------------------===//
27
28 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
29 // of these for known things.
30 enum Token {
31 tok_eof = -1,
32
33 // commands
34 tok_def = -2,
35 tok_extern = -3,
36
37 // primary
38 tok_identifier = -4,
39 tok_number = -5,
40
41 // control
42 tok_if = -6,
43 tok_then = -7,
44 tok_else = -8,
45 tok_for = -9,
46 tok_in = -10,
47
48 // operators
49 tok_binary = -11,
50 tok_unary = -12,
51
52 // var definition
53 tok_var = -13
54 };
55
56 std::string getTokName(int Tok) {
57 switch (Tok) {
58 case tok_eof:
59 return "eof";
60 case tok_def:
61 return "def";
62 case tok_extern:
63 return "extern";
64 case tok_identifier:
65 return "identifier";
66 case tok_number:
67 return "number";
68 case tok_if:
69 return "if";
70 case tok_then:
71 return "then";
72 case tok_else:
73 return "else";
74 case tok_for:
75 return "for";
76 case tok_in:
77 return "in";
78 case tok_binary:
79 return "binary";
80 case tok_unary:
81 return "unary";
82 case tok_var:
83 return "var";
84 }
85 return std::string(1, (char)Tok);
86 }
87
88 namespace {
89 class PrototypeAST;
90 class ExprAST;
91 }
92 static IRBuilder<> Builder(getGlobalContext());
93 struct DebugInfo {
94 DICompileUnit TheCU;
95 DIType DblTy;
96 std::vector LexicalBlocks;
97 std::map FnScopeMap;
98
99 void emitLocation(ExprAST *AST);
100 DIType getDoubleTy();
101 } KSDbgInfo;
102
103 static std::string IdentifierStr; // Filled in if tok_identifier
104 static double NumVal; // Filled in if tok_number
105 struct SourceLocation {
106 int Line;
107 int Col;
108 };
109 static SourceLocation CurLoc;
110 static SourceLocation LexLoc = { 1, 0 };
111
112 static int advance() {
113 int LastChar = getchar();
114
115 if (LastChar == '\n' || LastChar == '\r') {
116 LexLoc.Line++;
117 LexLoc.Col = 0;
118 } else
119 LexLoc.Col++;
120 return LastChar;
121 }
122
123 /// gettok - Return the next token from standard input.
124 static int gettok() {
125 static int LastChar = ' ';
126
127 // Skip any whitespace.
128 while (isspace(LastChar))
129 LastChar = advance();
130
131 CurLoc = LexLoc;
132
133 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
134 IdentifierStr = LastChar;
135 while (isalnum((LastChar = advance())))
136 IdentifierStr += LastChar;
137
138 if (IdentifierStr == "def")
139 return tok_def;
140 if (IdentifierStr == "extern")
141 return tok_extern;
142 if (IdentifierStr == "if")
143 return tok_if;
144 if (IdentifierStr == "then")
145 return tok_then;
146 if (IdentifierStr == "else")
147 return tok_else;
148 if (IdentifierStr == "for")
149 return tok_for;
150 if (IdentifierStr == "in")
151 return tok_in;
152 if (IdentifierStr == "binary")
153 return tok_binary;
154 if (IdentifierStr == "unary")
155 return tok_unary;
156 if (IdentifierStr == "var")
157 return tok_var;
158 return tok_identifier;
159 }
160
161 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
162 std::string NumStr;
163 do {
164 NumStr += LastChar;
165 LastChar = advance();
166 } while (isdigit(LastChar) || LastChar == '.');
167
168 NumVal = strtod(NumStr.c_str(), 0);
169 return tok_number;
170 }
171
172 if (LastChar == '#') {
173 // Comment until end of line.
174 do
175 LastChar = advance();
176 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
177
178 if (LastChar != EOF)
179 return gettok();
180 }
181
182 // Check for end of file. Don't eat the EOF.
183 if (LastChar == EOF)
184 return tok_eof;
185
186 // Otherwise, just return the character as its ascii value.
187 int ThisChar = LastChar;
188 LastChar = advance();
189 return ThisChar;
190 }
191
192 //===----------------------------------------------------------------------===//
193 // Abstract Syntax Tree (aka Parse Tree)
194 //===----------------------------------------------------------------------===//
195 namespace {
196
197 std::ostream &indent(std::ostream &O, int size) {
198 return O << std::string(size, ' ');
199 }
200
201 /// ExprAST - Base class for all expression nodes.
202 class ExprAST {
203 SourceLocation Loc;
204
205 public:
206 int getLine() const { return Loc.Line; }
207 int getCol() const { return Loc.Col; }
208 ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
209 virtual std::ostream &dump(std::ostream &out, int ind) {
210 return out << ':' << getLine() << ':' << getCol() << '\n';
211 }
212 virtual ~ExprAST() {}
213 virtual Value *Codegen() = 0;
214 };
215
216 /// NumberExprAST - Expression class for numeric literals like "1.0".
217 class NumberExprAST : public ExprAST {
218 double Val;
219
220 public:
221 NumberExprAST(double val) : Val(val) {}
222 virtual std::ostream &dump(std::ostream &out, int ind) {
223 return ExprAST::dump(out << Val, ind);
224 }
225 virtual Value *Codegen();
226 };
227
228 /// VariableExprAST - Expression class for referencing a variable, like "a".
229 class VariableExprAST : public ExprAST {
230 std::string Name;
231
232 public:
233 VariableExprAST(SourceLocation Loc, const std::string &name)
234 : ExprAST(Loc), Name(name) {}
235 const std::string &getName() const { return Name; }
236 virtual std::ostream &dump(std::ostream &out, int ind) {
237 return ExprAST::dump(out << Name, ind);
238 }
239 virtual Value *Codegen();
240 };
241
242 /// UnaryExprAST - Expression class for a unary operator.
243 class UnaryExprAST : public ExprAST {
244 char Opcode;
245 ExprAST *Operand;
246
247 public:
248 UnaryExprAST(char opcode, ExprAST *operand)
249 : Opcode(opcode), Operand(operand) {}
250 virtual std::ostream &dump(std::ostream &out, int ind) {
251 ExprAST::dump(out << "unary" << Opcode, ind);
252 Operand->dump(out, ind + 1);
253 return out;
254 }
255 virtual Value *Codegen();
256 };
257
258 /// BinaryExprAST - Expression class for a binary operator.
259 class BinaryExprAST : public ExprAST {
260 char Op;
261 ExprAST *LHS, *RHS;
262
263 public:
264 BinaryExprAST(SourceLocation Loc, char op, ExprAST *lhs, ExprAST *rhs)
265 : ExprAST(Loc), Op(op), LHS(lhs), RHS(rhs) {}
266 virtual std::ostream &dump(std::ostream &out, int ind) {
267 ExprAST::dump(out << "binary" << Op, ind);
268 LHS->dump(indent(out, ind) << "LHS:", ind + 1);
269 RHS->dump(indent(out, ind) << "RHS:", ind + 1);
270 return out;
271 }
272 virtual Value *Codegen();
273 };
274
275 /// CallExprAST - Expression class for function calls.
276 class CallExprAST : public ExprAST {
277 std::string Callee;
278 std::vector<ExprAST *> Args;
279
280 public:
281 CallExprAST(SourceLocation Loc, const std::string &callee,
282 std::vector<ExprAST *> &args)
283 : ExprAST(Loc), Callee(callee), Args(args) {}
284 virtual std::ostream &dump(std::ostream &out, int ind) {
285 ExprAST::dump(out << "call " << Callee, ind);
286 for (ExprAST *Arg : Args)
287 Arg->dump(indent(out, ind + 1), ind + 1);
288 return out;
289 }
290 virtual Value *Codegen();
291 };
292
293 /// IfExprAST - Expression class for if/then/else.
294 class IfExprAST : public ExprAST {
295 ExprAST *Cond, *Then, *Else;
296
297 public:
298 IfExprAST(SourceLocation Loc, ExprAST *cond, ExprAST *then, ExprAST *_else)
299 : ExprAST(Loc), Cond(cond), Then(then), Else(_else) {}
300 virtual std::ostream &dump(std::ostream &out, int ind) {
301 ExprAST::dump(out << "if", ind);
302 Cond->dump(indent(out, ind) << "Cond:", ind + 1);
303 Then->dump(indent(out, ind) << "Then:", ind + 1);
304 Else->dump(indent(out, ind) << "Else:", ind + 1);
305 return out;
306 }
307 virtual Value *Codegen();
308 };
309
310 /// ForExprAST - Expression class for for/in.
311 class ForExprAST : public ExprAST {
312 std::string VarName;
313 ExprAST *Start, *End, *Step, *Body;
314
315 public:
316 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
317 ExprAST *step, ExprAST *body)
318 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
319 virtual std::ostream &dump(std::ostream &out, int ind) {
320 ExprAST::dump(out << "for", ind);
321 Start->dump(indent(out, ind) << "Start:", ind + 1);
322 End->dump(indent(out, ind) << "End:", ind + 1);
323 Step->dump(indent(out, ind) << "Step:", ind + 1);
324 Body->dump(indent(out, ind) << "Body:", ind + 1);
325 return out;
326 }
327 virtual Value *Codegen();
328 };
329
330 /// VarExprAST - Expression class for var/in
331 class VarExprAST : public ExprAST {
332 std::vector<std::pair<std::string, ExprAST *> > VarNames;
333 ExprAST *Body;
334
335 public:
336 VarExprAST(const std::vector<std::pair<std::string, ExprAST *> > &varnames,
337 ExprAST *body)
338 : VarNames(varnames), Body(body) {}
339
340 virtual std::ostream &dump(std::ostream &out, int ind) {
341 ExprAST::dump(out << "var", ind);
342 for (const auto &NamedVar : VarNames)
343 NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);
344 Body->dump(indent(out, ind) << "Body:", ind + 1);
345 return out;
346 }
347 virtual Value *Codegen();
348 };
349
350 /// PrototypeAST - This class represents the "prototype" for a function,
351 /// which captures its argument names as well as if it is an operator.
352 class PrototypeAST {
353 std::string Name;
354 std::vector<std::string> Args;
355 bool isOperator;
356 unsigned Precedence; // Precedence if a binary op.
357 int Line;
358
359 public:
360 PrototypeAST(SourceLocation Loc, const std::string &name,
361 const std::vector<std::string> &args, bool isoperator = false,
362 unsigned prec = 0)
363 : Name(name), Args(args), isOperator(isoperator), Precedence(prec),
364 Line(Loc.Line) {}
365
366 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
367 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
368
369 char getOperatorName() const {
370 assert(isUnaryOp() || isBinaryOp());
371 return Name[Name.size() - 1];
372 }
373
374 unsigned getBinaryPrecedence() const { return Precedence; }
375
376 Function *Codegen();
377
378 void CreateArgumentAllocas(Function *F);
379 const std::vector<std::string> &getArgs() const { return Args; }
380 };
381
382 /// FunctionAST - This class represents a function definition itself.
383 class FunctionAST {
384 PrototypeAST *Proto;
385 ExprAST *Body;
386
387 public:
388 FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {}
389
390 std::ostream &dump(std::ostream &out, int ind) {
391 indent(out, ind) << "FunctionAST\n";
392 ++ind;
393 indent(out, ind) << "Body:";
394 return Body ? Body->dump(out, ind) : out << "null\n";
395 }
396
397 Function *Codegen();
398 };
399 } // end anonymous namespace
400
401 //===----------------------------------------------------------------------===//
402 // Parser
403 //===----------------------------------------------------------------------===//
404
405 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
406 /// token the parser is looking at. getNextToken reads another token from the
407 /// lexer and updates CurTok with its results.
408 static int CurTok;
409 static int getNextToken() { return CurTok = gettok(); }
410
411 /// BinopPrecedence - This holds the precedence for each binary operator that is
412 /// defined.
413 static std::map<char, int> BinopPrecedence;
414
415 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
416 static int GetTokPrecedence() {
417 if (!isascii(CurTok))
418 return -1;
419
420 // Make sure it's a declared binop.
421 int TokPrec = BinopPrecedence[CurTok];
422 if (TokPrec <= 0)
423 return -1;
424 return TokPrec;
425 }
426
427 /// Error* - These are little helper functions for error handling.
428 ExprAST *Error(const char *Str) {
429 fprintf(stderr, "Error: %s\n", Str);
430 return 0;
431 }
432 PrototypeAST *ErrorP(const char *Str) {
433 Error(Str);
434 return 0;
435 }
436 FunctionAST *ErrorF(const char *Str) {
437 Error(Str);
438 return 0;
439 }
440
441 static ExprAST *ParseExpression();
442
443 /// identifierexpr
444 /// ::= identifier
445 /// ::= identifier '(' expression* ')'
446 static ExprAST *ParseIdentifierExpr() {
447 std::string IdName = IdentifierStr;
448
449 SourceLocation LitLoc = CurLoc;
450
451 getNextToken(); // eat identifier.
452
453 if (CurTok != '(') // Simple variable ref.
454 return new VariableExprAST(LitLoc, IdName);
455
456 // Call.
457 getNextToken(); // eat (
458 std::vector<ExprAST *> Args;
459 if (CurTok != ')') {
460 while (1) {
461 ExprAST *Arg = ParseExpression();
462 if (!Arg)
463 return 0;
464 Args.push_back(Arg);
465
466 if (CurTok == ')')
467 break;
468
469 if (CurTok != ',')
470 return Error("Expected ')' or ',' in argument list");
471 getNextToken();
472 }
473 }
474
475 // Eat the ')'.
476 getNextToken();
477
478 return new CallExprAST(LitLoc, IdName, Args);
479 }
480
481 /// numberexpr ::= number
482 static ExprAST *ParseNumberExpr() {
483 ExprAST *Result = new NumberExprAST(NumVal);
484 getNextToken(); // consume the number
485 return Result;
486 }
487
488 /// parenexpr ::= '(' expression ')'
489 static ExprAST *ParseParenExpr() {
490 getNextToken(); // eat (.
491 ExprAST *V = ParseExpression();
492 if (!V)
493 return 0;
494
495 if (CurTok != ')')
496 return Error("expected ')'");
497 getNextToken(); // eat ).
498 return V;
499 }
500
501 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
502 static ExprAST *ParseIfExpr() {
503 SourceLocation IfLoc = CurLoc;
504
505 getNextToken(); // eat the if.
506
507 // condition.
508 ExprAST *Cond = ParseExpression();
509 if (!Cond)
510 return 0;
511
512 if (CurTok != tok_then)
513 return Error("expected then");
514 getNextToken(); // eat the then
515
516 ExprAST *Then = ParseExpression();
517 if (Then == 0)
518 return 0;
519
520 if (CurTok != tok_else)
521 return Error("expected else");
522
523 getNextToken();
524
525 ExprAST *Else = ParseExpression();
526 if (!Else)
527 return 0;
528
529 return new IfExprAST(IfLoc, Cond, Then, Else);
530 }
531
532 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
533 static ExprAST *ParseForExpr() {
534 getNextToken(); // eat the for.
535
536 if (CurTok != tok_identifier)
537 return Error("expected identifier after for");
538
539 std::string IdName = IdentifierStr;
540 getNextToken(); // eat identifier.
541
542 if (CurTok != '=')
543 return Error("expected '=' after for");
544 getNextToken(); // eat '='.
545
546 ExprAST *Start = ParseExpression();
547 if (Start == 0)
548 return 0;
549 if (CurTok != ',')
550 return Error("expected ',' after for start value");
551 getNextToken();
552
553 ExprAST *End = ParseExpression();
554 if (End == 0)
555 return 0;
556
557 // The step value is optional.
558 ExprAST *Step = 0;
559 if (CurTok == ',') {
560 getNextToken();
561 Step = ParseExpression();
562 if (Step == 0)
563 return 0;
564 }
565
566 if (CurTok != tok_in)
567 return Error("expected 'in' after for");
568 getNextToken(); // eat 'in'.
569
570 ExprAST *Body = ParseExpression();
571 if (Body == 0)
572 return 0;
573
574 return new ForExprAST(IdName, Start, End, Step, Body);
575 }
576
577 /// varexpr ::= 'var' identifier ('=' expression)?
578 // (',' identifier ('=' expression)?)* 'in' expression
579 static ExprAST *ParseVarExpr() {
580 getNextToken(); // eat the var.
581
582 std::vector<std::pair<std::string, ExprAST *> > VarNames;
583
584 // At least one variable name is required.
585 if (CurTok != tok_identifier)
586 return Error("expected identifier after var");
587
588 while (1) {
589 std::string Name = IdentifierStr;
590 getNextToken(); // eat identifier.
591
592 // Read the optional initializer.
593 ExprAST *Init = 0;
594 if (CurTok == '=') {
595 getNextToken(); // eat the '='.
596
597 Init = ParseExpression();
598 if (Init == 0)
599 return 0;
600 }
601
602 VarNames.push_back(std::make_pair(Name, Init));
603
604 // End of var list, exit loop.
605 if (CurTok != ',')
606 break;
607 getNextToken(); // eat the ','.
608
609 if (CurTok != tok_identifier)
610 return Error("expected identifier list after var");
611 }
612
613 // At this point, we have to have 'in'.
614 if (CurTok != tok_in)
615 return Error("expected 'in' keyword after 'var'");
616 getNextToken(); // eat 'in'.
617
618 ExprAST *Body = ParseExpression();
619 if (Body == 0)
620 return 0;
621
622 return new VarExprAST(VarNames, Body);
623 }
624
625 /// primary
626 /// ::= identifierexpr
627 /// ::= numberexpr
628 /// ::= parenexpr
629 /// ::= ifexpr
630 /// ::= forexpr
631 /// ::= varexpr
632 static ExprAST *ParsePrimary() {
633 switch (CurTok) {
634 default:
635 return Error("unknown token when expecting an expression");
636 case tok_identifier:
637 return ParseIdentifierExpr();
638 case tok_number:
639 return ParseNumberExpr();
640 case '(':
641 return ParseParenExpr();
642 case tok_if:
643 return ParseIfExpr();
644 case tok_for:
645 return ParseForExpr();
646 case tok_var:
647 return ParseVarExpr();
648 }
649 }
650
651 /// unary
652 /// ::= primary
653 /// ::= '!' unary
654 static ExprAST *ParseUnary() {
655 // If the current token is not an operator, it must be a primary expr.
656 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
657 return ParsePrimary();
658
659 // If this is a unary operator, read it.
660 int Opc = CurTok;
661 getNextToken();
662 if (ExprAST *Operand = ParseUnary())
663 return new UnaryExprAST(Opc, Operand);
664 return 0;
665 }
666
667 /// binoprhs
668 /// ::= ('+' unary)*
669 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
670 // If this is a binop, find its precedence.
671 while (1) {
672 int TokPrec = GetTokPrecedence();
673
674 // If this is a binop that binds at least as tightly as the current binop,
675 // consume it, otherwise we are done.
676 if (TokPrec < ExprPrec)
677 return LHS;
678
679 // Okay, we know this is a binop.
680 int BinOp = CurTok;
681 SourceLocation BinLoc = CurLoc;
682 getNextToken(); // eat binop
683
684 // Parse the unary expression after the binary operator.
685 ExprAST *RHS = ParseUnary();
686 if (!RHS)
687 return 0;
688
689 // If BinOp binds less tightly with RHS than the operator after RHS, let
690 // the pending operator take RHS as its LHS.
691 int NextPrec = GetTokPrecedence();
692 if (TokPrec < NextPrec) {
693 RHS = ParseBinOpRHS(TokPrec + 1, RHS);
694 if (RHS == 0)
695 return 0;
696 }
697
698 // Merge LHS/RHS.
699 LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);
700 }
701 }
702
703 /// expression
704 /// ::= unary binoprhs
705 ///
706 static ExprAST *ParseExpression() {
707 ExprAST *LHS = ParseUnary();
708 if (!LHS)
709 return 0;
710
711 return ParseBinOpRHS(0, LHS);
712 }
713
714 /// prototype
715 /// ::= id '(' id* ')'
716 /// ::= binary LETTER number? (id, id)
717 /// ::= unary LETTER (id)
718 static PrototypeAST *ParsePrototype() {
719 std::string FnName;
720
721 SourceLocation FnLoc = CurLoc;
722
723 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
724 unsigned BinaryPrecedence = 30;
725
726 switch (CurTok) {
727 default:
728 return ErrorP("Expected function name in prototype");
729 case tok_identifier:
730 FnName = IdentifierStr;
731 Kind = 0;
732 getNextToken();
733 break;
734 case tok_unary:
735 getNextToken();
736 if (!isascii(CurTok))
737 return ErrorP("Expected unary operator");
738 FnName = "unary";
739 FnName += (char)CurTok;
740 Kind = 1;
741 getNextToken();
742 break;
743 case tok_binary:
744 getNextToken();
745 if (!isascii(CurTok))
746 return ErrorP("Expected binary operator");
747 FnName = "binary";
748 FnName += (char)CurTok;
749 Kind = 2;
750 getNextToken();
751
752 // Read the precedence if present.
753 if (CurTok == tok_number) {
754 if (NumVal < 1 || NumVal > 100)
755 return ErrorP("Invalid precedence: must be 1..100");
756 BinaryPrecedence = (unsigned)NumVal;
757 getNextToken();
758 }
759 break;
760 }
761
762 if (CurTok != '(')
763 return ErrorP("Expected '(' in prototype");
764
765 std::vector<std::string> ArgNames;
766 while (getNextToken() == tok_identifier)
767 ArgNames.push_back(IdentifierStr);
768 if (CurTok != ')')
769 return ErrorP("Expected ')' in prototype");
770
771 // success.
772 getNextToken(); // eat ')'.
773
774 // Verify right number of names for operator.
775 if (Kind && ArgNames.size() != Kind)
776 return ErrorP("Invalid number of operands for operator");
777
778 return new PrototypeAST(FnLoc, FnName, ArgNames, Kind != 0, BinaryPrecedence);
779 }
780
781 /// definition ::= 'def' prototype expression
782 static FunctionAST *ParseDefinition() {
783 getNextToken(); // eat def.
784 PrototypeAST *Proto = ParsePrototype();
785 if (Proto == 0)
786 return 0;
787
788 if (ExprAST *E = ParseExpression())
789 return new FunctionAST(Proto, E);
790 return 0;
791 }
792
793 /// toplevelexpr ::= expression
794 static FunctionAST *ParseTopLevelExpr() {
795 SourceLocation FnLoc = CurLoc;
796 if (ExprAST *E = ParseExpression()) {
797 // Make an anonymous proto.
798 PrototypeAST *Proto =
799 new PrototypeAST(FnLoc, "main", std::vector<std::string>());
800 return new FunctionAST(Proto, E);
801 }
802 return 0;
803 }
804
805 /// external ::= 'extern' prototype
806 static PrototypeAST *ParseExtern() {
807 getNextToken(); // eat extern.
808 return ParsePrototype();
809 }
810
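Note on the parser above: the operator-precedence loop in ParseBinOpRHS can be tried in isolation. The following is a minimal standalone sketch, not part of this commit. It evaluates single-digit arithmetic directly instead of building an AST, and the names `MiniParser`, `parsePrimary`, `parseBinOpRHS`, and `evaluate` are hypothetical. The precedence values are copied from the table installed in main() in this file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical standalone sketch; it does not use any LLVM API.
struct MiniParser {
  std::string Src;
  std::size_t Pos = 0;
  // Same precedences as the built-in operators in this file ('=' omitted).
  std::map<char, int> Prec{{'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

  // Current character, or '\0' at end of input.
  char cur() const { return Pos < Src.size() ? Src[Pos] : '\0'; }

  // Precedence of the pending operator, or -1 if it is not a known binop.
  int tokPrecedence() const {
    auto It = Prec.find(cur());
    return It == Prec.end() ? -1 : It->second;
  }

  // primary ::= digit  (single digits only, to keep the sketch short)
  double parsePrimary() {
    assert(cur() >= '0' && cur() <= '9' && "expected a digit");
    return Src[Pos++] - '0';
  }

  // binoprhs ::= (op primary)*
  double parseBinOpRHS(int ExprPrec, double LHS) {
    while (true) {
      int TokPrec = tokPrecedence();
      // Stop when the pending operator binds less tightly than required.
      if (TokPrec < ExprPrec)
        return LHS;

      char Op = Src[Pos++]; // eat the operator
      double RHS = parsePrimary();

      // If the next operator binds tighter, let it take RHS as its LHS.
      if (TokPrec < tokPrecedence())
        RHS = parseBinOpRHS(TokPrec + 1, RHS);

      // Merge LHS/RHS by evaluating instead of building an AST node.
      switch (Op) {
      case '+': LHS += RHS; break;
      case '-': LHS -= RHS; break;
      case '*': LHS *= RHS; break;
      case '<': LHS = LHS < RHS ? 1.0 : 0.0; break;
      }
    }
  }
};

// Evaluates an expression such as "1+2*3" (no whitespace, single digits).
double evaluate(const std::string &Src) {
  MiniParser P;
  P.Src = Src;
  return P.parseBinOpRHS(0, P.parsePrimary());
}
```

Passing `TokPrec + 1` on the recursive call is what makes operators of equal precedence associate to the left: the inner call returns as soon as it sees an operator that does not bind strictly tighter than the one being processed.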
811 //===----------------------------------------------------------------------===//
812 // Debug Info Support
813 //===----------------------------------------------------------------------===//
814
815 static DIBuilder *DBuilder;
816
817 DIType DebugInfo::getDoubleTy() {
818 if (DblTy.isValid())
819 return DblTy;
820
821 DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
822 return DblTy;
823 }
824
825 void DebugInfo::emitLocation(ExprAST *AST) {
826 if (!AST)
827 return Builder.SetCurrentDebugLocation(DebugLoc());
828 DIScope *Scope;
829 if (LexicalBlocks.empty())
830 Scope = &TheCU;
831 else
832 Scope = LexicalBlocks.back();
833 Builder.SetCurrentDebugLocation(
834 DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope)));
835 }
836
837 static DICompositeType CreateFunctionType(unsigned NumArgs, DIFile Unit) {
838 SmallVector EltTys;
839 DIType DblTy = KSDbgInfo.getDoubleTy();
840
841 // Add the result type.
842 EltTys.push_back(DblTy);
843
844 for (unsigned i = 0, e = NumArgs; i != e; ++i)
845 EltTys.push_back(DblTy);
846
847 DITypeArray EltTypeArray = DBuilder->getOrCreateTypeArray(EltTys);
848 return DBuilder->createSubroutineType(Unit, EltTypeArray);
849 }
850
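Note on emitLocation above: the scope for each location is the innermost entry of LexicalBlocks, with the compile unit as the fallback when the stack is empty. A minimal sketch of that selection rule follows, assuming plain strings in place of DIScope objects. `ScopeTracker` and `describeLocation` are hypothetical names used only for illustration, and no LLVM API is involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the scope-related members of DebugInfo.
struct ScopeTracker {
  std::string CompileUnit = "fib.ks";     // fallback scope
  std::vector<std::string> LexicalBlocks; // innermost scope is at the back

  // Same choice as emitLocation: innermost block if any, else the CU.
  const std::string &currentScope() const {
    return LexicalBlocks.empty() ? CompileUnit : LexicalBlocks.back();
  }

  // Called when code generation enters a function.
  void push(const std::string &Name) { LexicalBlocks.push_back(Name); }

  // Called when code generation leaves a function, on success or failure.
  void pop() {
    assert(!LexicalBlocks.empty() && "pop without matching push");
    LexicalBlocks.pop_back();
  }
};

// Formats "scope:line:col", standing in for constructing a debug location.
std::string describeLocation(const ScopeTracker &T, unsigned Line,
                             unsigned Col) {
  return T.currentScope() + ":" + std::to_string(Line) + ":" +
         std::to_string(Col);
}
```

FunctionAST::Codegen in this file follows the same discipline: it pushes the function's scope before emitting the body and pops it on both the success path and the error path.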
851 //===----------------------------------------------------------------------===//
852 // Code Generation
853 //===----------------------------------------------------------------------===//
854
855 static Module *TheModule;
856 static std::map<std::string, AllocaInst *> NamedValues;
857 static FunctionPassManager *TheFPM;
858
859 Value *ErrorV(const char *Str) {
860 Error(Str);
861 return 0;
862 }
863
864 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
865 /// the function. This is used for mutable variables etc.
866 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
867 const std::string &VarName) {
868 IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
869 TheFunction->getEntryBlock().begin());
870 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
871 VarName.c_str());
872 }
873
874 Value *NumberExprAST::Codegen() {
875 KSDbgInfo.emitLocation(this);
876 return ConstantFP::get(getGlobalContext(), APFloat(Val));
877 }
878
879 Value *VariableExprAST::Codegen() {
880 // Look this variable up in the function.
881 Value *V = NamedValues[Name];
882 if (V == 0)
883 return ErrorV("Unknown variable name");
884
885 KSDbgInfo.emitLocation(this);
886 // Load the value.
887 return Builder.CreateLoad(V, Name.c_str());
888 }
889
890 Value *UnaryExprAST::Codegen() {
891 Value *OperandV = Operand->Codegen();
892 if (OperandV == 0)
893 return 0;
894
895 Function *F = TheModule->getFunction(std::string("unary") + Opcode);
896 if (F == 0)
897 return ErrorV("Unknown unary operator");
898
899 KSDbgInfo.emitLocation(this);
900 return Builder.CreateCall(F, OperandV, "unop");
901 }
902
903 Value *BinaryExprAST::Codegen() {
904 KSDbgInfo.emitLocation(this);
905
906 // Special case '=' because we don't want to emit the LHS as an expression.
907 if (Op == '=') {
908 // Assignment requires the LHS to be an identifier.
909 VariableExprAST *LHSE = dynamic_cast<VariableExprAST *>(LHS);
910 if (!LHSE)
911 return ErrorV("destination of '=' must be a variable");
912 // Codegen the RHS.
913 Value *Val = RHS->Codegen();
914 if (Val == 0)
915 return 0;
916
917 // Look up the name.
918 Value *Variable = NamedValues[LHSE->getName()];
919 if (Variable == 0)
920 return ErrorV("Unknown variable name");
921
922 Builder.CreateStore(Val, Variable);
923 return Val;
924 }
925
926 Value *L = LHS->Codegen();
927 Value *R = RHS->Codegen();
928 if (L == 0 || R == 0)
929 return 0;
930
931 switch (Op) {
932 case '+':
933 return Builder.CreateFAdd(L, R, "addtmp");
934 case '-':
935 return Builder.CreateFSub(L, R, "subtmp");
936 case '*':
937 return Builder.CreateFMul(L, R, "multmp");
938 case '<':
939 L = Builder.CreateFCmpULT(L, R, "cmptmp");
940 // Convert bool 0/1 to double 0.0 or 1.0
941 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
942 "booltmp");
943 default:
944 break;
945 }
946
947 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
948 // a call to it.
949 Function *F = TheModule->getFunction(std::string("binary") + Op);
950 assert(F && "binary operator not found!");
951
952 Value *Ops[] = { L, R };
953 return Builder.CreateCall(F, Ops, "binop");
954 }
955
956 Value *CallExprAST::Codegen() {
957 KSDbgInfo.emitLocation(this);
958
959 // Look up the name in the global module table.
960 Function *CalleeF = TheModule->getFunction(Callee);
961 if (CalleeF == 0)
962 return ErrorV("Unknown function referenced");
963
964 // Report an error if the argument count does not match.
965 if (CalleeF->arg_size() != Args.size())
966 return ErrorV("Incorrect # arguments passed");
967
968 std::vector<Value *> ArgsV;
969 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
970 ArgsV.push_back(Args[i]->Codegen());
971 if (ArgsV.back() == 0)
972 return 0;
973 }
974
975 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
976 }
977
978 Value *IfExprAST::Codegen() {
979 KSDbgInfo.emitLocation(this);
980
981 Value *CondV = Cond->Codegen();
982 if (CondV == 0)
983 return 0;
984
985 // Convert condition to a bool by comparing non-equal to 0.0.
986 CondV = Builder.CreateFCmpONE(
987 CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond");
988
989 Function *TheFunction = Builder.GetInsertBlock()->getParent();
990
991 // Create blocks for the then and else cases. Insert the 'then' block at the
992 // end of the function.
993 BasicBlock *ThenBB =
994 BasicBlock::Create(getGlobalContext(), "then", TheFunction);
995 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
996 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
997
998 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
999
1000 // Emit then value.
1001 Builder.SetInsertPoint(ThenBB);
1002
1003 Value *ThenV = Then->Codegen();
1004 if (ThenV == 0)
1005 return 0;
1006
1007 Builder.CreateBr(MergeBB);
1008 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1009 ThenBB = Builder.GetInsertBlock();
1010
1011 // Emit else block.
1012 TheFunction->getBasicBlockList().push_back(ElseBB);
1013 Builder.SetInsertPoint(ElseBB);
1014
1015 Value *ElseV = Else->Codegen();
1016 if (ElseV == 0)
1017 return 0;
1018
1019 Builder.CreateBr(MergeBB);
1020 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1021 ElseBB = Builder.GetInsertBlock();
1022
1023 // Emit merge block.
1024 TheFunction->getBasicBlockList().push_back(MergeBB);
1025 Builder.SetInsertPoint(MergeBB);
1026 PHINode *PN =
1027 Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp");
1028
1029 PN->addIncoming(ThenV, ThenBB);
1030 PN->addIncoming(ElseV, ElseBB);
1031 return PN;
1032 }
1033
1034 Value *ForExprAST::Codegen() {
1035 // Output this as:
1036 // var = alloca double
1037 // ...
1038 // start = startexpr
1039 // store start -> var
1040 // goto loop
1041 // loop:
1042 // ...
1043 // bodyexpr
1044 // ...
1045 // loopend:
1046 // step = stepexpr
1047 // endcond = endexpr
1048 //
1049 // curvar = load var
1050 // nextvar = curvar + step
1051 // store nextvar -> var
1052 // br endcond, loop, endloop
1053 // outloop:
1054
1055 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1056
1057 // Create an alloca for the variable in the entry block.
1058 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1059
1060 KSDbgInfo.emitLocation(this);
1061
1062 // Emit the start code first, without 'variable' in scope.
1063 Value *StartVal = Start->Codegen();
1064 if (StartVal == 0)
1065 return 0;
1066
1067 // Store the value into the alloca.
1068 Builder.CreateStore(StartVal, Alloca);
1069
1070 // Make the new basic block for the loop header, inserting after current
1071 // block.
1072 BasicBlock *LoopBB =
1073 BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1074
1075 // Insert an explicit fall through from the current block to the LoopBB.
1076 Builder.CreateBr(LoopBB);
1077
1078 // Start insertion in LoopBB.
1079 Builder.SetInsertPoint(LoopBB);
1080
1081 // Within the loop, the variable is bound to its alloca. If it
1082 // shadows an existing variable, we have to restore it, so save it now.
1083 AllocaInst *OldVal = NamedValues[VarName];
1084 NamedValues[VarName] = Alloca;
1085
1086 // Emit the body of the loop. This, like any other expr, can change the
1087 // current BB. Note that we ignore the value computed by the body, but don't
1088 // allow an error.
1089 if (Body->Codegen() == 0)
1090 return 0;
1091
1092 // Emit the step value.
1093 Value *StepVal;
1094 if (Step) {
1095 StepVal = Step->Codegen();
1096 if (StepVal == 0)
1097 return 0;
1098 } else {
1099 // If not specified, use 1.0.
1100 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1101 }
1102
1103 // Compute the end condition.
1104 Value *EndCond = End->Codegen();
1105 if (EndCond == 0)
1106 return EndCond;
1107
1108 // Reload, increment, and restore the alloca. This handles the case where
1109 // the body of the loop mutates the variable.
1110 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1111 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
1112 Builder.CreateStore(NextVar, Alloca);
1113
1114 // Convert condition to a bool by comparing non-equal to 0.0.
1115 EndCond = Builder.CreateFCmpONE(
1116 EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond");
1117
1118 // Create the "after loop" block and insert it.
1119 BasicBlock *AfterBB =
1120 BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1121
1122 // Insert the conditional branch into the end of LoopEndBB.
1123 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1124
1125 // Any new code will be inserted in AfterBB.
1126 Builder.SetInsertPoint(AfterBB);
1127
1128 // Restore the unshadowed variable.
1129 if (OldVal)
1130 NamedValues[VarName] = OldVal;
1131 else
1132 NamedValues.erase(VarName);
1133
1134 // for expr always returns 0.0.
1135 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1136 }
1137
1138 Value *VarExprAST::Codegen() {
1139 std::vector<AllocaInst *> OldBindings;
1140
1141 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1142
1143 // Register all variables and emit their initializer.
1144 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1145 const std::string &VarName = VarNames[i].first;
1146 ExprAST *Init = VarNames[i].second;
1147
1148 // Emit the initializer before adding the variable to scope, this prevents
1149 // the initializer from referencing the variable itself, and permits stuff
1150 // like this:
1151 // var a = 1 in
1152 // var a = a in ... # refers to outer 'a'.
1153 Value *InitVal;
1154 if (Init) {
1155 InitVal = Init->Codegen();
1156 if (InitVal == 0)
1157 return 0;
1158 } else { // If not specified, use 0.0.
1159 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
1160 }
1161
1162 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1163 Builder.CreateStore(InitVal, Alloca);
1164
1165 // Remember the old variable binding so that we can restore the binding when
1166 // we unrecurse.
1167 OldBindings.push_back(NamedValues[VarName]);
1168
1169 // Remember this binding.
1170 NamedValues[VarName] = Alloca;
1171 }
1172
1173 KSDbgInfo.emitLocation(this);
1174
1175 // Codegen the body, now that all vars are in scope.
1176 Value *BodyVal = Body->Codegen();
1177 if (BodyVal == 0)
1178 return 0;
1179
1180 // Pop all our variables from scope.
1181 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1182 NamedValues[VarNames[i].first] = OldBindings[i];
1183
1184 // Return the body computation.
1185 return BodyVal;
1186 }
1187
1188 Function *PrototypeAST::Codegen() {
1189 // Make the function type: double(double,double) etc.
1190 std::vector<Type *> Doubles(Args.size(),
1191 Type::getDoubleTy(getGlobalContext()));
1192 FunctionType *FT =
1193 FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
1194
1195 Function *F =
1196 Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1197
1198 // If F conflicted, there was already something named 'Name'. If it has a
1199 // body, don't allow redefinition or reextern.
1200 if (F->getName() != Name) {
1201 // Delete the one we just made and get the existing one.
1202 F->eraseFromParent();
1203 F = TheModule->getFunction(Name);
1204
1205 // If F already has a body, reject this.
1206 if (!F->empty()) {
1207 ErrorF("redefinition of function");
1208 return 0;
1209 }
1210
1211 // If F took a different number of args, reject.
1212 if (F->arg_size() != Args.size()) {
1213 ErrorF("redefinition of function with different # args");
1214 return 0;
1215 }
1216 }
1217
1218 // Set names for all arguments.
1219 unsigned Idx = 0;
1220 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1221 ++AI, ++Idx)
1222 AI->setName(Args[Idx]);
1223
1224 // Create a subprogram DIE for this function.
1225 DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
1226 KSDbgInfo.TheCU.getDirectory());
1227 DIDescriptor FContext(Unit);
1228 unsigned LineNo = Line;
1229 unsigned ScopeLine = Line;
1230 DISubprogram SP = DBuilder->createFunction(
1231 FContext, Name, StringRef(), Unit, LineNo,
1232 CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
1233 true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F);
1234
1235 KSDbgInfo.FnScopeMap[this] = SP;
1236 return F;
1237 }
1238
1239 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1240 /// argument in the symbol table so that references to it will succeed.
1241 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1242 Function::arg_iterator AI = F->arg_begin();
1243 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1244 // Create an alloca for this variable.
1245 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1246
1247 // Create a debug descriptor for the variable.
1248 DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
1249 DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
1250 KSDbgInfo.TheCU.getDirectory());
1251 DIVariable D = DBuilder->createLocalVariable(dwarf::DW_TAG_arg_variable,
1252 *Scope, Args[Idx], Unit, Line,
1253 KSDbgInfo.getDoubleTy(), Idx);
1254
1255 Instruction *Call = DBuilder->insertDeclare(
1256 Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock());
1257 Call->setDebugLoc(DebugLoc::get(Line, 0, *Scope));
1258
1259 // Store the initial value into the alloca.
1260 Builder.CreateStore(AI, Alloca);
1261
1262 // Add arguments to variable symbol table.
1263 NamedValues[Args[Idx]] = Alloca;
1264 }
1265 }
1266
1267 Function *FunctionAST::Codegen() {
1268 NamedValues.clear();
1269
1270 Function *TheFunction = Proto->Codegen();
1271 if (TheFunction == 0)
1272 return 0;
1273
1274 // Push the current scope.
1275 KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]);
1276
1277 // Unset the location for the prologue emission (leading instructions with no
1278 // location in a function are considered part of the prologue and the debugger
1279 // will run past them when breaking on a function)
1280 KSDbgInfo.emitLocation(nullptr);
1281
1282 // If this is an operator, install it.
1283 if (Proto->isBinaryOp())
1284 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1285
1286 // Create a new basic block to start insertion into.
1287 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1288 Builder.SetInsertPoint(BB);
1289
1290 // Add all arguments to the symbol table and create their allocas.
1291 Proto->CreateArgumentAllocas(TheFunction);
1292
1293 KSDbgInfo.emitLocation(Body);
1294
1295 if (Value *RetVal = Body->Codegen()) {
1296 // Finish off the function.
1297 Builder.CreateRet(RetVal);
1298
1299 // Pop off the lexical block for the function.
1300 KSDbgInfo.LexicalBlocks.pop_back();
1301
1302 // Validate the generated code, checking for consistency.
1303 verifyFunction(*TheFunction);
1304
1305 // Optimize the function.
1306 TheFPM->run(*TheFunction);
1307
1308 return TheFunction;
1309 }
1310
1311 // Error reading body, remove function.
1312 TheFunction->eraseFromParent();
1313
1314 if (Proto->isBinaryOp())
1315 BinopPrecedence.erase(Proto->getOperatorName());
1316
1317 // Pop off the lexical block for the function since we added it
1318 // unconditionally.
1319 KSDbgInfo.LexicalBlocks.pop_back();
1320
1321 return 0;
1322 }
1323
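Note on the code generation section above: ForExprAST::Codegen and VarExprAST::Codegen both save the previous binding of a name, install a new one for the duration of the body, and restore the old one afterwards. The sketch below shows that shadow-and-restore pattern on its own, assuming integers in place of AllocaInst pointers. `SymbolTable` and `withBinding` are hypothetical names and do not appear in the original file.

```cpp
#include <map>
#include <string>

// Hypothetical symbol table; the original maps names to AllocaInst*.
using SymbolTable = std::map<std::string, int>;

// Binds Name to NewVal, runs Body, then restores whatever was visible before.
template <typename Fn>
int withBinding(SymbolTable &Table, const std::string &Name, int NewVal,
                Fn Body) {
  // Remember the old binding, if there was one.
  auto It = Table.find(Name);
  bool HadOld = It != Table.end();
  int OldVal = HadOld ? It->second : 0;

  // Shadow it for the duration of the body.
  Table[Name] = NewVal;
  int Result = Body(Table);

  // Restore the outer binding, or remove the name if it was not bound.
  if (HadOld)
    Table[Name] = OldVal;
  else
    Table.erase(Name);
  return Result;
}
```

This sketch erases a name that had no outer binding, as ForExprAST::Codegen does. VarExprAST::Codegen instead writes the saved value back unconditionally, which leaves a null entry in the map for names that were previously unbound.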
1324 //===----------------------------------------------------------------------===//
1325 // Top-Level parsing and JIT Driver
1326 //===----------------------------------------------------------------------===//
1327
1328 static ExecutionEngine *TheExecutionEngine;
1329
1330 static void HandleDefinition() {
1331 if (FunctionAST *F = ParseDefinition()) {
1332 if (!F->Codegen()) {
1333 fprintf(stderr, "Error reading function definition:");
1334 }
1335 } else {
1336 // Skip token for error recovery.
1337 getNextToken();
1338 }
1339 }
1340
1341 static void HandleExtern() {
1342 if (PrototypeAST *P = ParseExtern()) {
1343 if (!P->Codegen()) {
1344 fprintf(stderr, "Error reading extern");
1345 }
1346 } else {
1347 // Skip token for error recovery.
1348 getNextToken();
1349 }
1350 }
1351
1352 static void HandleTopLevelExpression() {
1353 // Evaluate a top-level expression into an anonymous function.
1354 if (FunctionAST *F = ParseTopLevelExpr()) {
1355 if (!F->Codegen()) {
1356 fprintf(stderr, "Error generating code for top level expr");
1357 }
1358 } else {
1359 // Skip token for error recovery.
1360 getNextToken();
1361 }
1362 }
1363
1364 /// top ::= definition | external | expression | ';'
1365 static void MainLoop() {
1366 while (1) {
1367 switch (CurTok) {
1368 case tok_eof:
1369 return;
1370 case ';':
1371 getNextToken();
1372 break; // ignore top-level semicolons.
1373 case tok_def:
1374 HandleDefinition();
1375 break;
1376 case tok_extern:
1377 HandleExtern();
1378 break;
1379 default:
1380 HandleTopLevelExpression();
1381 break;
1382 }
1383 }
1384 }
1385
1386 //===----------------------------------------------------------------------===//
1387 // "Library" functions that can be "extern'd" from user code.
1388 //===----------------------------------------------------------------------===//
1389
1390 /// putchard - putchar that takes a double and returns 0.
1391 extern "C" double putchard(double X) {
1392 putchar((char)X);
1393 return 0;
1394 }
1395
1396 /// printd - printf that takes a double and prints it as "%f\n", returning 0.
1397 extern "C" double printd(double X) {
1398 printf("%f\n", X);
1399 return 0;
1400 }
1401
1402 //===----------------------------------------------------------------------===//
1403 // Main driver code.
1404 //===----------------------------------------------------------------------===//
1405
1406 int main() {
1407 InitializeNativeTarget();
1408 InitializeNativeTargetAsmPrinter();
1409 InitializeNativeTargetAsmParser();
1410 LLVMContext &Context = getGlobalContext();
1411
1412 // Install standard binary operators.
1413 // 1 is lowest precedence.
1414 BinopPrecedence['='] = 2;
1415 BinopPrecedence['<'] = 10;
1416 BinopPrecedence['+'] = 20;
1417 BinopPrecedence['-'] = 20;
1418 BinopPrecedence['*'] = 40; // highest.
1419
1420 // Prime the first token.
1421 getNextToken();
1422
1423 // Make the module, which holds all the code.
1424 std::unique_ptr<Module> Owner = make_unique<Module>("my cool jit", Context);
1425 TheModule = Owner.get();
1426
1427 // Add the current debug info version into the module.
1428 TheModule->addModuleFlag(Module::Warning, "Debug Info Version",
1429 DEBUG_METADATA_VERSION);
1430
1431 // Darwin only supports dwarf2.
1432 if (Triple(sys::getProcessTriple()).isOSDarwin())
1433 TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);
1434
1435 // Construct the DIBuilder, we do this here because we need the module.
1436 DBuilder = new DIBuilder(*TheModule);
1437
1438 // Create the compile unit for the module.
1439 // Currently down as "fib.ks" as a filename since we're redirecting stdin
1440 // but we'd like actual source locations.
1441 KSDbgInfo.TheCU = DBuilder->createCompileUnit(
1442 dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
1443
1444 // Create the JIT. This takes ownership of the module.
1445 std::string ErrStr;
1446 TheExecutionEngine = EngineBuilder(std::move(Owner))
1447 .setErrorStr(&ErrStr)
1448 .setMCJITMemoryManager(new SectionMemoryManager())
1449 .create();
1450 if (!TheExecutionEngine) {
1451 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
1452 exit(1);
1453 }
1454
1455 FunctionPassManager OurFPM(TheModule);
1456
1457 // Set up the optimizer pipeline. Start with registering info about how the
1458 // target lays out data structures.
1459 TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
1460 OurFPM.add(new DataLayoutPass());
1461 #if 0
1462 // Provide basic AliasAnalysis support for GVN.
1463 OurFPM.add(createBasicAliasAnalysisPass());
1464 // Promote allocas to registers.
1465 OurFPM.add(createPromoteMemoryToRegisterPass());
1466 // Do simple "peephole" optimizations and bit-twiddling optzns.
1467 OurFPM.add(createInstructionCombiningPass());
1468 // Reassociate expressions.
1469 OurFPM.add(createReassociatePass());
1470 // Eliminate Common SubExpressions.
1471 OurFPM.add(createGVNPass());
1472 // Simplify the control flow graph (deleting unreachable blocks, etc).
1473 OurFPM.add(createCFGSimplificationPass());
1474 #endif
1475 OurFPM.doInitialization();
1476
1477 // Set the global so the code gen can use this.
1478 TheFPM = &OurFPM;
1479
1480 // Run the main "interpreter loop" now.
1481 MainLoop();
1482
1483 TheFPM = 0;
1484
1485 // Finalize the debug info.
1486 DBuilder->finalize();
1487
1488 // Print out all of the generated code.
1489 TheModule->dump();
1490
1491 return 0;
1492 }
9 9
10 10 include $(LEVEL)/Makefile.config
11 11
- 12 PARALLEL_DIRS:= Chapter2 Chapter3 Chapter4 Chapter5 Chapter6 Chapter7
+ 12 PARALLEL_DIRS:= Chapter2 Chapter3 Chapter4 Chapter5 Chapter6 Chapter7 Chapter8
13 13
14 14 include $(LEVEL)/Makefile.common