llvm.org Git mirror: llvm, commit 131d7fb

[PerformanceTips] Document various items folks have suggested

This could stand to be expanded - patches welcome! - but let's at least write
them down so they don't get forgotten.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230995 91177308-0d34-0410-b5e6-96231b3b80d8

Philip Reames, 4 years ago
1 changed file with 45 additions and 0 deletions.
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.

Other things to consider
=========================

#. Make sure that a DataLayout is provided. This will likely become required in
   the near future, and it is already important for optimization, because it
   tells the optimizer the sizes and alignments of types.

#. Add nsw/nuw/fast-math flags wherever your language's semantics permit them.

#. Add noalias/align/dereferenceable/nonnull to function arguments and return
   values whenever the frontend knows these properties hold.

#. Mark functions as readnone/readonly/nounwind when known, especially external
   functions, whose bodies the optimizer cannot inspect to infer these
   attributes itself.

#. Use ptrtoint/inttoptr sparingly, since they interfere with pointer aliasing
   analysis; prefer GEPs.

#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
   intrinsics where possible. Common profitable uses are for stack-like data
   structures (thus allowing dead store elimination) and for describing the
   lifetimes of allocas (thus allowing smaller stack sizes).

#. Use pointer aliasing metadata, especially TBAA metadata, to communicate
   pointer aliasing facts that the optimizer could not otherwise deduce.

#. Use the "most-private" possible linkage types for the functions being defined
   (preferably private, internal, or linkonce_odr).

#. Mark invariant locations using !invariant.load and TBAA's constant flags.

#. Prefer globals over inttoptr of a constant address; this gives you
   dereferenceability information. In MCJIT, use getSymbolAddress to provide
   the actual address.

#. Be wary of ordered and atomic memory operations. They are hard to optimize,
   and the current optimizer may not handle them well. Depending on your
   source language, you may consider using fences instead.

#. If your language uses range checks, consider using the IRCE (inductive range
   check elimination) pass. It is not currently part of the standard pass order.
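
As an illustration of the DataLayout item above, here is a minimal sketch of a
module that states its layout. The triple and layout strings are assumptions.
They were chosen to resemble an x86-64 Linux target of this document's era. A
real frontend should obtain them from the target it compiles for instead of
hard-coding them. The sketch uses typed-pointer syntax (``i64*``); newer LLVM
releases write every pointer type as ``ptr``.

.. code-block:: llvm

   ; Example strings for an assumed x86-64 Linux target; take the real
   ; ones from the target you compile for.
   target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
   target triple = "x86_64-unknown-linux-gnu"

   ; With the layout above, the optimizer knows that i64 occupies 8 bytes
   ; and has 8-byte ABI alignment, facts that size- and alignment-based
   ; transforms depend on.
   define i64 @load_i64(i64* %p) {
     %v = load i64, i64* %p, align 8
     ret i64 %v
   }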
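
For the nsw/nuw/fast-math item, here is a minimal sketch of what the flags look
like in emitted IR. The function names are hypothetical. The sketch assumes a
source language in which signed overflow is undefined and a user who has opted
into relaxed floating-point semantics. Emitting these flags when the language
does not guarantee them miscompiles the program, because an operation that
violates its flags produces a poison value.

.. code-block:: llvm

   ; Hypothetical frontend output for a language in which signed integer
   ; overflow is undefined (as in C), so nsw is justified.
   define i32 @index_sum(i32 %i, i32 %j) {
     %s = add nsw i32 %i, %j
     ret i32 %s
   }

   ; Unsigned arithmetic that the frontend has proven cannot wrap gets nuw.
   define i64 @byte_size(i64 %count) {
     %bytes = mul nuw i64 %count, 8
     ret i64 %bytes
   }

   ; Only emit fast-math flags if the language (or a user option) permits
   ; reassociation and ignoring NaN/Inf/signed-zero semantics.
   define double @dot2(double %a0, double %a1, double %b0, double %b1) {
     %p0 = fmul fast double %a0, %b0
     %p1 = fmul fast double %a1, %b1
     %r = fadd fast double %p0, %p1
     ret double %r
   }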
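
To illustrate the item on argument and return attributes, here is a minimal
sketch with a hypothetical copy helper and a hypothetical allocation function.
The attributes are promises. The frontend should attach them only when the
source language guarantees them, for example non-overlapping buffers for
``noalias``. The sketch uses typed-pointer syntax; newer releases write
pointers as ``ptr``.

.. code-block:: llvm

   ; Hypothetical helper: the frontend guarantees that %dst and %src never
   ; overlap, are 4-byte aligned, and each points to at least 4
   ; dereferenceable bytes.
   define void @copy_one(i32* noalias nocapture align 4 dereferenceable(4) %dst,
                         i32* noalias nocapture readonly align 4 dereferenceable(4) %src) {
     %v = load i32, i32* %src, align 4
     store i32 %v, i32* %dst, align 4
     ret void
   }

   ; Hypothetical allocator that never returns null (for example, because
   ; it aborts on failure), so its result is marked nonnull.
   declare nonnull i8* @checked_alloc(i64)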
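
For the readnone/readonly/nounwind item, here is a sketch that declares
hypothetical runtime functions with these attributes. It also includes a
caller in which ``readonly`` lets the optimizer treat the second call as
redundant. The sketch uses typed-pointer syntax. Newer LLVM releases express
the readnone/readonly function attributes through the ``memory(...)``
attribute instead.

.. code-block:: llvm

   ; Hypothetical runtime functions whose bodies the optimizer cannot see.
   ; Without these attributes, each call is assumed to read and write
   ; arbitrary memory and possibly to unwind.
   declare double @rt_hypot(double, double) readnone nounwind
   declare i64 @rt_strlen(i8* nocapture) readonly nounwind

   define i64 @twice_len(i8* %s) {
     ; @rt_strlen only reads memory and nothing is written between the
     ; calls, so the optimizer may reuse the first call's result.
     %a = call i64 @rt_strlen(i8* %s)
     %b = call i64 @rt_strlen(i8* %s)
     %sum = add i64 %a, %b
     ret i64 %sum
   }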
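
For the ptrtoint/inttoptr item, here is a sketch contrasting integer address
arithmetic with a GEP that computes the same field address. ``%struct.Pair``
is a hypothetical type. The integer version's offset of 8 assumes a DataLayout
that aligns ``i64`` to 8 bytes. The sketch uses typed-pointer syntax.

.. code-block:: llvm

   %struct.Pair = type { i32, i64 }

   ; Discouraged: address arithmetic through integers hides where the
   ; pointer came from, which defeats alias analysis.
   define i64* @second_field_int(%struct.Pair* %p) {
     %addr = ptrtoint %struct.Pair* %p to i64
     %off = add i64 %addr, 8
     %f = inttoptr i64 %off to i64*
     ret i64* %f
   }

   ; Preferred: the GEP makes it explicit that the result is derived from %p.
   define i64* @second_field_gep(%struct.Pair* %p) {
     %f = getelementptr inbounds %struct.Pair, %struct.Pair* %p, i32 0, i32 1
     ret i64* %f
   }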
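
For the lifetime intrinsics item, here is a minimal sketch of two stack buffers
with disjoint lifetimes. The code generator's stack coloring may place them in
the same slot. ``@use`` is a hypothetical function. Intrinsic name mangling has
changed across releases. The ``.p0i8`` suffix matches later typed-pointer
releases; older releases used unsuffixed names, and opaque-pointer releases use
``.p0``. Check the LangRef for your version.

.. code-block:: llvm

   declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)
   declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)
   declare void @use(i8*)              ; hypothetical consumer

   define void @two_scopes() {
     %a = alloca [64 x i8], align 16
     %b = alloca [64 x i8], align 16
     %pa = bitcast [64 x i8]* %a to i8*
     %pb = bitcast [64 x i8]* %b to i8*

     ; First scope: %a is live only between its markers.
     call void @llvm.lifetime.start.p0i8(i64 64, i8* %pa)
     call void @use(i8* %pa)
     call void @llvm.lifetime.end.p0i8(i64 64, i8* %pa)

     ; Second scope: the live ranges do not overlap, so stack coloring
     ; may give %a and %b the same stack slot.
     call void @llvm.lifetime.start.p0i8(i64 64, i8* %pb)
     call void @use(i8* %pb)
     call void @llvm.lifetime.end.p0i8(i64 64, i8* %pb)
     ret void
   }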
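
For the TBAA item, here is a minimal sketch of a hypothetical frontend that
knows ``int`` and ``float`` accesses never alias. The type names and the root
name are assumptions. The nodes follow the struct-path TBAA format: a type node
is ``!{name, parent, offset}`` and an access tag is
``!{base type, access type, offset}``. The sketch uses typed-pointer syntax.

.. code-block:: llvm

   ; Hypothetical: in this language an int store can never modify a float.
   define float @no_reload(i32* %ip, float* %fp) {
     %f1 = load float, float* %fp, align 4, !tbaa !4
     store i32 0, i32* %ip, align 4, !tbaa !3
     ; The tags say the store cannot alias the float load, so this load
     ; may be replaced with %f1.
     %f2 = load float, float* %fp, align 4, !tbaa !4
     %r = fadd float %f1, %f2
     ret float %r
   }

   !0 = !{!"my-language TBAA"}   ; root (name is arbitrary)
   !1 = !{!"int", !0, i64 0}     ; scalar type node
   !2 = !{!"float", !0, i64 0}   ; scalar type node
   !3 = !{!1, !1, i64 0}         ; access tag for int
   !4 = !{!2, !2, i64 0}         ; access tag for float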
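
For the linkage item, here is a sketch with hypothetical functions. It uses
``internal`` for a helper that is only used within the module, and
``linkonce_odr`` for a definition that several modules may emit identically.

.. code-block:: llvm

   ; Only called from this module: internal linkage lets the optimizer see
   ; every caller, change the calling convention, or delete the function
   ; once all calls are inlined.
   define internal i32 @helper(i32 %x) {
     %y = mul i32 %x, 3
     ret i32 %y
   }

   ; A definition that several modules may emit (for example, a template
   ; instantiation), all guaranteed to be equivalent.
   define linkonce_odr i32 @shared_helper(i32 %x) {
     %y = add i32 %x, 1
     ret i32 %y
   }

   define i32 @entry(i32 %x) {
     %a = call i32 @helper(i32 %x)
     %b = call i32 @shared_helper(i32 %a)
     ret i32 %b
   }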
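
For the item on invariant locations, here is a sketch of both mechanisms on a
hypothetical slot. The sketch assumes the source language guarantees that the
slot never changes once it is visible to this code. Attaching
``!invariant.load`` to a location that does change is undefined behavior. The
sketch uses typed-pointer syntax.

.. code-block:: llvm

   ; Hypothetical: the source language guarantees that the value at %slot
   ; never changes while this code can observe it.
   define i64 @read_twice(i64* %slot, i64* %other) {
     %a = load i64, i64* %slot, align 8, !invariant.load !0
     store i64 0, i64* %other, align 8
     ; %slot is invariant, so the store cannot have changed it and this
     ; load may be replaced with %a.
     %b = load i64, i64* %slot, align 8, !invariant.load !0
     %s = add i64 %a, %b
     ret i64 %s
   }

   ; Alternative: a TBAA access tag whose optional fourth operand (1)
   ; marks the accessed location as constant.
   define i64 @read_const(i64* %slot) {
     %v = load i64, i64* %slot, align 8, !tbaa !3
     ret i64 %v
   }

   !0 = !{}
   !1 = !{!"my-language TBAA"}
   !2 = !{!"long", !1, i64 0}
   !3 = !{!2, !2, i64 0, i64 1}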
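
For the item on globals versus ``inttoptr``, here is a sketch that reads a
hypothetical table in two ways. The first function uses a constant address.
The second uses an external global ``@table``, whose address the JIT supplies
when it links the module, as the item describes for MCJIT. The address 4096
and the table's type are assumptions. The sketch uses typed-pointer syntax.

.. code-block:: llvm

   @table = external global [16 x i32]

   ; Discouraged: the optimizer knows nothing about this constant address.
   define i32 @third_entry_inttoptr() {
     %base = inttoptr i64 4096 to [16 x i32]*
     %p = getelementptr inbounds [16 x i32], [16 x i32]* %base, i64 0, i64 3
     %v = load i32, i32* %p, align 4
     ret i32 %v
   }

   ; Preferred: @table is known to be dereferenceable for its full 64
   ; bytes, so a load of this element could, for example, be hoisted out
   ; of conditional code.
   define i32 @third_entry_global() {
     %p = getelementptr inbounds [16 x i32], [16 x i32]* @table, i64 0, i64 3
     %v = load i32, i32* %p, align 4
     ret i32 %v
   }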
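
For the IRCE item, here is a sketch of the kind of loop the pass targets. An
induction variable is checked against a length on every iteration, and the
failing path leaves the loop. All names are hypothetical, and whether IRCE
actually transforms a given loop depends on its exact shape. With the legacy
pass manager, you can try the pass with ``opt -irce``. The sketch uses
typed-pointer syntax.

.. code-block:: llvm

   declare void @throw_range_error()   ; hypothetical runtime function

   ; Roughly: for (i = 0; i < n; i++) { check(i < len); a[i] = 0; }
   define void @zero_prefix(i32* %a, i32 %len, i32 %n) {
   entry:
     %nonempty = icmp sgt i32 %n, 0
     br i1 %nonempty, label %loop, label %exit

   loop:
     %i = phi i32 [ 0, %entry ], [ %i.next, %in.bounds ]
     %ok = icmp ult i32 %i, %len        ; the per-iteration range check
     br i1 %ok, label %in.bounds, label %out.of.bounds

   in.bounds:
     %idx = sext i32 %i to i64
     %p = getelementptr inbounds i32, i32* %a, i64 %idx
     store i32 0, i32* %p, align 4
     %i.next = add nsw i32 %i, 1
     %more = icmp slt i32 %i.next, %n
     br i1 %more, label %loop, label %exit

   out.of.bounds:
     call void @throw_range_error()
     unreachable

   exit:
     ret void
   }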

P.S. If you want to help improve this document, patches expanding any of the
above items into standalone sections of their own with a more complete
discussion would be very welcome.


Adding to this document
=======================