llvm.org GIT mirror llvm / 531ebc8
Add start of user documentation for NVPTX

Summary: This is the beginning of user documentation for the NVPTX back-end. I want to ensure I am integrating this properly into the rest of the LLVM documentation.

Differential Revision: http://llvm-reviews.chandlerc.com/D600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178428 91177308-0d34-0410-b5e6-96231b3b80d8

Justin Holewinski, 6 years ago
3 changed file(s) with 287 addition(s) and 0 deletion(s).
106106 * `Mach-O Runtime Architecture `_
107107 * `Notes on Mach-O ABI `_
108108
109 NVPTX
110 =====
111
112 * `CUDA Documentation `_ includes the PTX
113 ISA and Driver API documentation
114
109115 Miscellaneous Resources
110116 =======================
111117
0 =============================
1 User Guide for NVPTX Back-end
2 =============================
3
4 .. contents::
5 :local:
6 :depth: 3
7
8
9 Introduction
10 ============
11
12 To support GPU programming, the NVPTX back-end supports a subset of LLVM IR
13 along with a defined set of conventions used to represent GPU programming
14 concepts. This document provides an overview of the general usage of the
15 back-end, including a description of the conventions used and the set of
16 accepted LLVM IR.
17
18 .. note::
19
20 This document assumes a basic familiarity with CUDA and the PTX
21 assembly language. Information about the CUDA Driver API and the PTX assembly
22 language can be found in the `CUDA documentation
23 `_.
24
25
26
27 Conventions
28 ===========
29
30 Marking Functions as Kernels
31 ----------------------------
32
33 In PTX, there are two types of functions: *device functions*, which are only
34 callable by device code, and *kernel functions*, which are callable by host
35 code. By default, the back-end will emit device functions. Metadata is used to
36 declare a function as a kernel function. This metadata is attached to the
37 ``nvvm.annotations`` named metadata object, and has the following format:
38
39 .. code-block:: llvm
40
41 !0 = metadata !{<function ref>, metadata !"kernel", i32 1}
42
43 The first parameter is a reference to the kernel function. The following
44 example shows a kernel function calling a device function in LLVM IR. The
45 function ``@my_kernel`` is callable from host code, but ``@my_fmad`` is not.
46
47 .. code-block:: llvm
48
49 define float @my_fmad(float %x, float %y, float %z) {
50 %mul = fmul float %x, %y
51 %add = fadd float %mul, %z
52 ret float %add
53 }
54
55 define void @my_kernel(float* %ptr) {
56 %val = load float* %ptr
57 %ret = call float @my_fmad(float %val, float %val, float %val)
58 store float %ret, float* %ptr
59 ret void
60 }
61
62 !nvvm.annotations = !{!1}
63 !1 = metadata !{void (float*)* @my_kernel, metadata !"kernel", i32 1}
64
65 When compiled, the PTX kernel functions are callable by host-side code.
66
67
68 Address Spaces
69 --------------
70
71 The NVPTX back-end uses the following address space mapping:
72
73 ============= ======================
74 Address Space Memory Space
75 ============= ======================
76 0 Generic
77 1 Global
78 2 Internal Use
79 3 Shared
80 4 Constant
81 5 Local
82 ============= ======================
83
84 Every global variable and pointer type is assigned to one of these address
85 spaces, with 0 being the default address space. Intrinsics are provided which
86 can be used to convert pointers between the generic and non-generic address
87 spaces.
88
89 As an example, the following IR will define an array ``@g`` that resides in
90 global device memory.
91
92 .. code-block:: llvm
93
94 @g = internal addrspace(1) global [4 x i32] [ i32 0, i32 1, i32 2, i32 3 ]
95
96 LLVM IR functions can read and write to this array, and host-side code can
97 copy data to it by name with the CUDA Driver API.
98
99 Note that since address space 0 is the generic space, it is illegal to have
100 global variables in address space 0. Address space 0 is the default address
101 space in LLVM, so the ``addrspace(N)`` annotation is *required* for global
102 variables.
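
As another example, a variable can be placed in shared memory by using address
space 3. The following is only a sketch; the name ``@s`` is illustrative, and
since shared variables cannot be meaningfully initialized from the IR, an
``undef`` initializer is used here:

.. code-block:: llvm

  ; A 4 x i32 array in shared memory (address space 3)
  @s = internal addrspace(3) global [4 x i32] undef

Each thread block receives its own copy of ``@s``, and host code cannot access
it by name through the CUDA Driver API.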
103
104
105 NVPTX Intrinsics
106 ================
107
108 Address Space Conversion
109 ------------------------
110
111 '``llvm.nvvm.ptr.*.to.gen``' Intrinsics
112 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
113
114 Syntax:
115 """""""
116
117 These are overloaded intrinsics. You can use these on any pointer types.
118
119 .. code-block:: llvm
120
121 declare i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)*)
122 declare i8* @llvm.nvvm.ptr.shared.to.gen.p0i8.p3i8(i8 addrspace(3)*)
123 declare i8* @llvm.nvvm.ptr.constant.to.gen.p0i8.p4i8(i8 addrspace(4)*)
124 declare i8* @llvm.nvvm.ptr.local.to.gen.p0i8.p5i8(i8 addrspace(5)*)
125
126 Overview:
127 """""""""
128
129 The '``llvm.nvvm.ptr.*.to.gen``' intrinsics convert a pointer in a non-generic
130 address space to a generic address space pointer.
131
132 Semantics:
133 """"""""""
134
135 These intrinsics modify the pointer value to be a valid generic address space
136 pointer.
137
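
As a sketch of how this might be used (the global ``@scalar`` and the function
``@read_scalar`` below are purely illustrative names), the following IR
converts a global-memory pointer to a generic pointer before loading through
it:

.. code-block:: llvm

  @scalar = internal addrspace(1) global i32 0

  declare i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)*)

  define i32 @read_scalar() {
    ; Cast the global to an i8 pointer in address space 1
    %p1 = bitcast i32 addrspace(1)* @scalar to i8 addrspace(1)*
    ; Convert the global-space pointer to a generic pointer
    %gen = call i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)* %p1)
    ; Load through the generic pointer
    %typed = bitcast i8* %gen to i32*
    %val = load i32* %typed
    ret i32 %val
  }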
138
139 '``llvm.nvvm.ptr.gen.to.*``' Intrinsics
140 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
141
142 Syntax:
143 """""""
144
145 These are overloaded intrinsics. You can use these on any pointer types.
146
147 .. code-block:: llvm
148
149 declare i8 addrspace(1)* @llvm.nvvm.ptr.gen.to.global.p1i8.p0i8(i8*)
150 declare i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8*)
151 declare i8 addrspace(4)* @llvm.nvvm.ptr.gen.to.constant.p4i8.p0i8(i8*)
152 declare i8 addrspace(5)* @llvm.nvvm.ptr.gen.to.local.p5i8.p0i8(i8*)
153
154 Overview:
155 """""""""
156
157 The '``llvm.nvvm.ptr.gen.to.*``' intrinsics convert a pointer in the generic
158 address space to a pointer in the target address space. Note that these
159 intrinsics are only useful if the generic pointer is known to actually refer
160 to memory in the target address space. It is not legal to use address space
161 conversion intrinsics to convert a pointer from one non-generic address space
162 to another non-generic address space.
163
164 Semantics:
165 """"""""""
166
167 These intrinsics modify the pointer value to be a valid pointer in the target
168 non-generic address space.
169
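
For example, if a generic pointer is known to point into shared memory, it can
be converted so that the back-end can select shared-memory loads and stores
directly. This is only a sketch, and the function ``@load_shared`` and its
argument are hypothetical:

.. code-block:: llvm

  declare i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8*)

  define i32 @load_shared(i8* %gen) {
    ; %gen must already point into shared memory for this to be valid
    %s8 = call i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8* %gen)
    %s = bitcast i8 addrspace(3)* %s8 to i32 addrspace(3)*
    %val = load i32 addrspace(3)* %s
    ret i32 %val
  }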
170
171 Reading PTX Special Registers
172 -----------------------------
173
174 '``llvm.nvvm.read.ptx.sreg.*``'
175 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
176
177 Syntax:
178 """""""
179
180 .. code-block:: llvm
181
182 declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
183 declare i32 @llvm.nvvm.read.ptx.sreg.tid.y()
184 declare i32 @llvm.nvvm.read.ptx.sreg.tid.z()
185 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
186 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.y()
187 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.z()
188 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
189 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
190 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.z()
191 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
192 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
193 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
194 declare i32 @llvm.nvvm.read.ptx.sreg.warpsize()
195
196 Overview:
197 """""""""
198
199 The '``@llvm.nvvm.read.ptx.sreg.*``' intrinsics provide access to the PTX
200 special registers, in particular the thread and block indices and dimensions
201 of the kernel launch. These registers map to CUDA builtins as follows:
202
203 ============= =====================================
204 CUDA Builtin  PTX Special Register Intrinsic
205 ============= =====================================
206 ``threadIdx`` ``@llvm.nvvm.read.ptx.sreg.tid.*``
207 ``blockIdx``  ``@llvm.nvvm.read.ptx.sreg.ctaid.*``
208 ``blockDim``  ``@llvm.nvvm.read.ptx.sreg.ntid.*``
209 ``gridDim``   ``@llvm.nvvm.read.ptx.sreg.nctaid.*``
210 ============= =====================================
211
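
As an example, the common CUDA expression
``blockIdx.x * blockDim.x + threadIdx.x`` can be written in LLVM IR as follows
(the function name ``@global_thread_id_x`` is purely illustrative):

.. code-block:: llvm

  declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
  declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()

  define i32 @global_thread_id_x() {
    %tid   = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
    %ntid  = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
    %ctaid = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
    ; blockIdx.x * blockDim.x
    %base  = mul i32 %ctaid, %ntid
    ; ... + threadIdx.x
    %gid   = add i32 %base, %tid
    ret i32 %gid
  }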
212
213 Barriers
214 --------
215
216 '``llvm.nvvm.barrier0``'
217 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
218
219 Syntax:
220 """""""
221
222 .. code-block:: llvm
223
224 declare void @llvm.nvvm.barrier0()
225
226 Overview:
227 """""""""
228
229 The '``@llvm.nvvm.barrier0()``' intrinsic emits a PTX ``bar.sync 0``
230 instruction, equivalent to the ``__syncthreads()`` call in CUDA.
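
The following sketch shows a typical use, where each thread stores a value into
a shared buffer and then waits for every other thread in the block before
continuing (the function ``@stage_to_shared`` and its parameters are
hypothetical):

.. code-block:: llvm

  declare void @llvm.nvvm.barrier0()

  define void @stage_to_shared(i32 addrspace(3)* %buf, i32 %idx, i32 %val) {
    ; Store this thread's value into the shared buffer
    %ptr = getelementptr i32 addrspace(3)* %buf, i32 %idx
    store i32 %val, i32 addrspace(3)* %ptr
    ; Wait until every thread in the block has completed its store
    call void @llvm.nvvm.barrier0()
    ret void
  }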
231
232
233 Other Intrinsics
234 ----------------
235
236 For the full set of NVPTX intrinsics, please see the
237 ``include/llvm/IR/IntrinsicsNVVM.td`` file in the LLVM source tree.
238
239
240 Executing PTX
241 =============
242
243 The most common way to execute PTX assembly on a GPU device is to use the CUDA
244 Driver API. This API is a low-level interface to the GPU driver and allows for
245 JIT compilation of PTX code to native GPU machine code.
246
247 Initializing the Driver API:
248
249 .. code-block:: c++
250
251 CUdevice device;
252 CUcontext context;
253
254 // Initialize the driver API
255 cuInit(0);
256 // Get a handle to the first compute device
257 cuDeviceGet(&device, 0);
258 // Create a compute device context
259 cuCtxCreate(&context, 0, device);
260
261 JIT compiling a PTX string to a device binary:
262
263 .. code-block:: c++
264
265 CUmodule module;
266 CUfunction function;
267
268 // JIT compile a null-terminated PTX string
269 cuModuleLoadData(&module, (void*)PTXString);
270
271 // Get a handle to the "myfunction" kernel function
272 cuModuleGetFunction(&function, module, "myfunction");
273
274 For full examples of executing PTX assembly, please see the `CUDA Samples
275 `_ distribution.
223223 WritingAnLLVMPass
224224 TableGen/LangRef
225225 HowToUseAttributes
226 NVPTXUsage
226227
227228 :doc:`WritingAnLLVMPass`
228229 Information on how to write LLVM transformations and analyses.
290291
291292 :doc:`HowToUseAttributes`
292293 Answers some questions about the new Attributes infrastructure.
294
295 :doc:`NVPTXUsage`
296 This document describes using the NVPTX back-end to compile GPU kernels.
297
293298
294299 Development Process Documentation
295300 =================================