llvm.org GIT mirror llvm / 50d6e67
[AMDGPU] gfx10 documentation update. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363332 91177308-0d34-0410-b5e6-96231b3b80d8 Stanislav Mekhanoshin a month ago
1 changed file(s) with 2428 addition(s) and 1286 deletion(s).
7777 .. table:: AMDGPU Processors
7878 :name: amdgpu-processor-table
7979
80 =========== =============== ============ ===== ========== ======= ======================
81 Processor Alternative Target dGPU/ Target ROCm Example
82 Processor Triple APU Features Support Products
80 =========== =============== ============ ===== ================= ======= ======================
81 Processor Alternative Target dGPU/ Target ROCm Example
82 Processor Triple APU Features Support Products
8383 Architecture Supported
8484 [Default]
85 =========== =============== ============ ===== ========== ======= ======================
85 =========== =============== ============ ===== ================= ======= ======================
8686 **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
87 ----------------------------------------------------------------------------------------
87 -----------------------------------------------------------------------------------------------
8888 ``r600`` ``r600`` dGPU
8989 ``r630`` ``r600`` dGPU
9090 ``rs880`` ``r600`` dGPU
9191 ``rv670`` ``r600`` dGPU
9292 **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
93 ----------------------------------------------------------------------------------------
93 -----------------------------------------------------------------------------------------------
9494 ``rv710`` ``r600`` dGPU
9595 ``rv730`` ``r600`` dGPU
9696 ``rv770`` ``r600`` dGPU
9797 **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
98 ----------------------------------------------------------------------------------------
98 -----------------------------------------------------------------------------------------------
9999 ``cedar`` ``r600`` dGPU
100100 ``cypress`` ``r600`` dGPU
101101 ``juniper`` ``r600`` dGPU
102102 ``redwood`` ``r600`` dGPU
103103 ``sumo`` ``r600`` dGPU
104104 **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
105 ----------------------------------------------------------------------------------------
105 -----------------------------------------------------------------------------------------------
106106 ``barts`` ``r600`` dGPU
107107 ``caicos`` ``r600`` dGPU
108108 ``cayman`` ``r600`` dGPU
109109 ``turks`` ``r600`` dGPU
110110 **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
111 ----------------------------------------------------------------------------------------
111 -----------------------------------------------------------------------------------------------
112112 ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU
113113 ``gfx601`` - ``hainan`` ``amdgcn`` dGPU
114114 - ``oland``
115115 - ``pitcairn``
116116 - ``verde``
117117 **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
118 ----------------------------------------------------------------------------------------
119 ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000
120 - A6 Pro-7050B
121 - A8-7100
122 - A8 Pro-7150B
123 - A10-7300
124 - A10 Pro-7350B
125 - FX-7500
126 - A8-7200P
127 - A10-7400P
128 - FX-7600P
129 ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100
130 - FirePro W9100
131 - FirePro S9150
132 - FirePro S9170
133 ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290
134 - Radeon R9 290x
135 - Radeon R390
136 - Radeon R390x
137 ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100
138 - ``mullins`` - E1-2200
139 - E1-2500
140 - E2-3000
141 - E2-3800
142 - A4-5000
143 - A4-5100
144 - A6-5200
145 - A4 Pro-3340B
146 ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790
147 - Radeon HD 8770
148 - R7 260
149 - R7 260X
118 -----------------------------------------------------------------------------------------------
119 ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000
120 - A6 Pro-7050B
121 - A8-7100
122 - A8 Pro-7150B
123 - A10-7300
124 - A10 Pro-7350B
125 - FX-7500
126 - A8-7200P
127 - A10-7400P
128 - FX-7600P
129 ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100
130 - FirePro W9100
131 - FirePro S9150
132 - FirePro S9170
133 ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290
134 - Radeon R9 290x
135 - Radeon R390
136 - Radeon R390x
137 ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100
138 - ``mullins`` - E1-2200
139 - E1-2500
140 - E2-3000
141 - E2-3800
142 - A4-5000
143 - A4-5100
144 - A6-5200
145 - A4 Pro-3340B
146 ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790
147 - Radeon HD 8770
148 - R7 260
149 - R7 260X
150150 **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
151 ----------------------------------------------------------------------------------------
152 ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P
153 [on] - Pro A6-8500B
154 - A8-8600P
155 - Pro A8-8600B
156 - FX-8800P
157 - Pro A12-8800B
158 \ ``amdgcn`` APU - xnack ROCm - A10-8700P
159 [on] - Pro A10-8700B
160 - A10-8780P
161 \ ``amdgcn`` APU - xnack - A10-9600P
162 [on] - A10-9630P
163 - A12-9700P
164 - A12-9730P
165 - FX-9800P
166 - FX-9830P
167 \ ``amdgcn`` APU - xnack - E2-9010
168 [on] - A6-9210
169 - A9-9410
170 ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150
171 - ``tonga`` [off] - FirePro S7100
172 - FirePro W7100
173 - Radeon R285
174 - Radeon R9 380
175 - Radeon R9 385
176 - Mobile FirePro
177 M7170
178 ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano
179 [off] - Radeon R9 Fury
180 - Radeon R9 FuryX
181 - Radeon Pro Duo
182 - FirePro S9300x2
183 - Radeon Instinct MI8
184 \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470
185 [off] - Radeon RX 480
186 - Radeon Instinct MI6
187 \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460
151 -----------------------------------------------------------------------------------------------
152 ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P
153 [on] - Pro A6-8500B
154 - A8-8600P
155 - Pro A8-8600B
156 - FX-8800P
157 - Pro A12-8800B
158 \ ``amdgcn`` APU - xnack ROCm - A10-8700P
159 [on] - Pro A10-8700B
160 - A10-8780P
161 \ ``amdgcn`` APU - xnack - A10-9600P
162 [on] - A10-9630P
163 - A12-9700P
164 - A12-9730P
165 - FX-9800P
166 - FX-9830P
167 \ ``amdgcn`` APU - xnack - E2-9010
168 [on] - A6-9210
169 - A9-9410
170 ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150
171 - ``tonga`` [off] - FirePro S7100
172 - FirePro W7100
173 - Radeon R285
174 - Radeon R9 380
175 - Radeon R9 385
176 - Mobile FirePro
177 M7170
178 ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano
179 [off] - Radeon R9 Fury
180 - Radeon R9 FuryX
181 - Radeon Pro Duo
182 - FirePro S9300x2
183 - Radeon Instinct MI8
184 \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470
185 [off] - Radeon RX 480
186 - Radeon Instinct MI6
187 \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460
188188 [off]
189189 ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack
190190 [on]
191191 **GCN GFX9** [AMD-GCN-GFX9]_
192 ----------------------------------------------------------------------------------------
193 ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega
194 [off] Frontier Edition
195 - Radeon RX Vega 56
196 - Radeon RX Vega 64
197 - Radeon RX Vega 64
198 Liquid
199 - Radeon Instinct MI25
200 ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G
201 [on] - Ryzen 5 2400G
202 ``gfx904`` ``amdgcn`` dGPU - xnack *TBA*
192 -----------------------------------------------------------------------------------------------
193 ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega
194 [off] Frontier Edition
195 - Radeon RX Vega 56
196 - Radeon RX Vega 64
197 - Radeon RX Vega 64
198 Liquid
199 - Radeon Instinct MI25
200 ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G
201 [on] - Ryzen 5 2400G
202 ``gfx904`` ``amdgcn`` dGPU - xnack *TBA*
203203 [off]
204 .. TODO
205 Add product
206 names.
207 ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50
208 [off] - Radeon Instinct MI60
209 ``gfx909`` ``amdgcn`` APU - xnack *TBA* (Raven Ridge 2)
204 .. TODO
205 Add product
206 names.
207 ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50
208 [off] - Radeon Instinct MI60
209 ``gfx909`` ``amdgcn`` APU - xnack *TBA* (Raven Ridge 2)
210210 [on]
211 .. TODO
212 Add product
213 names.
214 =========== =============== ============ ===== ========== ======= ======================
211 .. TODO
212 Add product
213 names.
214 **GCN GFX10** [AMD-GCN-GFX10]_
215 -----------------------------------------------------------------------------------------------
216 ``gfx1010`` ``amdgcn`` dGPU - xnack *TBA*
217 [off]
218 - wavefrontsize64
219 [off]
220 - cumode
221 [off]
222 .. TODO
223 Add product
224 names.
225 ``gfx1011`` ``amdgcn`` dGPU - xnack *TBA*
226 [off]
227 - wavefrontsize64
228 [off]
229 - cumode
230 [off]
231 .. TODO
232 Add product
233 names.
234 ``gfx1012`` ``amdgcn`` dGPU - xnack *TBA*
235 [off]
236 - wavefrontsize64
237 [off]
238 - cumode
239 [off]
240 .. TODO
241 Add product
242 names.
243 =========== =============== ============ ===== ================= ======= ======================
215244
216245 .. _amdgpu-target-features:
217246
242271 .. table:: AMDGPU Target Features
243272 :name: amdgpu-target-feature-table
244273
245 =============== ==================================================
246 Target Feature Description
247 =============== ==================================================
248 -m[no-]xnack Enable/disable generating code that has
249 memory clauses that are compatible with
250 having XNACK replay enabled.
251
252 This is used for demand paging and page
253 migration. If XNACK replay is enabled in
254 the device, then if a page fault occurs
255 the code may execute incorrectly if the
256 ``xnack`` feature is not enabled. Executing
257 code that has the feature enabled on a
258 device that does not have XNACK replay
259 enabled will execute correctly, but may
260 be less performant than code with the
261 feature disabled.
262 -m[no-]sram-ecc Enable/disable generating code that assumes SRAM
263 ECC is enabled/disabled.
264 =============== ==================================================
274 ====================== ==================================================
275 Target Feature Description
276 ====================== ==================================================
277 -m[no-]xnack Enable/disable generating code that has
278 memory clauses that are compatible with
279 having XNACK replay enabled.
280
281 This is used for demand paging and page
282 migration. If XNACK replay is enabled in
283 the device, then if a page fault occurs
284 the code may execute incorrectly if the
285 ``xnack`` feature is not enabled. Executing
286 code that has the feature enabled on a
287 device that does not have XNACK replay
288 enabled will execute correctly, but may
289 be less performant than code with the
290 feature disabled.
291
292 -m[no-]sram-ecc Enable/disable generating code that assumes SRAM
293 ECC is enabled/disabled.
294
295 -m[no-]wavefrontsize64 Control the default wavefront size used when
296 generating code for kernels. When disabled,
297 native wavefront size 32 is used; when enabled,
298 wavefront size 64 is used.
299
300 -m[no-]cumode Control the default wavefront execution mode used
301 when generating code for kernels. When disabled,
302 native WGP wavefront execution mode is used;
303 when enabled, CU wavefront execution mode is used
304 (see :ref:`amdgpu-amdhsa-memory-model`).
305 ====================== ==================================================
265306
266307 .. _amdgpu-address-spaces:
267308
634675 ``EF_AMDGPU_MACH_AMDGCN_GFX906`` 0x02f ``gfx906``
635676 *reserved* 0x030 Reserved.
636677 ``EF_AMDGPU_MACH_AMDGCN_GFX909`` 0x031 ``gfx909``
678 *reserved* 0x032 Reserved.
679 ``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010``
680 ``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011``
681 ``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012``
637682 ================================= ========== =============================
638683
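As an illustration only, the machine values in the rows above can be held in a small lookup table. The sketch below is in Python, which this document does not itself use. The names ``MACH_TO_PROCESSOR``, ``RESERVED_MACH`` and ``processor_for_mach`` are hypothetical, and only the rows visible in this hunk are included.

```python
# Machine values copied from the table rows shown above (0x02f-0x035 only).
MACH_TO_PROCESSOR = {
    0x02F: "gfx906",
    0x031: "gfx909",
    0x033: "gfx1010",
    0x034: "gfx1011",
    0x035: "gfx1012",
}

# Values the table marks as reserved.
RESERVED_MACH = {0x030, 0x032}


def processor_for_mach(mach_value):
    """Return the processor name for a machine value.

    Returns None for reserved values and for values outside the rows
    covered by this sketch.
    """
    if mach_value in RESERVED_MACH:
        return None
    return MACH_TO_PROCESSOR.get(mach_value)
```

A caller would pass the machine field already extracted from the ELF header flags; how that field is masked out of ``e_flags`` is not shown in this hunk and is left out of the sketch.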
639684 Sections
14911536 "NumSGPRs" integer Required Number of scalar
14921537 registers used by a
14931538 wavefront for
1494 GFX6-GFX9. This
1539 GFX6-GFX10. This
14951540 includes the special
14961541 SGPRs for VCC, Flat
1497 Scratch (GFX7-GFX9)
1542 Scratch (GFX7-GFX10)
14981543 and XNACK (for
1499 GFX8-GFX9). It does
1544 GFX8-GFX10). It does
15001545 not include the 16
15011546 SGPR added if a trap
15021547 handler is
15071552 "NumVGPRs" integer Required Number of vector
15081553 registers used by
15091554 each work-item for
1510 GFX6-GFX9
1555 GFX6-GFX10
15111556 "MaxFlatWorkGroupSize" integer Required Maximum flat
15121557 work-group size
15131558 supported by the
20592104 instructions, or by flat instructions. If each lane of a wavefront accesses the
20602105 same private address, the interleaving results in adjacent dwords being accessed
20612106 and hence requires fewer cache lines to be fetched. Multi-dword access is not
2062 supported except by flat and scratch instructions in GFX9.
2107 supported except by flat and scratch instructions in GFX9-GFX10.
20632108
20642109 The generic address space uses the hardware flat address support available in
2065 GFX7-GFX9. This uses two fixed ranges of virtual addresses (the private and
2110 GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and
20662111 local apertures), that are outside the range of addressable global memory, to
20672112 map from a flat address to a private or local address.
20682113
20772122 apertures address can be used. For GFX7-GFX8 these are available in the
20782123 :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
20792124 Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
2080 GFX9 the aperture base addresses are directly available as inline constant
2125 GFX9-GFX10 the aperture base addresses are directly available as inline constant
20812126 registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit
20822127 address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32
20832128 which makes it easier to convert from flat to segment or segment to flat.
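The reason the 2^32 size and alignment simplify conversion can be sketched as follows. This is a minimal Python illustration, not the code the compiler emits. It assumes 64 bit address mode, and the function names and the example base address are hypothetical.

```python
APERTURE_SIZE = 1 << 32  # aperture size in 64 bit address mode, per the text


def in_aperture(flat_addr, aperture_base):
    """True if flat_addr lies in the aperture that starts at aperture_base.

    Because the base is 2^32 aligned and the aperture is 2^32 bytes,
    comparing the upper 32 bits is sufficient.
    """
    assert aperture_base % APERTURE_SIZE == 0, "base must be 2^32 aligned"
    return (flat_addr >> 32) == (aperture_base >> 32)


def flat_to_segment(flat_addr):
    """Segment address is the low 32 bits of the flat address."""
    return flat_addr & (APERTURE_SIZE - 1)


def segment_to_flat(segment_addr, aperture_base):
    """Flat address is the aperture base combined with the 32 bit segment address."""
    assert aperture_base % APERTURE_SIZE == 0, "base must be 2^32 aligned"
    return aperture_base | (segment_addr & (APERTURE_SIZE - 1))


# Hypothetical aperture base, chosen only for the example.
EXAMPLE_BASE = 0x0000200000000000
```

With these assumptions both directions reduce to a mask or an OR, with no subtraction or range comparison against a limit.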
21192164 execution of a kernel, including the entry point address of the machine code
21202165 that implements the kernel.
21212166
2122 Kernel Descriptor for GFX6-GFX9
2123 +++++++++++++++++++++++++++++++
2167 Kernel Descriptor for GFX6-GFX10
2168 ++++++++++++++++++++++++++++++++
21242169
21252170 CP microcode requires the Kernel descriptor to be allocated on 64 byte
21262171 alignment.
21272172
2128 .. table:: Kernel Descriptor for GFX6-GFX9
2129 :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table
2173 .. table:: Kernel Descriptor for GFX6-GFX10
2174 :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table
21302175
21312176 ======= ======= =============================== ============================
21322177 Bits Size Field Name Description
21562201 entry point instruction
21572202 which must be 256 byte
21582203 aligned.
2159 383:192 24 Reserved, must be 0.
2204 351:272 20 Reserved, must be 0.
21602205 bytes
2206 383:352 4 bytes COMPUTE_PGM_RSRC3 GFX6-9
2207 Reserved, must be 0.
2208 GFX10
2209 Compute Shader (CS)
2210 program settings used by
2211 CP to set up
2212 ``COMPUTE_PGM_RSRC3``
2213 configuration
2214 register. See
2215 :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`.
21612216 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
21622217 program settings used by
21632218 CP to set up
21642219 ``COMPUTE_PGM_RSRC1``
21652220 configuration
21662221 register. See
2167 :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
2222 :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
21682223 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
21692224 program settings used by
21702225 CP to set up
21712226 ``COMPUTE_PGM_RSRC2``
21722227 configuration
21732228 register. See
2174 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
2229 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`.
21752230 448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
21762231 _BUFFER SGPR user data registers
21772232 (see
21912246 453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above*
21922247 454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above*
21932248 _SIZE
2194 455 1 bit Reserved, must be 0.
2195 511:456 8 bytes Reserved, must be 0.
2249 457:455 3 bits Reserved, must be 0.
2250 458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-9
2251 Reserved, must be 0.
2252 GFX10
2253 - If 0 execute in
2254 wavefront size 64 mode.
2255 - If 1 execute in
2256 native wavefront size
2257 32 mode.
2258 463:459 5 bits Reserved, must be 0.
2259 511:464 6 bytes Reserved, must be 0.
21962260 512 **Total size 64 bytes.**
21972261 ======= ====================================================================
21982262
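The bit ranges in the table rows shown above can be turned into byte offsets by dividing by 8. The Python sketch below decodes only the fields visible in this hunk. It assumes the descriptor is stored little-endian with bit 0 in the least significant bit of byte 0, and the function name ``decode_kernel_descriptor`` is hypothetical.

```python
import struct

KERNEL_DESCRIPTOR_SIZE = 64  # bytes, per the "Total size" row


def decode_kernel_descriptor(kd):
    """Extract a few fields from a 64 byte kernel descriptor.

    Assumes little-endian layout. Only fields whose bit positions appear
    in the table rows above are decoded.
    """
    if len(kd) != KERNEL_DESCRIPTOR_SIZE:
        raise ValueError("kernel descriptor must be exactly 64 bytes")
    # Bits 383:352, 415:384 and 447:416 are three consecutive 32 bit words
    # starting at byte 352 // 8 == 44.
    rsrc3, rsrc1, rsrc2 = struct.unpack_from("<III", kd, 352 // 8)
    # Bits 463:448 hold the ENABLE_* flags; read them as one 16 bit word.
    (flags,) = struct.unpack_from("<H", kd, 448 // 8)
    return {
        "compute_pgm_rsrc3": rsrc3,
        "compute_pgm_rsrc1": rsrc1,
        "compute_pgm_rsrc2": rsrc2,
        # Bit 458 of the descriptor is bit 10 of the flags word.
        "enable_wavefront_size32": (flags >> (458 - 448)) & 1,
    }
```

On GFX6-GFX9 the ``compute_pgm_rsrc3`` word and the wavefront size bit are reserved and would be expected to decode as 0.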
21992263 ..
22002264
2201 .. table:: compute_pgm_rsrc1 for GFX6-GFX9
2202 :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table
2265 .. table:: compute_pgm_rsrc1 for GFX6-GFX10
2266 :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table
22032267
22042268 ======= ======= =============================== ===========================================================================
22052269 Bits Size Field Name Description
22122276 GFX6-GFX9
22132277 - vgprs_used 0..256
22142278 - max(0, ceil(vgprs_used / 4) - 1)
2279 GFX10 (wavefront size 64)
2280 - max_vgpr 1..256
2281 - max(0, ceil(vgprs_used / 4) - 1)
2282 GFX10 (wavefront size 32)
2283 - max_vgpr 1..256
2284 - max(0, ceil(vgprs_used / 8) - 1)
22152285
22162286 Where vgprs_used is defined
22172287 as the highest VGPR number
22432313 GFX9
22442314 - sgprs_used 0..112
22452315 - 2 * max(0, ceil(sgprs_used / 16) - 1)
2316 GFX10
2317 Reserved, must be 0.
2318 (128 SGPRs always
2319 allocated.)
22462320
22472321 Where sgprs_used is
22482322 defined as the highest
24062480 ``COMPUTE_PGM_RSRC1.CDBG_USER``.
24072481 26 1 bit FP16_OVFL GFX6-GFX8
24082482 Reserved, must be 0.
2409 GFX9
2483 GFX9-GFX10
24102484 Wavefront starts execution
24112485 with specified fp16 overflow
24122486 mode.
24222496
24232497 Used by CP to set up
24242498 ``COMPUTE_PGM_RSRC1.FP16_OVFL``.
2425 31:27 5 bits Reserved, must be 0.
2499 28:27 2 bits Reserved, must be 0.
2500 29 1 bit WGP_MODE GFX6-GFX9
2501 Reserved, must be 0.
2502 GFX10
2503 - If 0 execute work-groups in
2504 CU wavefront execution mode.
2505 - If 1 execute work-groups in
2506 WGP wavefront execution mode.
2507
2508 See :ref:`amdgpu-amdhsa-memory-model`.
2509
2510 Used by CP to set up
2511 ``COMPUTE_PGM_RSRC1.WGP_MODE``.
2512 30 1 bit MEM_ORDERED GFX6-9
2513 Reserved, must be 0.
2514 GFX10
2515 Controls the behavior of the
2516 waitcnt's vmcnt and vscnt
2517 counters.
2518
2519 - If 0 vmcnt reports completion
2520 of load and atomic with return
2521 out of order with sample
2522 instructions, and the vscnt
2523 reports the completion of
2524 store and atomic without
2525 return in order.
2526 - If 1 vmcnt reports completion
2527 of load, atomic with return
2528 and sample instructions in
2529 order, and the vscnt reports
2530 the completion of store and
2531 atomic without return in order.
2532
2533 Used by CP to set up
2534 ``COMPUTE_PGM_RSRC1.MEM_ORDERED``.
2535 31 1 bit FWD_PROGRESS GFX6-9
2536 Reserved, must be 0.
2537 GFX10
2538 - If 0 execute SIMD wavefronts
2539 using oldest first policy.
2540 - If 1 execute SIMD wavefronts to
2541 ensure wavefronts will make some
2542 forward progress.
2543
2544 Used by CP to set up
2545 ``COMPUTE_PGM_RSRC1.FWD_PROGRESS``.
24262546 32 **Total size 4 bytes**
24272547 ======= ===================================================================================================================
24282548
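The VGPR granule formulas and the GFX10 mode bits from the table rows above can be checked with a short Python sketch. It covers only the rows visible in this hunk. The function names are hypothetical, and the sketch does not validate the permitted VGPR ranges.

```python
def granulated_vgpr_count(vgprs_used, wavefront_size32=False):
    """Encode vgprs_used with the formula from the table.

    GFX6-GFX9 and GFX10 wavefront size 64 use a granule of 4 registers;
    GFX10 wavefront size 32 uses a granule of 8.
    """
    granule = 8 if wavefront_size32 else 4
    # Integer ceiling division: ceil(vgprs_used / granule).
    granules = -(-vgprs_used // granule)
    return max(0, granules - 1)


# Bit positions within compute_pgm_rsrc1, taken from the table rows above.
FP16_OVFL_BIT = 26
WGP_MODE_BIT = 29
MEM_ORDERED_BIT = 30
FWD_PROGRESS_BIT = 31


def gfx10_mode_bits(rsrc1):
    """Extract the single bit mode fields from a compute_pgm_rsrc1 word."""
    return {
        "fp16_ovfl": (rsrc1 >> FP16_OVFL_BIT) & 1,
        "wgp_mode": (rsrc1 >> WGP_MODE_BIT) & 1,
        "mem_ordered": (rsrc1 >> MEM_ORDERED_BIT) & 1,
        "fwd_progress": (rsrc1 >> FWD_PROGRESS_BIT) & 1,
    }
```

For example, 24 VGPRs encode as 5 with a granule of 4 and as 2 with a granule of 8, so the same register usage yields different field values depending on the wavefront size.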
24292549 ..
24302550
2431 .. table:: compute_pgm_rsrc2 for GFX6-GFX9
2432 :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table
2551 .. table:: compute_pgm_rsrc2 for GFX6-GFX10
2552 :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table
24332553
24342554 ======= ======= =============================== ===========================================================================
24352555 Bits Size Field Name Description
25482668
25492669 GFX6:
25502670 roundup(lds-size / (64 * 4))
2551 GFX7-GFX9:
2671 GFX7-GFX10:
25522672 roundup(lds-size / (128 * 4))
25532673
25542674 24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution
25762696 _ZERO (rcp_iflag_f32 instruction
25772697 only)
25782698 31 1 bit Reserved, must be 0.
2699 32 **Total size 4 bytes.**
2700 ======= ===================================================================================================================
2701
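The two ``roundup`` formulas for the LDS size in the table above differ only in granule size. The Python sketch below assumes ``lds-size`` is given in bytes, so that the granule is 64 or 128 dwords of 4 bytes each; the hunk does not state the unit. The function name ``lds_granules`` is hypothetical.

```python
def lds_granules(lds_size_bytes, gfx_major):
    """Round an LDS size up to whole granules.

    GFX6 uses roundup(lds-size / (64 * 4)); GFX7-GFX10 use
    roundup(lds-size / (128 * 4)). The size is assumed to be in bytes.
    """
    if gfx_major not in (6, 7, 8, 9, 10):
        raise ValueError("formula is only given for GFX6-GFX10")
    dwords_per_granule = 64 if gfx_major == 6 else 128
    granule_bytes = dwords_per_granule * 4
    # Integer ceiling division.
    return -(-lds_size_bytes // granule_bytes)
```

Under that assumption, 1024 bytes of LDS encodes as 4 granules on GFX6 and as 2 granules on GFX7-GFX10.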
2702 ..
2703
2704 .. table:: compute_pgm_rsrc3 for GFX10
2705 :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table
2706
2707 ======= ======= =============================== ===========================================================================
2708 Bits Size Field Name Description
2709 ======= ======= =============================== ===========================================================================
2710 3:0 4 bits SHARED_VGPR_COUNT Number of shared VGPRs for wavefront size 64. Granularity 8. Value 0-120.
2711 compute_pgm_rsrc1.vgprs + shared_vgpr_cnt cannot exceed 64.
2712 31:4 28 Reserved, must be 0.
2713 bits
25792714 32 **Total size 4 bytes.**
25802715 ======= ===================================================================================================================
25812716
27482883 it once avoids loading it at
27492884 the beginning of every
27502885 wavefront.
2751 GFX9
2886 GFX9-GFX10
27522887 This is the
27532888 64 bit base address of the
27542889 per SPI scratch backing
27862921 GFX7-GFX8 since it is the same
27872922 value as the second SGPR of
27882923 Flat Scratch Init. However, it
2789 may be needed for GFX9 which
2924 may be needed for GFX9-GFX10 which
27902925 changes the meaning of the
27912926 Flat Scratch Init value.
27922927 then Grid Work-Group Count X 1 32 bit count of the number of
28883023 value to the hardware required SGPRn-3 and SGPRn-4 respectively.
28893024
28903025 The global segment can be accessed either using buffer instructions (GFX6 which
2891 has V# 64 bit address support), flat instructions (GFX7-GFX9), or global
2892 instructions (GFX9).
3026 has V# 64 bit address support), flat instructions (GFX7-GFX10), or global
3027 instructions (GFX9-GFX10).
28933028
28943029 If buffer operations are used then the compiler can generate a V# with the
28953030 following properties:
29173052 available in dispatch packet. For M0, it is also possible to use maximum
29183053 possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
29193054 GFX7-GFX8).
2920 GFX9
3055 GFX9-GFX10
29213056 The M0 register is not used for range checking LDS accesses and so does not
29223057 need to be initialized in the prolog.
29233058
29503085 wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT SCRATCH
29513086 SIZE.
29523087
2953 GFX9
3088 GFX9-GFX10
29543089 The Flat Scratch Init is the 64 bit address of the base of scratch backing
29553090 memory being managed by SPI for the queue executing the kernel dispatch. The
29563091 prolog must add the value of Scratch Wavefront Offset and move the result to the FLAT_SCRATCH
29713106 :ref:`amdgpu-memory-scopes`.
29723107
29733108 The code sequences used to implement the memory model are defined in table
2974 :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
3109 :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table`.
29753110
29763111 The sequences specify the order of instructions that a single thread must
29773112 execute. The ``s_waitcnt`` and ``buffer_wbinvl1_vol`` are defined with respect
30093144
30103145 For GFX6-GFX9:
30113146
3012 * Each agent has multiple compute units (CU).
3147 * Each agent has multiple shader arrays (SA).
3148 * Each SA has multiple compute units (CU).
30133149 * Each CU has multiple SIMDs that execute wavefronts.
30143150 * The wavefronts for a single work-group are executed in the same CU but may be
30153151 executed by different SIMDs.
30553191 * The L2 cache can be kept coherent with other agents on some targets, or ranges
30563192 of virtual addresses can be set up to bypass it to ensure system coherence.
30573193
3194 For GFX10:
3195
3196 * Each agent has multiple shader arrays (SA).
3197 * Each SA has multiple work-group processors (WGP).
3198 * Each WGP has multiple compute units (CU).
3199 * Each CU has multiple SIMDs that execute wavefronts.
3200 * The wavefronts for a single work-group are executed in the same
3201 WGP. In CU wavefront execution mode the wavefronts may be executed by
3202 different SIMDs in the same CU. In WGP wavefront execution mode the
3203 wavefronts may be executed by different SIMDs in different CUs in the same
3204 WGP.
3205 * Each WGP has a single LDS memory shared by the wavefronts of the work-groups
3206 executing on it.
3207 * All LDS operations of a WGP are performed as wavefront wide operations in a
3208 global order and involve no caching. Completion is reported to a wavefront in
3209 execution order.
3210 * The LDS memory has multiple request queues shared by the SIMDs of a
3211 WGP. Therefore, the LDS operations performed by different wavefronts of a work-group
3212 can be reordered relative to each other, which can result in reordering the
3213 visibility of vector memory operations with respect to LDS operations of other
3214 wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
3215 ensure synchronization between LDS operations and vector memory operations
3216 between wavefronts of a work-group, but not between operations performed by the
3217 same wavefront.
3218 * The vector memory operations are performed as wavefront wide operations.
3219 Completion of load/store/sample operations is reported to a wavefront in
3220 execution order of other load/store/sample operations performed by that
3221 wavefront.
3222 * The vector memory operations access a vector L0 cache. There is a single L0
3223 cache per CU. Each SIMD of a CU accesses the same L0 cache.
3224 Therefore, no special action is required for coherence between the lanes of a
3225 single wavefront. However, a ``BUFFER_GL0_INV`` is required for coherence
3226 between wavefronts executing in the same work-group as they may be executing on
3227 SIMDs of different CUs that access different L0s. A ``BUFFER_GL0_INV`` is also
3228 required for coherence between wavefronts executing in different work-groups as
3229 they may be executing on different WGPs.
3230 * The scalar memory operations access a scalar L0 cache shared by all wavefronts
3231 on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
3232 operations are used in a restricted way so do not impact the memory model. See
3233 :ref:`amdgpu-amdhsa-memory-spaces`.
3234 * The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
3235 the same SA. Therefore, no special action is required for coherence between
3236 the wavefronts of a single work-group. However, a ``BUFFER_GL1_INV`` is
3237 required for coherence between wavefronts executing in different work-groups as
3238 they may be executing on different SAs that access different L1s.
3239 * The L1 caches have independent quadrants to service disjoint ranges of virtual
3240 addresses.
3241 * Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
3242 vector and scalar memory operations performed by different wavefronts, whether
3243 executing in the same or different work-groups (which may be executing on
3244 different CUs accessing different L0s), can be reordered relative to each
3245 other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is required to ensure synchronization
3246 between vector memory operations of different wavefronts. It ensures a previous
3247 vector memory operation has completed before executing a subsequent vector
3248 memory or LDS operation and so can be used to meet the requirements of acquire,
3249 release and sequential consistency.
3250 * The L1 caches use an L2 cache shared by all SAs on the same agent.
3251 * The L2 cache has independent channels to service disjoint ranges of virtual
3252 addresses.
3253 * Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
3254 quadrant has a separate request queue per L2 channel. Therefore, the vector
3255 and scalar memory operations performed by wavefronts executing in different
3256 work-groups (which may be executing on different SAs) of an agent can be
3257 reordered relative to each other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is
3258 required to ensure synchronization between vector memory operations of
3259 different SAs. It ensures a previous vector memory operation has completed
3260 before executing a subsequent vector memory operation and so can be used to meet the
3261 requirements of acquire, release and sequential consistency.
3262 * The L2 cache can be kept coherent with other agents on some targets, or ranges
3263 of virtual addresses can be set up to bypass it to ensure system coherence.
3264
30583265 Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-GFX8),
3059 or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
3266 or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread is accessing the
30603267 memory, atomic memory orderings are not meaningful and all accesses are treated
30613268 as non-atomic.
30623269
30993306 frame at the same address, respectively. There is no need for a ``s_dcache_inv``
31003307 as all scalar writes are write-before-read in the same thread.
31013308
3102 Scratch backing memory (which is used for the private address space)
3309 For GFX6-GFX9, scratch backing memory (which is used for the private address space)
31033310 is accessed with MTYPE NC_NV (non-coherent non-volatile). Since the private
31043311 address space is only accessed by a single thread, and is always
31053312 write-before-read, there is never a need to invalidate these entries from the L1
31063313 cache. Hence all cache invalidates are done as ``*_vol`` to only invalidate the
31073314 volatile cache lines.
31083315
3316 For GFX10, scratch backing memory (which is used for the private address space)
3317 is accessed with MTYPE NC (non-coherent). Since the private address space is
3318 only accessed by a single thread, and is always write-before-read, there is
3319 never a need to invalidate these entries from the L0 or L1 caches.
3320
3321 For GFX10, wavefronts are executed in native mode with in-order reporting of loads
3322 and sample instructions. In this mode vmcnt reports completion of load, atomic
3323 with return and sample instructions in order, and the vscnt reports the
3324 completion of store and atomic without return in order. See ``MEM_ORDERED`` field
3325 in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
3326
3327 In GFX10, wavefronts can be executed in WGP or CU wavefront execution mode:
3328
3329 * In WGP wavefront execution mode the wavefronts of a work-group are executed
3330 on the SIMDs of both CUs of the WGP. Therefore, explicit management of the per
3331 CU L0 caches is required for work-group synchronization. Also accesses to L1 at
3332 work-group scope need to be explicitly ordered as the accesses from different
3333 CUs are not ordered.
3334 * In CU wavefront execution mode the wavefronts of a work-group are executed on
3335 the SIMDs of a single CU of the WGP. Therefore, all global memory accesses by
3336 the work-group access the same L0 which in turn ensures L1 accesses are
3337 ordered and so do not require explicit management of the caches for
3338 work-group synchronization.
3339
3340 See ``WGP_MODE`` field in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`
3341 and :ref:`amdgpu-target-features`.
3342
31093343 On dGPU the kernarg backing memory is accessed as UC (uncached) to avoid needing
3110 to invalidate the L2 cache. This also causes it to be treated as
3344 to invalidate the L2 cache. For GFX6-GFX9, this also causes it to be treated as
31113345 non-volatile and so is not invalidated by ``*_vol``. On APU it is accessed as CC
3112 (cache coherent) and so the L2 cache will coherent with the CPU and other
3346 (cache coherent) and so the L2 cache will be coherent with the CPU and other
31133347 agents.
31143348
3115 .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX9
3116 :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table
3117
3118 ============ ============ ============== ========== ===============================
3119 LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code
3120 Ordering Sync Scope Address
3349 .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX10
3350 :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table
3351
3352 ============ ============ ============== ========== =============================== ==================================
3353 LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code AMDGPU Machine Code
3354 Ordering Sync Scope Address GFX6-9 GFX10
31213355 Space
3122 ============ ============ ============== ========== ===============================
3356 ============ ============ ============== ========== =============================== ==================================
31233357 **Non-Atomic**
3124 -----------------------------------------------------------------------------------
3125 load *none* *none* - global - !volatile & !nontemporal
3358 ----------------------------------------------------------------------------------------------------------------------
3359 load *none* *none* - global - !volatile & !nontemporal - !volatile & !nontemporal
31263360 - generic
3127 - private 1. buffer/global/flat_load
3361 - private 1. buffer/global/flat_load 1. buffer/global/flat_load
31283362 - constant
3129 - volatile & !nontemporal
3130
3131 1. buffer/global/flat_load
3132 glc=1
3133
3134 - nontemporal
3135
3136 1. buffer/global/flat_load
3137 glc=1 slc=1
3138
3139 load *none* *none* - local 1. ds_load
3140 store *none* *none* - global - !nontemporal
3363 - volatile & !nontemporal - volatile & !nontemporal
3364
3365 1. buffer/global/flat_load 1. buffer/global/flat_load
3366 glc=1 glc=1 dlc=1
3367
3368 - nontemporal - nontemporal
3369
3370 1. buffer/global/flat_load 1. buffer/global/flat_load
3371 glc=1 slc=1 slc=1
3372
3373 load *none* *none* - local 1. ds_load 1. ds_load
3374 store *none* *none* - global - !nontemporal - !nontemporal
31413375 - generic
3142 - private 1. buffer/global/flat_store
3376 - private 1. buffer/global/flat_store 1. buffer/global/flat_store
31433377 - constant
3144 - nontemporal
3145
3146 1. buffer/global/flat_stote
3147 glc=1 slc=1
3148
3149 store *none* *none* - local 1. ds_store
3378 - nontemporal - nontemporal
3379
3380 1. buffer/global/flat_store 1. buffer/global/flat_store
3381 glc=1 slc=1 slc=1
3382
3383 store *none* *none* - local 1. ds_store 1. ds_store
31503384 **Unordered Atomic**
3151 -----------------------------------------------------------------------------------
3152 load atomic unordered *any* *any* *Same as non-atomic*.
3153 store atomic unordered *any* *any* *Same as non-atomic*.
3154 atomicrmw unordered *any* *any* *Same as monotonic
3155 atomic*.
3385 ----------------------------------------------------------------------------------------------------------------------
3386 load atomic unordered *any* *any* *Same as non-atomic*. *Same as non-atomic*.
3387 store atomic unordered *any* *any* *Same as non-atomic*. *Same as non-atomic*.
3388 atomicrmw unordered *any* *any* *Same as monotonic *Same as monotonic
3389 atomic*. atomic*.
31563390 **Monotonic Atomic**
3157 -----------------------------------------------------------------------------------
3158 load atomic monotonic - singlethread - global 1. buffer/global/flat_load
3391 ----------------------------------------------------------------------------------------------------------------------
3392 load atomic monotonic - singlethread - global 1. buffer/global/flat_load 1. buffer/global/flat_load
31593393 - wavefront - generic
3160 - workgroup
3161 load atomic monotonic - singlethread - local 1. ds_load
3394 load atomic monotonic - workgroup - global 1. buffer/global/flat_load 1. buffer/global/flat_load
3395 - generic glc=1
3396
3397 - If CU wavefront execution mode, omit glc=1.
3398
3399 load atomic monotonic - singlethread - local 1. ds_load 1. ds_load
31623400 - wavefront
31633401 - workgroup
3164 load atomic monotonic - agent - global 1. buffer/global/flat_load
3165 - system - generic glc=1
3166 store atomic monotonic - singlethread - global 1. buffer/global/flat_store
3402 load atomic monotonic - agent - global 1. buffer/global/flat_load 1. buffer/global/flat_load
3403 - system - generic glc=1 glc=1 dlc=1
3404 store atomic monotonic - singlethread - global 1. buffer/global/flat_store 1. buffer/global/flat_store
31673405 - wavefront - generic
31683406 - workgroup
31693407 - agent
31703408 - system
3171 store atomic monotonic - singlethread - local 1. ds_store
3409 store atomic monotonic - singlethread - local 1. ds_store 1. ds_store
31723410 - wavefront
31733411 - workgroup
3174 atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic
3412 atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic 1. buffer/global/flat_atomic
31753413 - wavefront - generic
31763414 - workgroup
31773415 - agent
31783416 - system
3179 atomicrmw monotonic - singlethread - local 1. ds_atomic
3417 atomicrmw monotonic - singlethread - local 1. ds_atomic 1. ds_atomic
31803418 - wavefront
31813419 - workgroup
31823420 **Acquire Atomic**
3183 -----------------------------------------------------------------------------------
3184 load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load
3421 ----------------------------------------------------------------------------------------------------------------------
3422 load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load 1. buffer/global/ds/flat_load
31853423 - wavefront - local
31863424 - generic
3187 load atomic acquire - workgroup - global 1. buffer/global/flat_load
3188 load atomic acquire - workgroup - local 1. ds_load
3189 2. s_waitcnt lgkmcnt(0)
3190
3191 - If OpenCL, omit.
3192 - Must happen before
3193 any following
3194 global/generic
3425 load atomic acquire - workgroup - global 1. buffer/global/flat_load 1. buffer/global_load glc=1
3426
3427 - If CU wavefront execution mode, omit glc=1.
3428
3429 2. s_waitcnt vmcnt(0)
3430
3431 - If CU wavefront execution mode, omit.
3432 - Must happen before
3433 the following buffer_gl0_inv
3434 and before any following
3435 global/generic
3436 load/load
3437 atomic/store/store
3438 atomic/atomicrmw.
3439
3440 3. buffer_gl0_inv
3441
3442 - If CU wavefront execution mode, omit.
3443 - Ensures that
3444 following
3445 loads will not see
3446 stale data.
3447
3448 load atomic acquire - workgroup - local 1. ds_load 1. ds_load
3449 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0)
3450
3451 - If OpenCL, omit. - If OpenCL, omit.
3452 - Must happen before - Must happen before
3453 any following the following buffer_gl0_inv
3454 global/generic and before any following
3455 load/load global/generic load/load
3456 atomic/store/store atomic/store/store
3457 atomic/atomicrmw. atomic/atomicrmw.
3458 - Ensures any - Ensures any
3459 following global following global
3460 data read is no data read is no
3461 older than the load older than the load
3462 atomic value being atomic value being
3463 acquired. acquired.
3464
3465 3. buffer_gl0_inv
3466
3467 - If CU wavefront execution mode, omit.
3468 - If OpenCL, omit.
3469 - Ensures that
3470 following
3471 loads will not see
3472 stale data.
3473
3474 load atomic acquire - workgroup - generic 1. flat_load 1. flat_load glc=1
3475
3476 - If CU wavefront execution mode, omit glc=1.
3477
3478 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0) &
3479 vmcnt(0)
3480
3481 - If CU wavefront execution mode, omit vmcnt.
3482 - If OpenCL, omit. - If OpenCL, omit
3483 lgkmcnt(0).
3484 - Must happen before - Must happen before
3485 any following the following
3486 global/generic buffer_gl0_inv and any
3487 load/load following global/generic
3488 atomic/store/store load/load
3489 atomic/atomicrmw. atomic/store/store
3490 atomic/atomicrmw.
3491 - Ensures any - Ensures any
3492 following global following global
3493 data read is no data read is no
3494 older than the load older than the load
3495 atomic value being atomic value being
3496 acquired. acquired.
3497
3498 3. buffer_gl0_inv
3499
3500 - If CU wavefront execution mode, omit.
3501 - Ensures that
3502 following
3503 loads will not see
3504 stale data.
3505
3506 load atomic acquire - agent - global 1. buffer/global/flat_load 1. buffer/global_load
3507 - system glc=1 glc=1 dlc=1
3508 2. s_waitcnt vmcnt(0) 2. s_waitcnt vmcnt(0)
3509
3510 - Must happen before - Must happen before
3511 following following
3512 buffer_wbinvl1_vol. buffer_gl*_inv.
3513 - Ensures the load - Ensures the load
3514 has completed has completed
3515 before invalidating before invalidating
3516 the cache. the caches.
3517
3518 3. buffer_wbinvl1_vol 3. buffer_gl0_inv;
3519 buffer_gl1_inv
3520
3521 - Must happen before - Must happen before
3522 any following any following
3523 global/generic global/generic
3524 load/load load/load
3525 atomic/atomicrmw. atomic/atomicrmw.
3526 - Ensures that - Ensures that
3527 following following
3528 loads will not see loads will not see
3529 stale global data. stale global data.
3530
3531 load atomic acquire - agent - generic 1. flat_load glc=1 1. flat_load glc=1 dlc=1
3532 - system 2. s_waitcnt vmcnt(0) & 2. s_waitcnt vmcnt(0) &
3533 lgkmcnt(0) lgkmcnt(0)
3534
3535 - If OpenCL omit - If OpenCL omit
3536 lgkmcnt(0). lgkmcnt(0).
3537 - Must happen before - Must happen before
3538 following following
3539 buffer_wbinvl1_vol. buffer_gl*_inv.
3540 - Ensures the flat_load - Ensures the flat_load
3541 has completed has completed
3542 before invalidating before invalidating
3543 the cache. the caches.
3544
3545 3. buffer_wbinvl1_vol 3. buffer_gl0_inv;
3546 buffer_gl1_inv
3547
3548 - Must happen before - Must happen before
3549 any following any following
3550 global/generic global/generic
3551 load/load load/load
3552 atomic/atomicrmw. atomic/atomicrmw.
3553 - Ensures that - Ensures that
3554 following loads following loads
3555 will not see stale will not see stale
3556 global data. global data.
3557
3558 atomicrmw acquire - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
3559 - wavefront - local
3560 - generic
3561 atomicrmw acquire - workgroup - global 1. buffer/global/flat_atomic 1. buffer/global_atomic
3562 2. s_waitcnt vm/vscnt(0)
3563
3564 - If CU wavefront execution mode, omit.
3565 - Use vmcnt if atomic with
3566 return and vscnt if atomic
3567 with no-return.
3568 - Must happen before
3569 the following buffer_gl0_inv
3570 and before any following
3571 global/generic
3572 load/load
3573 atomic/store/store
3574 atomic/atomicrmw.
3575
3576 3. buffer_gl0_inv
3577
3578 - If CU wavefront execution mode, omit.
3579 - Ensures that
3580 following
3581 loads will not see
3582 stale data.
3583
3584 atomicrmw acquire - workgroup - local 1. ds_atomic 1. ds_atomic
3585 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0)
3586
3587 - If OpenCL, omit. - If OpenCL, omit.
3588 - Must happen before - Must happen before
3589 any following the following
3590 global/generic buffer_gl0_inv.
31953591 load/load
31963592 atomic/store/store
31973593 atomic/atomicrmw.
3198 - Ensures any
3199 following global
3200 data read is no
3201 older than the load
3202 atomic value being
3203 acquired.
3204 load atomic acquire - workgroup - generic 1. flat_load
3205 2. s_waitcnt lgkmcnt(0)
3206
3207 - If OpenCL, omit.
3208 - Must happen before
3209 any following
3210 global/generic
3594 - Ensures any - Ensures any
3595 following global following global
3596 data read is no data read is no
3597 older than the older than the
3598 atomicrmw value atomicrmw value
3599 being acquired. being acquired.
3600
3601 3. buffer_gl0_inv
3602
3603 - If OpenCL omit.
3604 - Ensures that
3605 following
3606 loads will not see
3607 stale data.
3608
3609 atomicrmw acquire - workgroup - generic 1. flat_atomic 1. flat_atomic
3610 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0) &
3611 vm/vscnt(0)
3612
3613 - If CU wavefront execution mode, omit vm/vscnt.
3614 - If OpenCL, omit. - If OpenCL, omit
3615 s_waitcnt lgkmcnt(0).
3616 - Use vmcnt if atomic with
3617 return and vscnt if atomic
3618 with no-return.
3619 waitcnt lgkmcnt(0).
3620 - Must happen before - Must happen before
3621 any following the following
3622 global/generic buffer_gl0_inv.
32113623 load/load
32123624 atomic/store/store
32133625 atomic/atomicrmw.
3214 - Ensures any
3215 following global
3216 data read is no
3217 older than the load
3218 atomic value being
3219 acquired.
3220 load atomic acquire - agent - global 1. buffer/global/flat_load
3221 - system glc=1
3222 2. s_waitcnt vmcnt(0)
3223
3224 - Must happen before
3225 following
3226 buffer_wbinvl1_vol.
3227 - Ensures the load
3228 has completed
3229 before invalidating
3230 the cache.
3231
3232 3. buffer_wbinvl1_vol
3233
3234 - Must happen before
3235 any following
3236 global/generic
3237 load/load
3238 atomic/atomicrmw.
3239 - Ensures that
3240 following
3241 loads will not see
3242 stale global data.
3243
3244 load atomic acquire - agent - generic 1. flat_load glc=1
3245 - system 2. s_waitcnt vmcnt(0) &
3246 lgkmcnt(0)
3247
3248 - If OpenCL omit
3249 lgkmcnt(0).
3250 - Must happen before
3251 following
3252 buffer_wbinvl1_vol.
3253 - Ensures the flat_load
3254 has completed
3255 before invalidating
3256 the cache.
3257
3258 3. buffer_wbinvl1_vol
3259
3260 - Must happen before
3261 any following
3262 global/generic
3263 load/load
3264 atomic/atomicrmw.
3265 - Ensures that
3266 following loads
3267 will not see stale
3268 global data.
3269
3270 atomicrmw acquire - singlethread - global 1. buffer/global/ds/flat_atomic
3271 - wavefront - local
3272 - generic
3273 atomicrmw acquire - workgroup - global 1. buffer/global/flat_atomic
3274 atomicrmw acquire - workgroup - local 1. ds_atomic
3275 2. waitcnt lgkmcnt(0)
3276
3277 - If OpenCL, omit.
3278 - Must happen before
3279 any following
3280 global/generic
3281 load/load
3282 atomic/store/store
3283 atomic/atomicrmw.
3284 - Ensures any
3285 following global
3286 data read is no
3287 older than the
3288 atomicrmw value
3289 being acquired.
3290
3291 atomicrmw acquire - workgroup - generic 1. flat_atomic
3292 2. waitcnt lgkmcnt(0)
3293
3294 - If OpenCL, omit.
3295 - Must happen before
3296 any following
3297 global/generic
3298 load/load
3299 atomic/store/store
3300 atomic/atomicrmw.
3301 - Ensures any
3302 following global
3303 data read is no
3304 older than the
3305 atomicrmw value
3306 being acquired.
3307
3308 atomicrmw acquire - agent - global 1. buffer/global/flat_atomic
3309 - system 2. s_waitcnt vmcnt(0)
3310
3311 - Must happen before
3312 following
3313 buffer_wbinvl1_vol.
3314 - Ensures the
3315 atomicrmw has
3316 completed before
3317 invalidating the
3318 cache.
3319
3320 3. buffer_wbinvl1_vol
3321
3322 - Must happen before
3323 any following
3324 global/generic
3325 load/load
3326 atomic/atomicrmw.
3327 - Ensures that
3328 following loads
3329 will not see stale
3330 global data.
3331
3332 atomicrmw acquire - agent - generic 1. flat_atomic
3333 - system 2. s_waitcnt vmcnt(0) &
3334 lgkmcnt(0)
3335
3336 - If OpenCL, omit
3337 lgkmcnt(0).
3338 - Must happen before
3339 following
3340 buffer_wbinvl1_vol.
3341 - Ensures the
3342 atomicrmw has
3343 completed before
3344 invalidating the
3345 cache.
3346
3347 3. buffer_wbinvl1_vol
3348
3349 - Must happen before
3350 any following
3351 global/generic
3352 load/load
3353 atomic/atomicrmw.
3354 - Ensures that
3355 following loads
3356 will not see stale
3357 global data.
3358
3359 fence acquire - singlethread *none* *none*
3626 - Ensures any - Ensures any
3627 following global following global
3628 data read is no data read is no
3629 older than the older than the
3630 atomicrmw value atomicrmw value
3631 being acquired. being acquired.
3632
3633 3. buffer_gl0_inv
3634
3635 - If CU wavefront execution mode, omit.
3636 - Ensures that
3637 following
3638 loads will not see
3639 stale data.
3640
3641 atomicrmw acquire - agent - global 1. buffer/global/flat_atomic 1. buffer/global_atomic
3642 - system 2. s_waitcnt vmcnt(0) 2. s_waitcnt vm/vscnt(0)
3643
3644 - Use vmcnt if atomic with
3645 return and vscnt if atomic
3646 with no-return.
3647 waitcnt lgkmcnt(0).
3648 - Must happen before - Must happen before
3649 following following
3650 buffer_wbinvl1_vol. buffer_gl*_inv.
3651 - Ensures the - Ensures the
3652 atomicrmw has atomicrmw has
3653 completed before completed before
3654 invalidating the invalidating the
3655 cache. caches.
3656
3657 3. buffer_wbinvl1_vol 3. buffer_gl0_inv;
3658 buffer_gl1_inv
3659
3660 - Must happen before - Must happen before
3661 any following any following
3662 global/generic global/generic
3663 load/load load/load
3664 atomic/atomicrmw. atomic/atomicrmw.
3665 - Ensures that - Ensures that
3666 following loads following loads
3667 will not see stale will not see stale
3668 global data. global data.
3669
3670 atomicrmw acquire - agent - generic 1. flat_atomic 1. flat_atomic
3671 - system 2. s_waitcnt vmcnt(0) & 2. s_waitcnt vm/vscnt(0) &
3672 lgkmcnt(0) lgkmcnt(0)
3673
3674 - If OpenCL, omit - If OpenCL, omit
3675 lgkmcnt(0). lgkmcnt(0).
3676 - Use vmcnt if atomic with
3677 return and vscnt if atomic
3678 with no-return.
3679 - Must happen before - Must happen before
3680 following following
3681 buffer_wbinvl1_vol. buffer_gl*_inv.
3682 - Ensures the - Ensures the
3683 atomicrmw has atomicrmw has
3684 completed before completed before
3685 invalidating the invalidating the
3686 cache. caches.
3687
3688 3. buffer_wbinvl1_vol 3. buffer_gl0_inv;
3689 buffer_gl1_inv
3690
3691 - Must happen before - Must happen before
3692 any following any following
3693 global/generic global/generic
3694 load/load load/load
3695 atomic/atomicrmw. atomic/atomicrmw.
3696 - Ensures that - Ensures that
3697 following loads following loads
3698 will not see stale will not see stale
3699 global data. global data.
3700
3701 fence acquire - singlethread *none* *none* *none*
33603702 - wavefront
3361 fence acquire - workgroup *none* 1. s_waitcnt lgkmcnt(0)
3362
3363 - If OpenCL and
3364 address space is
3365 not generic, omit.
3366 - However, since LLVM
3367 currently has no
3368 address space on
3369 the fence need to
3370 conservatively
3371 always generate. If
3372 fence had an
3373 address space then
3374 set to address
3375 space of OpenCL
3376 fence flag, or to
3377 generic if both
3378 local and global
3379 flags are
3380 specified.
3703 fence acquire - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
3704 vmcnt(0) & vscnt(0)
3705
3706 - If CU wavefront execution mode, omit vmcnt and
3707 vscnt.
3708 - If OpenCL and - If OpenCL and
3709 address space is address space is
3710 not generic, omit. not generic, omit
3711 lgkmcnt(0).
3712 - If OpenCL and
3713 address space is
3714 local, omit
3715 vmcnt(0) and vscnt(0).
3716 - However, since LLVM - However, since LLVM
3717 currently has no currently has no
3718 address space on address space on
3719 the fence need to the fence need to
3720 conservatively conservatively
3721 always generate. If always generate. If
3722 fence had an fence had an
3723 address space then address space then
3724 set to address set to address
3725 space of OpenCL space of OpenCL
3726 fence flag, or to fence flag, or to
3727 generic if both generic if both
3728 local and global local and global
3729 flags are flags are
3730 specified. specified.
33813731 - Must happen after
33823732 any preceding
33833733 local/generic load
34013751 older than the
34023752 value read by the
34033753 fence-paired-atomic.
3404
3405 fence acquire - agent *none* 1. s_waitcnt lgkmcnt(0) &
3406 - system vmcnt(0)
3407
3408 - If OpenCL and
3409 address space is
3410 not generic, omit
3411 lgkmcnt(0).
3412 - However, since LLVM
3413 currently has no
3414 address space on
3415 the fence need to
3416 conservatively
3417 always generate
3418 (see comment for
3419 previous fence).
3754 - Could be split into
3755 separate s_waitcnt
3756 vmcnt(0), s_waitcnt
3757 vscnt(0) and s_waitcnt
3758 lgkmcnt(0) to allow
3759 them to be
3760 independently moved
3761 according to the
3762 following rules.
3763 - s_waitcnt vmcnt(0)
3764 must happen after
3765 any preceding
3766 global/generic load
3767 atomic/
3768 atomicrmw-with-return-value
3769 with an equal or
3770 wider sync scope
3771 and memory ordering
3772 stronger than
3773 unordered (this is
3774 termed the
3775 fence-paired-atomic).
3776 - s_waitcnt vscnt(0)
3777 must happen after
3778 any preceding
3779 global/generic
3780 atomicrmw-no-return-value
3781 with an equal or
3782 wider sync scope
3783 and memory ordering
3784 stronger than
3785 unordered (this is
3786 termed the
3787 fence-paired-atomic).
3788 - s_waitcnt lgkmcnt(0)
3789 must happen after
3790 any preceding
3791 local/generic load
3792 atomic/atomicrmw
3793 with an equal or
3794 wider sync scope
3795 and memory ordering
3796 stronger than
3797 unordered (this is
3798 termed the
3799 fence-paired-atomic).
3800 - Must happen before
3801 the following
3802 buffer_gl0_inv.
3803 - Ensures that the
3804 fence-paired atomic
3805 has completed
3806 before invalidating
3807 the
3808 cache. Therefore
3809 any following
3810 locations read must
3811 be no older than
3812 the value read by
3813 the
3814 fence-paired-atomic.
3815
3816 3. buffer_gl0_inv
3817
3818 - If CU wavefront execution mode, omit.
3819 - Ensures that
3820 following
3821 loads will not see
3822 stale data.
3823
3824 fence acquire - agent *none* 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
3825 - system vmcnt(0) vmcnt(0) & vscnt(0)
3826
3827 - If OpenCL and - If OpenCL and
3828 address space is address space is
3829 not generic, omit not generic, omit
3830 lgkmcnt(0). lgkmcnt(0).
3831 - If OpenCL and
3832 address space is
3833 local, omit
3834 vmcnt(0) and vscnt(0).
3835 - However, since LLVM - However, since LLVM
3836 currently has no currently has no
3837 address space on address space on
3838 the fence need to the fence need to
3839 conservatively conservatively
3840 always generate always generate
3841 (see comment for (see comment for
3842 previous fence). previous fence).
34203843 - Could be split into
34213844 separate s_waitcnt
34223845 vmcnt(0) and
34653888 the value read by
34663889 the
34673890 fence-paired-atomic.
3468
3469 2. buffer_wbinvl1_vol
3470
3471 - Must happen before any
3472 following global/generic
3473 load/load
3474 atomic/store/store
3475 atomic/atomicrmw.
3476 - Ensures that
3477 following loads
3478 will not see stale
3479 global data.
3891 - Could be split into
3892 separate s_waitcnt
3893 vmcnt(0), s_waitcnt
3894 vscnt(0) and s_waitcnt
3895 lgkmcnt(0) to allow
3896 them to be
3897 independently moved
3898 according to the
3899 following rules.
3900 - s_waitcnt vmcnt(0)
3901 must happen after
3902 any preceding
3903 global/generic load
3904 atomic/
3905 atomicrmw-with-return-value
3906 with an equal or
3907 wider sync scope
3908 and memory ordering
3909 stronger than
3910 unordered (this is
3911 termed the
3912 fence-paired-atomic).
3913 - s_waitcnt vscnt(0)
3914 must happen after
3915 any preceding
3916 global/generic
3917 atomicrmw-no-return-value
3918 with an equal or
3919 wider sync scope
3920 and memory ordering
3921 stronger than
3922 unordered (this is
3923 termed the
3924 fence-paired-atomic).
3925 - s_waitcnt lgkmcnt(0)
3926 must happen after
3927 any preceding
3928 local/generic load
3929 atomic/atomicrmw
3930 with an equal or
3931 wider sync scope
3932 and memory ordering
3933 stronger than
3934 unordered (this is
3935 termed the
3936 fence-paired-atomic).
3937 - Must happen before
3938 the following
3939 buffer_gl*_inv.
3940 - Ensures that the
3941 fence-paired atomic
3942 has completed
3943 before invalidating
3944 the
3945 caches. Therefore
3946 any following
3947 locations read must
3948 be no older than
3949 the value read by
3950 the
3951 fence-paired-atomic.
3952
3953 2. buffer_wbinvl1_vol 2. buffer_gl0_inv;
3954 buffer_gl1_inv
3955
3956 - Must happen before any - Must happen before any
3957 following global/generic following global/generic
3958 load/load load/load
3959 atomic/store/store atomic/store/store
3960 atomic/atomicrmw. atomic/atomicrmw.
3961 - Ensures that - Ensures that
3962 following loads following loads
3963 will not see stale will not see stale
3964 global data. global data.
34803965
34813966 **Release Atomic**
3482 -----------------------------------------------------------------------------------
3483 store atomic release - singlethread - global 1. buffer/global/ds/flat_store
3967 ----------------------------------------------------------------------------------------------------------------------
3968 store atomic release - singlethread - global 1. buffer/global/ds/flat_store 1. buffer/global/ds/flat_store
34843969 - wavefront - local
34853970 - generic
3486 store atomic release - workgroup - global 1. s_waitcnt lgkmcnt(0)
3487
3488 - If OpenCL, omit.
3971 store atomic release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
3972 vmcnt(0) & vscnt(0)
3973
3974 - If CU wavefront execution mode, omit vmcnt and
3975 vscnt.
3976 - If OpenCL, omit. - If OpenCL, omit
3977 lgkmcnt(0).
34893978 - Must happen after
34903979 any preceding
34913980 local/generic
34923981 load/store/load
34933982 atomic/store
34943983 atomic/atomicrmw.
3495 - Must happen before
3496 the following
3497 store.
3498 - Ensures that all
3499 memory operations
3500 to local have
3501 completed before
3502 performing the
3503 store that is being
3504 released.
3505
3506 2. buffer/global/flat_store
3507 store atomic release - workgroup - local 1. ds_store
3508 store atomic release - workgroup - generic 1. s_waitcnt lgkmcnt(0)
3509
3510 - If OpenCL, omit.
3984 - Could be split into
3985 separate s_waitcnt
3986 vmcnt(0), s_waitcnt
3987 vscnt(0) and s_waitcnt
3988 lgkmcnt(0) to allow
3989 them to be
3990 independently moved
3991 according to the
3992 following rules.
3993 - s_waitcnt vmcnt(0)
3994 must happen after
3995 any preceding
3996 global/generic load/load
3997 atomic/
3998 atomicrmw-with-return-value.
3999 - s_waitcnt vscnt(0)
4000 must happen after
4001 any preceding
4002 global/generic
4003 store/store
4004 atomic/
4005 atomicrmw-no-return-value.
4006 - s_waitcnt lgkmcnt(0)
4007 must happen after
4008 any preceding
4009 local/generic
4010 load/store/load
4011 atomic/store
4012 atomic/atomicrmw.
4013 - Must happen before - Must happen before
4014 the following the following
4015 store. store.
4016 - Ensures that all - Ensures that all
4017 memory operations memory operations
4018 to local have have
4019 completed before completed before
4020 performing the performing the
4021 store that is being store that is being
4022 released. released.
4023
4024 2. buffer/global/flat_store 2. buffer/global_store
4025 store atomic release - workgroup - local 1. s_waitcnt vmcnt(0) & vscnt(0)
4026
4027 - If CU wavefront execution mode, omit.
4028 - If OpenCL, omit.
4029 - Could be split into
4030 separate s_waitcnt
4031 vmcnt(0) and s_waitcnt
4032 vscnt(0) to allow
4033 them to be
4034 independently moved
4035 according to the
4036 following rules.
4037 - s_waitcnt vmcnt(0)
4038 must happen after
4039 any preceding
4040 global/generic load/load
4041 atomic/
4042 atomicrmw-with-return-value.
4043 - s_waitcnt vscnt(0)
4044 must happen after
4045 any preceding
4046 global/generic
4047 store/store atomic/
4048 atomicrmw-no-return-value.
4049 - Must happen before
4050 the following
4051 store.
4052 - Ensures that all
4053 global memory
4054 operations have
4055 completed before
4056 performing the
4057 store that is being
4058 released.
4059
4060 1. ds_store 2. ds_store
4061 store atomic release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4062 vmcnt(0) & vscnt(0)
4063
4064 - If CU wavefront execution mode, omit vmcnt and
4065 vscnt.
4066 - If OpenCL, omit. - If OpenCL, omit
4067 lgkmcnt(0).
35114068 - Must happen after
35124069 any preceding
35134070 local/generic
35144071 load/store/load
35154072 atomic/store
35164073 atomic/atomicrmw.
3517 - Must happen before
3518 the following
3519 store.
3520 - Ensures that all
3521 memory operations
3522 to local have
3523 completed before
3524 performing the
3525 store that is being
3526 released.
3527
3528 2. flat_store
3529 store atomic release - agent - global 1. s_waitcnt lgkmcnt(0) &
3530 - system - generic vmcnt(0)
3531
3532 - If OpenCL, omit
3533 lgkmcnt(0).
3534 - Could be split into
3535 separate s_waitcnt
3536 vmcnt(0) and
3537 s_waitcnt
3538 lgkmcnt(0) to allow
3539 them to be
3540 independently moved
3541 according to the
3542 following rules.
3543 - s_waitcnt vmcnt(0)
3544 must happen after
3545 any preceding
3546 global/generic
3547 load/store/load
3548 atomic/store
3549 atomic/atomicrmw.
3550 - s_waitcnt lgkmcnt(0)
3551 must happen after
3552 any preceding
3553 local/generic
3554 load/store/load
3555 atomic/store
3556 atomic/atomicrmw.
3557 - Must happen before
3558 the following
3559 store.
3560 - Ensures that all
3561 memory operations
3562 to memory have
3563 completed before
3564 performing the
3565 store that is being
3566 released.
3567
3568 2. buffer/global/ds/flat_store
3569 atomicrmw release - singlethread - global 1. buffer/global/ds/flat_atomic
4074 - Could be split into
4075 separate s_waitcnt
4076 vmcnt(0), s_waitcnt
4077 vscnt(0) and s_waitcnt
4078 lgkmcnt(0) to allow
4079 them to be
4080 independently moved
4081 according to the
4082 following rules.
4083 - s_waitcnt vmcnt(0)
4084 must happen after
4085 any preceding
4086 global/generic load/load
4087 atomic/
4088 atomicrmw-with-return-value.
4089 - s_waitcnt vscnt(0)
4090 must happen after
4091 any preceding
4092 global/generic
4093 store/store
4094 atomic/
4095 atomicrmw-no-return-value.
4096 - s_waitcnt lgkmcnt(0)
4097 must happen after
4098 any preceding
4099 local/generic load/store/load
4100 atomic/store atomic/atomicrmw.
4101 - Must happen before - Must happen before
4102 the following the following
4103 store. store.
4104 - Ensures that all - Ensures that all
4105 memory operations memory operations
4106 to local have have
4107 completed before completed before
4108 performing the performing the
4109 store that is being store that is being
4110 released. released.
4111
4112 2. flat_store 2. flat_store
4113 store atomic release - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
4114 - system - generic vmcnt(0) vmcnt(0) & vscnt(0)
4115
4116 - If OpenCL, omit - If OpenCL, omit
4117 lgkmcnt(0). lgkmcnt(0).
4118 - Could be split into - Could be split into
4119 separate s_waitcnt separate s_waitcnt
4120 vmcnt(0) and vmcnt(0), s_waitcnt vscnt(0)
4121 s_waitcnt and s_waitcnt
4122 lgkmcnt(0) to allow lgkmcnt(0) to allow
4123 them to be them to be
4124 independently moved independently moved
4125 according to the according to the
4126 following rules. following rules.
4127 - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0)
4128 must happen after must happen after
4129 any preceding any preceding
4130 global/generic global/generic
4131 load/store/load load/load
4132 atomic/store atomic/
4133 atomic/atomicrmw. atomicrmw-with-return-value.
4134 - s_waitcnt vscnt(0)
4135 must happen after
4136 any preceding
4137 global/generic
4138 store/store atomic/
4139 atomicrmw-no-return-value.
4140 - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0)
4141 must happen after must happen after
4142 any preceding any preceding
4143 local/generic local/generic
4144 load/store/load load/store/load
4145 atomic/store atomic/store
4146 atomic/atomicrmw. atomic/atomicrmw.
4147 - Must happen before - Must happen before
4148 the following the following
4149 store. store.
4150 - Ensures that all - Ensures that all
4151 memory operations memory operations
4152 to memory have to memory have
4153 completed before completed before
4154 performing the performing the
4155 store that is being store that is being
4156 released. released.
4157
4158 2. buffer/global/ds/flat_store 2. buffer/global/ds/flat_store
4159 atomicrmw release - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
35704160 - wavefront - local
35714161 - generic
3572 atomicrmw release - workgroup - global 1. s_waitcnt lgkmcnt(0)
3573
4162 atomicrmw release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4163 vmcnt(0) & vscnt(0)
4164
4165 - If CU wavefront execution mode, omit vmcnt and
4166 vscnt.
35744167 - If OpenCL, omit.
4168
35754169 - Must happen after
35764170 any preceding
35774171 local/generic
35784172 load/store/load
35794173 atomic/store
35804174 atomic/atomicrmw.
3581 - Must happen before
3582 the following
3583 atomicrmw.
3584 - Ensures that all
3585 memory operations
3586 to local have
3587 completed before
3588 performing the
3589 atomicrmw that is
3590 being released.
3591
3592 2. buffer/global/flat_atomic
3593 atomicrmw release - workgroup - local 1. ds_atomic
3594 atomicrmw release - workgroup - generic 1. s_waitcnt lgkmcnt(0)
3595
3596 - If OpenCL, omit.
4175 - Could be split into
4176 separate s_waitcnt
4177 vmcnt(0), s_waitcnt
4178 vscnt(0) and s_waitcnt
4179 lgkmcnt(0) to allow
4180 them to be
4181 independently moved
4182 according to the
4183 following rules.
4184 - s_waitcnt vmcnt(0)
4185 must happen after
4186 any preceding
4187 global/generic load/load
4188 atomic/
4189 atomicrmw-with-return-value.
4190 - s_waitcnt vscnt(0)
4191 must happen after
4192 any preceding
4193 global/generic
4194 store/store
4195 atomic/
4196 atomicrmw-no-return-value.
4197 - s_waitcnt lgkmcnt(0)
4198 must happen after
4199 any preceding
4200 local/generic
4201 load/store/load
4202 atomic/store
4203 atomic/atomicrmw.
4204 - Must happen before - Must happen before
4205 the following the following
4206 atomicrmw. atomicrmw.
4207 - Ensures that all - Ensures that all
4208 memory operations memory operations
4209 to local have have
4210 completed before completed before
4211 performing the performing the
4212 atomicrmw that is atomicrmw that is
4213 being released. being released.
4214
4215 2. buffer/global/flat_atomic 2. buffer/global_atomic
4216 atomicrmw release - workgroup - local 1. s_waitcnt vmcnt(0) & vscnt(0)
4217
4218 - If CU wavefront execution mode, omit.
4219 - If OpenCL, omit.
4220 - Could be split into
4221 separate s_waitcnt
4222 vmcnt(0) and s_waitcnt
4223 vscnt(0) to allow
4224 them to be
4225 independently moved
4226 according to the
4227 following rules.
4228 - s_waitcnt vmcnt(0)
4229 must happen after
4230 any preceding
4231 global/generic load/load
4232 atomic/
4233 atomicrmw-with-return-value.
4234 - s_waitcnt vscnt(0)
4235 must happen after
4236 any preceding
4237 global/generic
4238 store/store atomic/
4239 atomicrmw-no-return-value.
4240 - Must happen before
4241 the following
4242 atomicrmw.
4243 - Ensures that all
4244 global memory
4245 operations have
4246 completed before
4247 performing the
4248 atomicrmw that is
4249 being released.
4250
4251 1. ds_atomic 2. ds_atomic
4252 atomicrmw release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4253 vmcnt(0) & vscnt(0)
4254
4255 - If CU wavefront execution mode, omit vmcnt and
4256 vscnt.
4257 - If OpenCL, omit. - If OpenCL, omit
4258 s_waitcnt lgkmcnt(0).
35974259 - Must happen after
35984260 any preceding
35994261 local/generic
36004262 load/store/load
36014263 atomic/store
36024264 atomic/atomicrmw.
3603 - Must happen before
3604 the following
3605 atomicrmw.
3606 - Ensures that all
3607 memory operations
3608 to local have
3609 completed before
3610 performing the
3611 atomicrmw that is
3612 being released.
3613
3614 2. flat_atomic
3615 atomicrmw release - agent - global 1. s_waitcnt lgkmcnt(0) &
3616 - system - generic vmcnt(0)
3617
3618 - If OpenCL, omit
3619 lgkmcnt(0).
3620 - Could be split into
3621 separate s_waitcnt
3622 vmcnt(0) and
3623 s_waitcnt
3624 lgkmcnt(0) to allow
3625 them to be
3626 independently moved
3627 according to the
3628 following rules.
3629 - s_waitcnt vmcnt(0)
3630 must happen after
3631 any preceding
3632 global/generic
3633 load/store/load
3634 atomic/store
4265 - Could be split into
4266 separate s_waitcnt
4267 vmcnt(0), s_waitcnt
4268 vscnt(0) and s_waitcnt
4269 lgkmcnt(0) to allow
4270 them to be
4271 independently moved
4272 according to the
4273 following rules.
4274 - s_waitcnt vmcnt(0)
4275 must happen after
4276 any preceding
4277 global/generic load/load
4278 atomic/
4279 atomicrmw-with-return-value.
4280 - s_waitcnt vscnt(0)
4281 must happen after
4282 any preceding
4283 global/generic
4284 store/store
4285 atomic/
4286 atomicrmw-no-return-value.
4287 - s_waitcnt lgkmcnt(0)
4288 must happen after
4289 any preceding
4290 local/generic load/store/load
4291 atomic/store atomic/atomicrmw.
4292 - Must happen before - Must happen before
4293 the following the following
4294 atomicrmw. atomicrmw.
4295 - Ensures that all - Ensures that all
4296 memory operations memory operations
4297 to local have have
4298 completed before completed before
4299 performing the performing the
4300 atomicrmw that is atomicrmw that is
4301 being released. being released.
4302
4303 2. flat_atomic 2. flat_atomic
4304 atomicrmw release - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
4305 - system - generic vmcnt(0) vmcnt(0) & vscnt(0)
4306
4307 - If OpenCL, omit - If OpenCL, omit
4308 lgkmcnt(0). lgkmcnt(0).
4309 - Could be split into - Could be split into
4310 separate s_waitcnt separate s_waitcnt
4311 vmcnt(0) and vmcnt(0), s_waitcnt
4312 s_waitcnt vscnt(0) and s_waitcnt
4313 lgkmcnt(0) to allow lgkmcnt(0) to allow
4314 them to be them to be
4315 independently moved independently moved
4316 according to the according to the
4317 following rules. following rules.
4318 - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0)
4319 must happen after must happen after
4320 any preceding any preceding
4321 global/generic global/generic
4322 load/store/load load/load atomic/
4323 atomic/store atomicrmw-with-return-value.
36354324 atomic/atomicrmw.
3636 - s_waitcnt lgkmcnt(0)
3637 must happen after
3638 any preceding
3639 local/generic
3640 load/store/load
3641 atomic/store
3642 atomic/atomicrmw.
3643 - Must happen before
3644 the following
3645 atomicrmw.
3646 - Ensures that all
3647 memory operations
3648 to global and local
3649 have completed
3650 before performing
3651 the atomicrmw that
3652 is being released.
3653
3654 2. buffer/global/ds/flat_atomic
3655 fence release - singlethread *none* *none*
4325 - s_waitcnt vscnt(0)
4326 must happen after
4327 any preceding
4328 global/generic
4329 store/store atomic/
4330 atomicrmw-no-return-value.
4331 - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0)
4332 must happen after must happen after
4333 any preceding any preceding
4334 local/generic local/generic
4335 load/store/load load/store/load
4336 atomic/store atomic/store
4337 atomic/atomicrmw. atomic/atomicrmw.
4338 - Must happen before - Must happen before
4339 the following the following
4340 atomicrmw. atomicrmw.
4341 - Ensures that all - Ensures that all
4342 memory operations memory operations
4343 to global and local to global and local
4344 have completed have completed
4345 before performing before performing
4346 the atomicrmw that the atomicrmw that
4347 is being released. is being released.
4348
4349 2. buffer/global/ds/flat_atomic 2. buffer/global/ds/flat_atomic
4350 fence release - singlethread *none* *none* *none*
36564351 - wavefront
3657 fence release - workgroup *none* 1. s_waitcnt lgkmcnt(0)
3658
3659 - If OpenCL and
3660 address space is
3661 not generic, omit.
3662 - However, since LLVM
3663 currently has no
3664 address space on
3665 the fence need to
3666 conservatively
3667 always generate. If
3668 fence had an
3669 address space then
3670 set to address
3671 space of OpenCL
3672 fence flag, or to
3673 generic if both
3674 local and global
3675 flags are
3676 specified.
4352 fence release - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4353 vmcnt(0) & vscnt(0)
4354
4355 - If CU wavefront execution mode, omit vmcnt and
4356 vscnt.
4357 - If OpenCL and - If OpenCL and
4358 address space is address space is
4359 not generic, omit. not generic, omit
4360 lgkmcnt(0).
4361 - If OpenCL and
4362 address space is
4363 local, omit
4364 vmcnt(0) and vscnt(0).
4365 - However, since LLVM - However, since LLVM
4366 currently has no currently has no
4367 address space on address space on
4368 the fence need to the fence need to
4369 conservatively conservatively
4370 always generate. If always generate. If
4371 fence had an fence had an
4372 address space then address space then
4373 set to address set to address
4374 space of OpenCL space of OpenCL
4375 fence flag, or to fence flag, or to
4376 generic if both generic if both
4377 local and global local and global
4378 flags are flags are
4379 specified. specified.
36774380 - Must happen after
36784381 any preceding
36794382 local/generic
36804383 load/load
36814384 atomic/store/store
36824385 atomic/atomicrmw.
3683 - Must happen before
3684 any following store
3685 atomic/atomicrmw
3686 with an equal or
3687 wider sync scope
3688 and memory ordering
3689 stronger than
3690 unordered (this is
3691 termed the
3692 fence-paired-atomic).
3693 - Ensures that all
3694 memory operations
3695 to local have
3696 completed before
3697 performing the
3698 following
3699 fence-paired-atomic.
3700
3701 fence release - agent *none* 1. s_waitcnt lgkmcnt(0) &
3702 - system vmcnt(0)
3703
3704 - If OpenCL and
3705 address space is
3706 not generic, omit
3707 lgkmcnt(0).
3708 - If OpenCL and
3709 address space is
3710 local, omit
3711 vmcnt(0).
3712 - However, since LLVM
3713 currently has no
3714 address space on
3715 the fence need to
3716 conservatively
3717 always generate. If
3718 fence had an
3719 address space then
3720 set to address
3721 space of OpenCL
3722 fence flag, or to
3723 generic if both
3724 local and global
3725 flags are
3726 specified.
3727 - Could be split into
3728 separate s_waitcnt
3729 vmcnt(0) and
3730 s_waitcnt
3731 lgkmcnt(0) to allow
3732 them to be
3733 independently moved
3734 according to the
3735 following rules.
3736 - s_waitcnt vmcnt(0)
3737 must happen after
3738 any preceding
3739 global/generic
3740 load/store/load
3741 atomic/store
4386 - Could be split into
4387 separate s_waitcnt
4388 vmcnt(0), s_waitcnt
4389 vscnt(0) and s_waitcnt
4390 lgkmcnt(0) to allow
4391 them to be
4392 independently moved
4393 according to the
4394 following rules.
4395 - s_waitcnt vmcnt(0)
4396 must happen after
4397 any preceding
4398 global/generic
4399 load/load
4400 atomic/
4401 atomicrmw-with-return-value.
4402 - s_waitcnt vscnt(0)
4403 must happen after
4404 any preceding
4405 global/generic
4406 store/store atomic/
4407 atomicrmw-no-return-value.
4408 - s_waitcnt lgkmcnt(0)
4409 must happen after
4410 any preceding
4411 local/generic
4412 load/store/load
4413 atomic/store atomic/
4414 atomicrmw.
4415 - Must happen before - Must happen before
4416 any following store any following store
4417 atomic/atomicrmw atomic/atomicrmw
4418 with an equal or with an equal or
4419 wider sync scope wider sync scope
4420 and memory ordering and memory ordering
4421 stronger than stronger than
4422 unordered (this is unordered (this is
4423 termed the termed the
4424 fence-paired-atomic). fence-paired-atomic).
4425 - Ensures that all - Ensures that all
4426 memory operations memory operations
4427 to local have have
4428 completed before completed before
4429 performing the performing the
4430 following following
4431 fence-paired-atomic. fence-paired-atomic.
4432
4433 fence release - agent *none* 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
4434 - system vmcnt(0) vmcnt(0) & vscnt(0)
4435
4436 - If OpenCL and - If OpenCL and
4437 address space is address space is
4438 not generic, omit not generic, omit
4439 lgkmcnt(0). lgkmcnt(0).
4440 - If OpenCL and - If OpenCL and
4441 address space is address space is
4442 local, omit local, omit
4443 vmcnt(0). vmcnt(0) and vscnt(0).
4444 - However, since LLVM - However, since LLVM
4445 currently has no currently has no
4446 address space on address space on
4447 the fence need to the fence need to
4448 conservatively conservatively
4449 always generate. If always generate. If
4450 fence had an fence had an
4451 address space then address space then
4452 set to address set to address
4453 space of OpenCL space of OpenCL
4454 fence flag, or to fence flag, or to
4455 generic if both generic if both
4456 local and global local and global
4457 flags are flags are
4458 specified. specified.
4459 - Could be split into - Could be split into
4460 separate s_waitcnt separate s_waitcnt
4461 vmcnt(0) and vmcnt(0), s_waitcnt
4462 s_waitcnt vscnt(0) and s_waitcnt
4463 lgkmcnt(0) to allow lgkmcnt(0) to allow
4464 them to be them to be
4465 independently moved independently moved
4466 according to the according to the
4467 following rules. following rules.
4468 - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0)
4469 must happen after must happen after
4470 any preceding any preceding
4471 global/generic global/generic
4472 load/store/load load/load atomic/
4473 atomic/store atomicrmw-with-return-value.
37424474 atomic/atomicrmw.
3743 - s_waitcnt lgkmcnt(0)
3744 must happen after
3745 any preceding
3746 local/generic
3747 load/store/load
3748 atomic/store
3749 atomic/atomicrmw.
3750 - Must happen before
3751 any following store
3752 atomic/atomicrmw
3753 with an equal or
3754 wider sync scope
3755 and memory ordering
3756 stronger than
3757 unordered (this is
3758 termed the
3759 fence-paired-atomic).
3760 - Ensures that all
3761 memory operations
3762 have
3763 completed before
3764 performing the
3765 following
3766 fence-paired-atomic.
4475 - s_waitcnt vscnt(0)
4476 must happen after
4477 any preceding
4478 global/generic
4479 store/store atomic/
4480 atomicrmw-no-return-value.
4481 - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0)
4482 must happen after must happen after
4483 any preceding any preceding
4484 local/generic local/generic
4485 load/store/load load/store/load
4486 atomic/store atomic/store
4487 atomic/atomicrmw. atomic/atomicrmw.
4488 - Must happen before - Must happen before
4489 any following store any following store
4490 atomic/atomicrmw atomic/atomicrmw
4491 with an equal or with an equal or
4492 wider sync scope wider sync scope
4493 and memory ordering and memory ordering
4494 stronger than stronger than
4495 unordered (this is unordered (this is
4496 termed the termed the
4497 fence-paired-atomic). fence-paired-atomic).
4498 - Ensures that all - Ensures that all
4499 memory operations memory operations
4500 have have
4501 completed before completed before
4502 performing the performing the
4503 following following
4504 fence-paired-atomic. fence-paired-atomic.
37674505
37684506 **Acquire-Release Atomic**
3769 -----------------------------------------------------------------------------------
3770 atomicrmw acq_rel - singlethread - global 1. buffer/global/ds/flat_atomic
4507 ----------------------------------------------------------------------------------------------------------------------
4508 atomicrmw acq_rel - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
37714509 - wavefront - local
37724510 - generic
3773 atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkmcnt(0)
3774
3775 - If OpenCL, omit.
4511 atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4512 vmcnt(0) & vscnt(0)
4513
4514 - If CU wavefront execution mode, omit vmcnt and
4515 vscnt.
4516 - If OpenCL, omit. - If OpenCL, omit
4517 s_waitcnt lgkmcnt(0).
4518 - Must happen after - Must happen after
4519 any preceding any preceding
4520 local/generic local/generic
4521 load/store/load load/store/load
4522 atomic/store atomic/store
4523 atomic/atomicrmw. atomic/atomicrmw.
4524 - Could be split into
4525 separate s_waitcnt
4526 vmcnt(0), s_waitcnt
4527 vscnt(0) and s_waitcnt
4528 lgkmcnt(0) to allow
4529 them to be
4530 independently moved
4531 according to the
4532 following rules.
4533 - s_waitcnt vmcnt(0)
4534 must happen after
4535 any preceding
4536 global/generic load/load
4537 atomic/
4538 atomicrmw-with-return-value.
4539 - s_waitcnt vscnt(0)
4540 must happen after
4541 any preceding
4542 global/generic
4543 store/store
4544 atomic/
4545 atomicrmw-no-return-value.
4546 - s_waitcnt lgkmcnt(0)
4547 must happen after
4548 any preceding
4549 local/generic load/store/load
4550 atomic/store atomic/atomicrmw.
4551 - Must happen before - Must happen before
4552 the following the following
4553 atomicrmw. atomicrmw.
4554 - Ensures that all - Ensures that all
4555 memory operations memory operations
4556 to local have have
4557 completed before completed before
4558 performing the performing the
4559 atomicrmw that is atomicrmw that is
4560 being released. being released.
4561
4562 2. buffer/global/flat_atomic 2. buffer/global_atomic
4563 3. s_waitcnt vm/vscnt(0)
4564
4565 - If CU wavefront execution mode, omit vm/vscnt.
4566 - Use vmcnt if atomic with
4567 return and vscnt if atomic
4568 with no-return.
4570 - Must happen before
4571 the following
4572 buffer_gl0_inv.
4573 - Ensures any
4574 following global
4575 data read is no
4576 older than the
4577 atomicrmw value
4578 being acquired.
4579
4580 4. buffer_gl0_inv
4581
4582 - If CU wavefront execution mode, omit.
4583 - Ensures that
4584 following
4585 loads will not see
4586 stale data.
4587
4588 atomicrmw acq_rel - workgroup - local 1. s_waitcnt vmcnt(0) & vscnt(0)
4589
4590 - If CU wavefront execution mode, omit.
4591 - If OpenCL, omit.
4592 - Could be split into
4593 separate s_waitcnt
4594 vmcnt(0) and s_waitcnt
4595 vscnt(0) to allow
4596 them to be
4597 independently moved
4598 according to the
4599 following rules.
4600 - s_waitcnt vmcnt(0)
4601 must happen after
4602 any preceding
4603 global/generic load/load
4604 atomic/
4605 atomicrmw-with-return-value.
4606 - s_waitcnt vscnt(0)
4607 must happen after
4608 any preceding
4609 global/generic
4610 store/store atomic/
4611 atomicrmw-no-return-value.
4612 - Must happen before
4613 the following
4614 atomicrmw.
4615 - Ensures that all
4616 global memory
4617 operations have
4618 completed before
4619 performing the
4620 atomicrmw that is
4621 being released.
4622
4623 1. ds_atomic 2. ds_atomic
4624 2. s_waitcnt lgkmcnt(0) 3. s_waitcnt lgkmcnt(0)
4625
4626 - If OpenCL, omit. - If OpenCL, omit.
4627 - Must happen before - Must happen before
4628 any following the following
4629 global/generic buffer_gl0_inv.
4630 load/load
4631 atomic/store/store
4632 atomic/atomicrmw.
4633 - Ensures any - Ensures any
4634 following global following global
4635 data read is no data read is no
4636 older than the load older than the load
4637 atomic value being atomic value being
4638 acquired. acquired.
4639
4640 4. buffer_gl0_inv
4641
4642 - If CU wavefront execution mode, omit.
4643 - If OpenCL, omit.
4644 - Ensures that
4645 following
4646 loads will not see
4647 stale data.
4648
4649 atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4650 vmcnt(0) & vscnt(0)
4651
4652 - If CU wavefront execution mode, omit vmcnt and
4653 vscnt.
4654 - If OpenCL, omit. - If OpenCL, omit
4655 s_waitcnt lgkmcnt(0).
37764656 - Must happen after
37774657 any preceding
37784658 local/generic
37794659 load/store/load
37804660 atomic/store
37814661 atomic/atomicrmw.
3782 - Must happen before
3783 the following
3784 atomicrmw.
3785 - Ensures that all
3786 memory operations
3787 to local have
3788 completed before
3789 performing the
3790 atomicrmw that is
3791 being released.
3792
3793 2. buffer/global/flat_atomic
3794 atomicrmw acq_rel - workgroup - local 1. ds_atomic
3795 2. s_waitcnt lgkmcnt(0)
3796
3797 - If OpenCL, omit.
3798 - Must happen before
3799 any following
3800 global/generic
4662 - Could be split into
4663 separate s_waitcnt
4664 vmcnt(0), s_waitcnt
4665 vscnt(0) and s_waitcnt
4666 lgkmcnt(0) to allow
4667 them to be
4668 independently moved
4669 according to the
4670 following rules.
4671 - s_waitcnt vmcnt(0)
4672 must happen after
4673 any preceding
4674 global/generic load/load
4675 atomic/
4676 atomicrmw-with-return-value.
4677 - s_waitcnt vscnt(0)
4678 must happen after
4679 any preceding
4680 global/generic
4681 store/store
4682 atomic/
4683 atomicrmw-no-return-value.
4684 - s_waitcnt lgkmcnt(0)
4685 must happen after
4686 any preceding
4687 local/generic load/store/load
4688 atomic/store atomic/atomicrmw.
4689 - Must happen before - Must happen before
4690 the following the following
4691 atomicrmw. atomicrmw.
4692 - Ensures that all - Ensures that all
4693 memory operations memory operations
4694 to local have have
4695 completed before completed before
4696 performing the performing the
4697 atomicrmw that is atomicrmw that is
4698 being released. being released.
4699
4700 2. flat_atomic 2. flat_atomic
4701 3. s_waitcnt lgkmcnt(0) 3. s_waitcnt lgkmcnt(0) &
4702 vm/vscnt(0)
4703
4704 - If CU wavefront execution mode, omit vm/vscnt.
4705 - If OpenCL, omit. - If OpenCL, omit
4706 s_waitcnt lgkmcnt(0).
4707 - Must happen before - Must happen before
4708 any following the following
4709 global/generic buffer_gl0_inv.
38014710 load/load
38024711 atomic/store/store
38034712 atomic/atomicrmw.
3804 - Ensures any
3805 following global
3806 data read is no
3807 older than the load
3808 atomic value being
3809 acquired.
3810
3811 atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkmcnt(0)
3812
3813 - If OpenCL, omit.
3814 - Must happen after
3815 any preceding
3816 local/generic
3817 load/store/load
3818 atomic/store
4713 - Ensures any - Ensures any
4714 following global following global
4715 data read is no data read is no
4716 older than the load older than the load
4717 atomic value being atomic value being
4718 acquired. acquired.
4719
4720 4. buffer_gl0_inv
4721
4722 - If CU wavefront execution mode, omit.
4723 - Ensures that
4724 following
4725 loads will not see
4726 stale data.
4727
4728 atomicrmw acq_rel - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
4729 - system vmcnt(0) vmcnt(0) & vscnt(0)
4730
4731 - If OpenCL, omit - If OpenCL, omit
4732 lgkmcnt(0). lgkmcnt(0).
4733 - Could be split into - Could be split into
4734 separate s_waitcnt separate s_waitcnt
4735 vmcnt(0) and vmcnt(0), s_waitcnt
4736 s_waitcnt vscnt(0) and s_waitcnt
4737 lgkmcnt(0) to allow lgkmcnt(0) to allow
4738 them to be them to be
4739 independently moved independently moved
4740 according to the according to the
4741 following rules. following rules.
4742 - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0)
4743 must happen after must happen after
4744 any preceding any preceding
4745 global/generic global/generic
4746 load/store/load load/load atomic/
4747 atomic/store atomicrmw-with-return-value.
38194748 atomic/atomicrmw.
3820 - Must happen before
3821 the following
3822 atomicrmw.
3823 - Ensures that all
3824 memory operations
3825 to local have
3826 completed before
3827 performing the
3828 atomicrmw that is
3829 being released.
3830
3831 2. flat_atomic
3832 3. s_waitcnt lgkmcnt(0)
3833
3834 - If OpenCL, omit.
3835 - Must happen before
3836 any following
3837 global/generic
3838 load/load
3839 atomic/store/store
4749 - s_waitcnt vscnt(0)
4750 must happen after
4751 any preceding
4752 global/generic
4753 store/store atomic/
4754 atomicrmw-no-return-value.
4755 - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0)
4756 must happen after must happen after
4757 any preceding any preceding
4758 local/generic local/generic
4759 load/store/load load/store/load
4760 atomic/store atomic/store
4761 atomic/atomicrmw. atomic/atomicrmw.
4762 - Must happen before - Must happen before
4763 the following the following
4764 atomicrmw. atomicrmw.
4765 - Ensures that all - Ensures that all
4766 memory operations memory operations
4767 to global have to global have
4768 completed before completed before
4769 performing the performing the
4770 atomicrmw that is atomicrmw that is
4771 being released. being released.
4772
4773 2. buffer/global/flat_atomic 2. buffer/global_atomic
4774 3. s_waitcnt vmcnt(0) 3. s_waitcnt vm/vscnt(0)
4775
4776 - Use vmcnt if atomic with
4777 return and vscnt if atomic
4778 with no-return.
4780 - Must happen before - Must happen before
4781 following following
4782 buffer_wbinvl1_vol. buffer_gl*_inv.
4783 - Ensures the - Ensures the
4784 atomicrmw has atomicrmw has
4785 completed before completed before
4786 invalidating the invalidating the
4787 cache. caches.
4788
4789 4. buffer_wbinvl1_vol 4. buffer_gl0_inv;
4790 buffer_gl1_inv
4791
4792 - Must happen before - Must happen before
4793 any following any following
4794 global/generic global/generic
4795 load/load load/load
4796 atomic/atomicrmw. atomic/atomicrmw.
4797 - Ensures that - Ensures that
4798 following loads following loads
4799 will not see stale will not see stale
4800 global data. global data.
4801
4802 atomicrmw acq_rel - agent - generic 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) &
4803 - system vmcnt(0) vmcnt(0) & vscnt(0)
4804
4805 - If OpenCL, omit - If OpenCL, omit
4806 lgkmcnt(0). lgkmcnt(0).
4807 - Could be split into - Could be split into
4808 separate s_waitcnt separate s_waitcnt
4809 vmcnt(0) and vmcnt(0), s_waitcnt
4810 s_waitcnt vscnt(0) and s_waitcnt
4811 lgkmcnt(0) to allow lgkmcnt(0) to allow
4812 them to be them to be
4813 independently moved independently moved
4814 according to the according to the
4815 following rules. following rules.
4816 - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0)
4817 must happen after must happen after
4818 any preceding any preceding
4819 global/generic global/generic
4820 load/store/load load/load atomic/
4821 atomic/store atomicrmw-with-return-value.
38404822 atomic/atomicrmw.
3841 - Ensures any
3842 following global
3843 data read is no
3844 older than the load
3845 atomic value being
3846 acquired.
3847
3848 atomicrmw acq_rel - agent - global 1. s_waitcnt lgkmcnt(0) &
3849 - system vmcnt(0)
3850
3851 - If OpenCL, omit
3852 lgkmcnt(0).
3853 - Could be split into
3854 separate s_waitcnt
3855 vmcnt(0) and
3856 s_waitcnt
3857 lgkmcnt(0) to allow
3858 them to be
3859 independently moved
3860 according to the
3861 following rules.
3862 - s_waitcnt vmcnt(0)
3863 must happen after
3864 any preceding
3865 global/generic
3866 load/store/load
3867 atomic/store
3868 atomic/atomicrmw.
3869 - s_waitcnt lgkmcnt(0)
3870 must happen after
3871 any preceding
3872 local/generic
3873 load/store/load
3874 atomic/store
3875 atomic/atomicrmw.
3876 - Must happen before
3877 the following
3878 atomicrmw.
3879 - Ensures that all
3880 memory operations
3881 to global have
3882 completed before
3883 performing the
3884 atomicrmw that is
3885 being released.
3886
3887 2. buffer/global/flat_atomic
3888 3. s_waitcnt vmcnt(0)
3889
3890 - Must happen before
3891 following
3892 buffer_wbinvl1_vol.
3893 - Ensures the
3894 atomicrmw has
3895 completed before
3896 invalidating the
3897 cache.
3898
3899 4. buffer_wbinvl1_vol
3900
3901 - Must happen before
3902 any following
3903 global/generic
3904 load/load
3905 atomic/atomicrmw.
3906 - Ensures that
3907 following loads
3908 will not see stale
3909 global data.
3910
3911 atomicrmw acq_rel - agent - generic 1. s_waitcnt lgkmcnt(0) &
3912 - system vmcnt(0)
3913
3914 - If OpenCL, omit
3915 lgkmcnt(0).
3916 - Could be split into
3917 separate s_waitcnt
3918 vmcnt(0) and
3919 s_waitcnt
3920 lgkmcnt(0) to allow
3921 them to be
3922 independently moved
3923 according to the
3924 following rules.
3925 - s_waitcnt vmcnt(0)
3926 must happen after
3927 any preceding
3928 global/generic
3929 load/store/load
3930 atomic/store
3931 atomic/atomicrmw.
3932 - s_waitcnt lgkmcnt(0)
3933 must happen after
3934 any preceding
3935 local/generic
3936 load/store/load
3937 atomic/store
3938 atomic/atomicrmw.
3939 - Must happen before
3940 the following
3941 atomicrmw.
3942 - Ensures that all
3943 memory operations
3944 to global have
3945 completed before
3946 performing the
3947 atomicrmw that is
3948 being released.
3949
3950 2. flat_atomic
3951 3. s_waitcnt vmcnt(0) &
3952 lgkmcnt(0)
3953
3954 - If OpenCL, omit
3955 lgkmcnt(0).
3956 - Must happen before
3957 following
3958 buffer_wbinvl1_vol.
3959 - Ensures the
3960 atomicrmw has
3961 completed before
3962 invalidating the
3963 cache.
3964
3965 4. buffer_wbinvl1_vol
3966
3967 - Must happen before
3968 any following
3969 global/generic
3970 load/load
3971 atomic/atomicrmw.
3972 - Ensures that
3973 following loads
3974 will not see stale
3975 global data.
3976
3977 fence acq_rel - singlethread *none* *none*
4823 - s_waitcnt vscnt(0)
4824 must happen after
4825 any preceding
4826 global/generic
4827 store/store atomic/
4828 atomicrmw-no-return-value.
4829 - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0)
4830 must happen after must happen after
4831 any preceding any preceding
4832 local/generic local/generic
4833 load/store/load load/store/load
4834 atomic/store atomic/store
4835 atomic/atomicrmw. atomic/atomicrmw.
4836 - Must happen before - Must happen before
4837 the following the following
4838 atomicrmw. atomicrmw.
4839 - Ensures that all - Ensures that all
4840 memory operations memory operations
4841 to global have have
4842 completed before completed before
4843 performing the performing the
4844 atomicrmw that is atomicrmw that is
4845 being released. being released.
4846
4847 2. flat_atomic 2. flat_atomic
4848 3. s_waitcnt vmcnt(0) & 3. s_waitcnt vm/vscnt(0) &
4849 lgkmcnt(0) lgkmcnt(0)
4850
4851 - If OpenCL, omit - If OpenCL, omit
4852 lgkmcnt(0). lgkmcnt(0).
4853 - Use vmcnt if atomic with
4854 return and vscnt if atomic
4855 with no-return.
4856 - Must happen before - Must happen before
4857 following following
4858 buffer_wbinvl1_vol. buffer_gl*_inv.
4859 - Ensures the - Ensures the
4860 atomicrmw has atomicrmw has
4861 completed before completed before
4862 invalidating the invalidating the
4863 cache. caches.
4864
4865 4. buffer_wbinvl1_vol 4. buffer_gl0_inv;
4866 buffer_gl1_inv
4867
4868 - Must happen before - Must happen before
4869 any following any following
4870 global/generic global/generic
4871 load/load load/load
4872 atomic/atomicrmw. atomic/atomicrmw.
4873 - Ensures that - Ensures that
4874 following loads following loads
4875 will not see stale will not see stale
4876 global data. global data.
4877
4878 fence acq_rel - singlethread *none* *none* *none*
39784879 - wavefront
3979 fence acq_rel - workgroup *none* 1. s_waitcnt lgkmcnt(0)
3980
3981 - If OpenCL and
3982 address space is
3983 not generic, omit.
3984 - However,
3985 since LLVM
3986 currently has no
3987 address space on
3988 the fence need to
3989 conservatively
3990 always generate
3991 (see comment for
3992 previous fence).
4880 fence acq_rel - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
4881 vmcnt(0) & vscnt(0)
4882
4883 - If CU wavefront execution mode, omit vmcnt and
4884 vscnt.
4885 - If OpenCL and - If OpenCL and
4886 address space is address space is
4887 not generic, omit. not generic, omit
4888 lgkmcnt(0).
4889 - If OpenCL and
4890 address space is
4891 local, omit
4892