llvm.org GIT mirror llvm / 668f7ac
Fix for the following bug in AVX codegen for double-to-int conversions:

- "fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode.
- Currently, in AVX mode, for <4xdouble> and <8xdouble> the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for the "fp_to_sint" DAG node operation) by AVX codegen. However, they use round-to-nearest-even rounding mode.
- Consequently, the conversion produces incorrect numbers.

The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode.

As the "fp_to_sint" DAG node operation is used only for lowering of "fptosi" and "fptoui" IR instructions, the fix in the X86InstrSSE.td definition file doesn't have an impact on other LLVM flows.

The patch includes changes in the .td file, a LIT test for the changes, and a fix in a legacy LIT test (which produced asm code conflicting with the LLVM IR spec).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149056 91177308-0d34-0410-b5e6-96231b3b80d8

Victor Umansky 7 years ago
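The rounding difference described in the commit message can be illustrated with a small model. This is only a sketch, not code from the patch: the function names `cvt_nearest_even` and `cvt_truncate` are hypothetical, Python's built-in `round` (ties to even) stands in for VCVTPD2DQ under the default rounding mode, and `math.trunc` stands in for VCVTTPD2DQ and the "fptosi" semantics. Out-of-range inputs, which the hardware handles specially, are ignored here.

```python
import math


def cvt_nearest_even(values):
    """Hypothetical model of VCVTPD2DQ with the default rounding mode:
    round to nearest, ties to even."""
    return [round(v) for v in values]


def cvt_truncate(values):
    """Hypothetical model of VCVTTPD2DQ and of fptosi:
    discard the fractional part (round toward zero)."""
    return [math.trunc(v) for v in values]


samples = [1.5, 2.5, 2.7, -2.7]
print(cvt_nearest_even(samples))  # [2, 2, 3, -3]
print(cvt_truncate(samples))      # [1, 2, 2, -2]
```

Only the truncating variant matches what the IR instructions require; for example, 2.7 has to convert to 2, not 3.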
3 changed file(s) with 22 addition(s) and 3 deletion(s).
                 "cvtpd2dq\t{$src, $dst|$dst, $src}", []>;
 
 def : Pat<(v4i32 (fp_to_sint (v4f64 VR256:$src))),
-          (VCVTPD2DQYrr VR256:$src)>;
+          (VCVTTPD2DQYrr VR256:$src)>;
 def : Pat<(v4i32 (fp_to_sint (memopv4f64 addr:$src))),
-          (VCVTPD2DQYrm addr:$src)>;
+          (VCVTTPD2DQYrm addr:$src)>;
 
 // Convert Packed DW Integers to Packed Double FP
 let Predicates = [HasAVX] in {
   ret <4 x double> %b
 }
 
-; CHECK: vcvtpd2dqy %ymm
+; CHECK: vcvttpd2dqy %ymm
 define <4 x i32> @fptosi01(<4 x double> %a) {
   %b = fptosi <4 x double> %a to <4 x i32>
   ret <4 x i32> %b
+; RUN: llc < %s -mtriple=i386-apple-darwin10 -mcpu=corei7-avx -mattr=+avx | FileCheck %s
+
+;; Check that FP_TO_SINT and FP_TO_UINT generate convert with truncate
+
+; CHECK: test1:
+; CHECK: vcvttpd2dqy
+; CHECK: ret
+; CHECK: test2:
+; CHECK: vcvttpd2dqy
+; CHECK: ret
+
+define <4 x i8> @test1(<4 x double> %d) {
+  %c = fptoui <4 x double> %d to <4 x i8>
+  ret <4 x i8> %c
+}
+define <4 x i8> @test2(<4 x double> %d) {
+  %c = fptosi <4 x double> %d to <4 x i8>
+  ret <4 x i8> %c
+}