llvm.org GIT mirror llvm / c5ac61d
continue writing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128990 91177308-0d34-0410-b5e6-96231b3b80d8 Chris Lattner 9 years ago
1 changed file(s) with 110 addition(s) and 103 deletion(s).
398398
399399
400400
401
  • 402 TBAA: On by default in clang. Disable it with -fno-strict-aliasing.
    403 Could be more aggressive for structs.
    401
  • Type Based Alias Analysis (TBAA) is now implemented and turned on by default
  • 402 in Clang. This allows substantially better load/store optimization in some
    403 cases. TBAA can be disabled by passing -fno-strict-aliasing.
    404404
    405405
    406
  • New Nvidia PTX backend, not generally useful in 2.9 though.
  • 407
    408
  • 409 Much better debug info generated, particularly in optimized code situations.
    410
    411
    412
  • 413 inline asm multiple alternative constraint support.
    414
    415
    416
  • 417 New naming rules in coding standards: CodingStandards.html#ll_naming
    418
    419
    406
  • This release has seen a continued focus on quality of debug information.
  • 407 LLVM now generates much higher fidelity debug information, particularly when
    408 debugging optimized code.
    409
    410
  • Inline assembly now supports multiple alternative constraints.
  • 411
    412
  • A new backend for the NVIDIA PTX virtual ISA (used to target its GPUs) is
  • 413 under rapid development. It is not generally useful in 2.9, but is making
    414 rapid progress.
    415
    420416
    421417
    422418
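To make the TBAA note in this hunk concrete, here is a minimal C sketch of the kind of load/store optimization that type-based alias analysis permits. The function names are hypothetical and do not come from the release notes, and the bit-pattern check assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Under strict aliasing an int object and a float object cannot overlap,
 * so a TBAA-aware optimizer may assume the store through f leaves *i
 * untouched and return the constant 1 without reloading *i. */
int store_then_load(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Reinterpreting bits through memcpy stays well defined with TBAA enabled,
 * unlike reading the float through a cast pointer to an integer type.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Code that deliberately accesses one object through pointers of unrelated types is the case where building with -fno-strict-aliasing, as the note says, may be needed.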
    431427 expose new optimization opportunities:

    432428
    433429
    434
  • udiv, ashr, lshr, shl now have exact and nuw/nsw bits:
  • 435 PR8862 / LangRef.html
    436
    437 unnamed_addr + PR8927
    438
    439 new 'hotpatch' attribute: LangRef.html#fnattrs
    440
    430
  • The udiv, ashr, lshr, and shl
  • 431 instructions now support the exact and nuw/nsw bits to indicate that they
    432 don't overflow or shift out bits. This is useful for optimization of
    433 pointer differences (http://llvm.org/PR8862) and other cases.
    434
    435
  • LLVM IR now supports the unnamed_addr
  • 436 attribute to indicate that constant global variables with identical
    437 initializers can be merged. This fixed an
    438 issue where LLVM would incorrectly merge two globals which were supposed
    439 to have distinct addresses.
    440
    441
  • The new hotpatch attribute has been added
  • 442 to allow runtime patching of functions.
    441443
    442444
    443445
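The pointer-difference case mentioned for the exact bit can be illustrated with a small C sketch. The function name is hypothetical, and the description of the generated IR is an assumption about typical front-end lowering, not something stated in this diff.

```c
#include <stddef.h>

/* Both pointers must refer to elements of the same int array, so the byte
 * distance between them is always a whole multiple of sizeof(int).
 * A front end can therefore mark the division (or shift) that converts
 * bytes to elements as exact, telling the optimizer that no remainder
 * bits are discarded. */
ptrdiff_t count_between(const int *first, const int *last) {
    return last - first;
}
```

With that guarantee, an expression that scales the result back up by the element size can in principle be simplified to the original byte distance.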
    453455 release includes a few major enhancements and additions to the optimizers:

    454456
    455457
    456
  • LTO has been improved to use MC for parsing inline asm and now
  • 457 can build large programs like Firefox 4 on both OS X and Linux.
    458
    459
    460 LoopIdiom: memset/memcpy formation and memset_pattern on darwin. Build with
    461 -ffreestanding or -fno-builtin if your memcpy is being compiled into infinite
    462 recursion.
    463
    464 TargetLibraryInfo
    458
  • Link Time Optimization (LTO) has been improved to use MC for parsing inline
  • 459 assembly and now can build large programs like Firefox 4 on both Mac OS X and
    460 Linux.
    461
    462
  • The new -loop-idiom pass recognizes memset/memcpy loops (and memset_pattern
  • 463 on darwin), turning them into library calls, which are typically better
    464 optimized than inline code. If you are building a libc and notice that your
    465 memcpy and memset functions are compiled into infinite recursion, please build
    466 with -ffreestanding or -fno-builtin to disable this pass.
    467
    468
  • A new -early-cse pass does a fast pass over functions to fold constants,
  • 469 simplify expressions, perform simple dead store elimination, and perform
    470 common subexpression elimination. It does a good job at catching some of the
    471 trivial redundancies that exist in unoptimized code, making later passes more
    472 effective.
    473
    474
  • A new -loop-instsimplify pass is used to clean up loop bodies in the loop
  • 475 optimizer.
    476
    477
  • The new TargetLibraryInfo interface allows mid-level optimizations to know
  • 478 whether the current target's runtime library has certain functions. For
    479 example, the optimizer can now transform integer-only printf calls to call
    480 iprintf, allowing reduced code size for embedded C libraries (e.g. newlib).
    481
    465482
    466 EarlyCSE pass.
    467 LoopInstSimplify pass.
    468
    469 New RegionPass infrastructure
    470 for region-based optimizations.
    471
    472 Can optimize printf to iprintf when no floating point is used, for embedded
    473 targets with smaller iprintf implementation.
    474
    475 Speedups to various mid-level passes:
    476 GVN is much faster on functions with deep dominator trees / lots of BBs.
    477 DomTree and DominatorFrontier are much faster to compute, and preserved by
    478 more passes (so they are computed less often)
    479 SRoA is also much faster and doesn't use DominanceFrontier.
    480
    481 DSE is more aggressive with stores of different types: e.g. a large store
    482 following a small one to the same address.
    483
    484
    485 We now optimize various idioms for overflow detection into check of the flag
    486 register on various CPUs, e.g.:
    483
  • LLVM has a new RegionPass
  • 484 infrastructure for region-based optimizations.
    485
    486
  • Several optimizer passes have been substantially sped up:
  • 487 GVN is much faster on functions with deep dominator trees and lots of basic
    488 blocks. The dominator tree and dominance frontier passes are much faster to
    489 compute, and preserved by more passes (so they are computed less often). The
    490 -scalar-repl pass is also much faster and doesn't use DominanceFrontier.
    491
    492
    493
  • The Dead Store Elimination pass is more aggressive when optimizing stores of
  • 494 different types: e.g. a large store following a small one to the same address.
    495 The MemCpyOptimizer pass handles several new forms of memcpy elimination.
    496
    497
  • LLVM now optimizes various idioms for overflow detection into a check of the
  • 498 flag register on various CPUs. For example, we now compile:
    499
    500
    
                      
                    
    487501 unsigned long t = a+b;
    488502 if (t < a) ...
    503
    489504 into:
    490 addq %rdi, %rbx
    491 jno LBB0_2
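The two-line fragment above can be wrapped into a complete function for experimentation. This is a minimal sketch; the function name and the out-parameter are hypothetical additions, not part of the release notes.

```c
#include <stdbool.h>

/* Adds a and b, stores the (possibly wrapped) result in *sum, and reports
 * whether the addition wrapped around. Unsigned arithmetic is defined to
 * wrap modulo 2^N, so the sum is smaller than an operand exactly when a
 * carry out of the top bit occurred. */
bool add_overflows(unsigned long a, unsigned long b, unsigned long *sum) {
    unsigned long t = a + b;
    *sum = t;
    return t < a;
}
```

Compiling a function like this with optimization enabled and inspecting the assembly is one way to see whether the comparison has been replaced by a branch on the flags set by the add.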
    492
    493
    494
    495
    496
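For the -loop-idiom entry in this hunk, the following sketch shows the kind of loops such a pass is described as recognizing. The function names are hypothetical, and whether a particular compiler version actually rewrites these exact loops is an assumption.

```c
#include <stddef.h>

/* A byte-wise zeroing loop: a candidate for conversion into a memset call. */
void clear_bytes(unsigned char *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = 0;
}

/* A byte-wise copy loop: a candidate for conversion into a memcpy call.
 * The restrict qualifiers promise that the buffers do not overlap, which
 * is the precondition memcpy itself requires. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

If functions like these are a libc's own memset and memcpy implementations, the rewritten loop would call the function it is defining, which is the infinite recursion the note warns about; building with -ffreestanding or -fno-builtin avoids that.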