[AArch64][LV] Set MaxInterleaving to 4 for Neoverse V2 and V3 (#100385)
commit 9bccf61f5fd20a52f997b23a56c13ada72c46eae
author Sjoerd Meijer <smeijer@nvidia.com>
Wed, 20 Nov 2024 09:33:39 +0000 (20 09:33 +0000)
committer GitHub <noreply@github.com>
Wed, 20 Nov 2024 09:33:39 +0000 (20 09:33 +0000)
tree 1cab81354bf7e87bb6689caf0bba203ae1bcff98
parent 2b5214b9e16cdc784def1d521ce38074a2e8c90f

Set the maximum interleaving factor to 4, matching the number of available
SIMD pipelines. This increases the number of independent vector instructions in
the vectorised loop body, which improves throughput while that body executes.
However, each iteration of the vectorised body now covers more scalar
iterations, so for very low iteration counts it might not execute at all,
leaving only the epilogue loop to run. This affects, for example, cam4_r from
SPEC FP, which regressed in performance. To address this, the patch reduces the
minimum epilogue vectorisation factor from 16 to 8, which enables the epilogue
to be vectorised and largely mitigates the regression.
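
The trade-off described above can be illustrated with a simplified model. The
sketch below is not code from this patch or from LoopVectorize.cpp: the names
splitIterations and epilogueVectorizationAllowed are hypothetical, and it
assumes that the main vector loop consumes VF * IC scalar iterations per
iteration and that the minimum epilogue vectorisation factor is compared
against the main loop's VF. The real profitability check may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical breakdown of where a loop's scalar iterations are executed.
struct IterationSplit {
  uint64_t mainVector;     // handled by the interleaved main vector loop
  uint64_t epilogueVector; // handled by a vectorised epilogue, if any
  uint64_t scalar;         // handled by the scalar remainder loop
};

// Simplified model (an assumption, not LLVM's cost model): the main loop
// processes mainVF * interleave elements per iteration. Pass epilogueVF == 0
// to model a loop whose epilogue is not vectorised.
IterationSplit splitIterations(uint64_t tripCount, unsigned mainVF,
                               unsigned interleave, unsigned epilogueVF) {
  assert(mainVF > 0 && interleave > 0 && "VF and IC must be non-zero");
  const uint64_t step = static_cast<uint64_t>(mainVF) * interleave;

  IterationSplit split{};
  split.mainVector = (tripCount / step) * step;
  const uint64_t rest = tripCount - split.mainVector;
  if (epilogueVF > 0)
    split.epilogueVector = (rest / epilogueVF) * epilogueVF;
  split.scalar = rest - split.epilogueVector;
  return split;
}

// Assumed form of the threshold check: the epilogue is only considered for
// vectorisation when the main loop's VF reaches the minimum.
bool epilogueVectorizationAllowed(unsigned mainVF, unsigned minEpilogueVF) {
  return mainVF >= minEpilogueVF;
}
```

Under these assumptions, a loop with a trip count of 20, VF 8 and an
interleaving factor of 4 never enters the main vector loop, because one
iteration of it would need 32 elements. With a threshold of 16,
epilogueVectorizationAllowed(8, 16) is false and all 20 iterations run as
scalar code; with a threshold of 8 the epilogue may be vectorised, and with a
hypothetical epilogue VF of 4, splitIterations(20, 8, 4, 4) assigns all 20
iterations to the vectorised epilogue.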
13 files changed:
llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/lib/Analysis/TargetTransformInfo.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll
llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll
llvm/test/Transforms/LoopVectorize/AArch64/neoverse-epilogue-vect.ll [new file with mode: 0644]
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll