commit	06f136f61e6d23fde5c91f7fa0813d0291c17c97
author	Philip Reames <listmail@philipreames.com>
	Fri, 18 Sep 2020 21:53:29 +0000 (18 14:53 -0700)
committer	Philip Reames <listmail@philipreames.com>
	Fri, 18 Sep 2020 21:54:24 +0000 (18 14:54 -0700)
tree	80833d0ba0a6fbeea835ede9da8f2d9f42916cab
parent	7c10129f5a2145cf8f6dbe259269fd2a781a8dbe

[instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic

If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one AND and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
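To illustrate the equivalence, here is a minimal standalone sketch in C++. It is not the code from X86InstCombineIntrinsic.cpp; all function names (pdep_ref, pext_ref, pdep_simple, pext_simple, is_shifted_mask, ctz32, check_mask) are hypothetical and exist only for this example. The reference versions model pdep/pext bit by bit in software, so the sketch does not need BMI2 hardware.

```cpp
#include <cassert>
#include <cstdint>

// Software model of PDEP: deposit the low bits of src into the
// positions of the set bits of mask, lowest first.
uint32_t pdep_ref(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  for (uint32_t bit = 1; mask != 0; bit <<= 1) {
    uint32_t lowest = mask & (~mask + 1); // isolate lowest set mask bit
    if (src & bit)
      result |= lowest;
    mask &= mask - 1; // clear lowest set mask bit
  }
  return result;
}

// Software model of PEXT: gather the bits of src selected by mask
// into the low bits of the result.
uint32_t pext_ref(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  for (uint32_t bit = 1; mask != 0; bit <<= 1) {
    uint32_t lowest = mask & (~mask + 1);
    if (src & lowest)
      result |= bit;
    mask &= mask - 1;
  }
  return result;
}

// True if x is non-zero and its set bits form one contiguous block.
bool is_shifted_mask(uint32_t x) {
  if (x == 0)
    return false;
  uint32_t filled = x | (x - 1); // fill the zeros below the block
  return ((filled + 1) & filled) == 0;
}

// Count trailing zeros; the caller must pass a non-zero value.
unsigned ctz32(uint32_t x) {
  unsigned n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Simplified forms, valid only when mask is a shifted mask.
uint32_t pdep_simple(uint32_t src, uint32_t mask) {
  return (src << ctz32(mask)) & mask; // one shift, one AND
}

uint32_t pext_simple(uint32_t src, uint32_t mask) {
  return (src & mask) >> ctz32(mask); // one AND, one shift
}

// Compare the simplified forms against the reference models
// for a handful of sample source values.
void check_mask(uint32_t mask) {
  assert(is_shifted_mask(mask));
  const uint32_t samples[] = {0u, 1u, 0x12345678u, 0xDEADBEEFu, 0xFFFFFFFFu};
  for (uint32_t src : samples) {
    assert(pdep_simple(src, mask) == pdep_ref(src, mask));
    assert(pext_simple(src, mask) == pext_ref(src, mask));
  }
}
```

When the contiguous block starts at bit 0, the shift amount is zero and only the AND remains, which is why the message says "at most" one of each.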

The cost modelling for multiple contiguous blocks might be worth exploring in a follow-up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs, where these instructions are really slow, though.

Differential Revision: https://reviews.llvm.org/D87861

llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp
llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll