Public Git Hosting - llvm-project.git/commit

commit	4263b2ecf8c4b9b62094a731bb92c501197531b0
author	peterbell10 <peterbell10@live.co.uk>
	Fri, 17 Nov 2023 17:51:50 +0000 (17 17:51 +0000)
committer	GitHub <noreply@github.com>
	Fri, 17 Nov 2023 17:51:50 +0000 (17 09:51 -0800)
tree	4532ed395d2d32d5fc983c898687f316d1620a4b
parent	bfbfd1caa4da70774547c1c298e482661822a137

[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672)

In openai/triton#2483 I encountered a bug in the NVPTX codegen. Given
`load<8 x half>` followed by `fpext to <8 x float>`, we get

```
ld.shared.v4.b16 {%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 {%f5, %f6, %f7, %f8}, [%r15];
```

This loads float16 values into float registers without any conversion,
so the result is simply garbage.
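
To illustrate why the unconverted load yields garbage, here is a minimal
Python sketch (not part of the patch). It compares reinterpreting the raw
16-bit pattern as a 32-bit float against a real f16-to-f32 conversion. The
helper names are hypothetical, and zero-filling the upper 16 bits of the
register is an assumption made only for illustration; the actual register
contents on hardware may differ.

```python
import struct

def half_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 binary16 bit pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def reinterpret_as_f32(bits16: int) -> float:
    """Model the bug: the 16 bits land in a 32-bit float register unchanged.

    Assumption: the upper 16 bits are zero.
    """
    return struct.unpack("<f", struct.pack("<I", bits16))[0]

def convert_to_f32(bits16: int) -> float:
    """Model the fix: decode the bits as binary16, i.e. an explicit conversion."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]

if __name__ == "__main__":
    for v in (1.0, 1.5, -2.0, 0.5):
        b = half_bits(v)
        print(f"{v:5}: bits=0x{b:04x} "
              f"reinterpreted={reinterpret_as_f32(b):.3e} "
              f"converted={convert_to_f32(b)}")
```

Under that assumption, every reinterpreted value comes out as a tiny
denormal (and the sign of `-2.0` is lost), while the converted values match
the inputs.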

This PR brings `v8f16` and `v8bf16` into line with the other vector
types by expanding their EXTLOAD to load + cvt.

cc @manman-ren @Artem-B @jlebar

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
llvm/test/CodeGen/NVPTX/bf16-instructions.ll
llvm/test/CodeGen/NVPTX/vector-loads.ll