5 Commits

Author SHA1 Message Date
Violet
5cb0a9b8e8 Add support for bar.red.and.pred (#402)
Implements bar.red.and.pred and bar.red.or.pred, using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Andrzej Janik
7ac67a89e9 Enable Geekbench 5 (#304) 2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf Fix host code and update to CUDA 12.4 (#299) 2024-12-02 00:29:57 +01:00
Andrzej Janik
970ba5aa25 Fix linking of AMD device libraries (#296)
It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've tested it only on ROCm 6.2; it might be broken on older or newer ROCm versions.
2024-11-02 16:07:44 +01:00
Andrzej Janik
3870a96592 Re-enable all failing PTX tests (#277)
Additionally remove unused compilation paths
2024-10-16 03:15:48 +02:00