Violet
dc69808e54
Add support for shfl.sync.MODE.b32 ( #409 )
2025-07-16 17:23:11 -07:00
Violet
5cb0a9b8e8
Add support for bar.red.and.pred ( #402 )
Implements bar.red.and.pred and bar.red.or.pred using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers or thread counts, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Andrzej Janik
7ac67a89e9
Enable Geekbench 5 ( #304 )
2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf
Fix host code and update to CUDA 12.4 ( #299 )
2024-12-02 00:29:57 +01:00
Andrzej Janik
3870a96592
Re-enable all failing PTX tests ( #277 )
Additionally remove unused compilation paths
2024-10-16 03:15:48 +02:00
Andrzej Janik
04a411fe22
Have an implementation for vprintf
2021-09-18 20:22:47 +00:00
Andrzej Janik
d5a4b068dd
Redo handling of sregs
2021-09-17 20:53:44 +00:00
Andrzej Janik
6ef19d6501
Add early support for more sregs
2021-09-17 18:31:12 +00:00
Andrzej Janik
5b2352723f
Implement function pointers and activemask
2021-09-17 16:24:25 +00:00
Andrzej Janik
18245be7d5
Make ptx unit tests run on AMD (except denormals)
2021-09-07 23:24:49 +00:00
Andrzej Janik
638786b0ec
Hack enough functionality that AMD GPU code builds
2021-08-03 00:22:47 +02:00