382 Commits

Author SHA1 Message Date
Violet
06b28cfec7 More descriptive message for unknown symbol (#411) v5-preview.45 2025-07-14 15:01:38 -07:00
Violet
373d6d9e6e Remove duplicate call to linker (#410) v5-preview.44 2025-07-10 12:44:22 -07:00
Andrzej Janik
081f7d0976 Enable sccache in Rust builds, publish prerelease builds (#408) v5-preview.43 2025-07-09 09:20:03 -07:00
Violet
6e27f78ae7 Add support for multiple return arguments (#406) 2025-07-09 08:17:15 -07:00
Violet
fa7ecb2e02 Update README.md (#407) 2025-07-08 17:13:53 -07:00
Andrzej Janik
059b8ca0f6 Make sure it is possible to log 32bit PhysX (#374) 2025-07-08 10:19:49 -07:00
aiwhskruht
9bd8125c53 Implement more CUDA driver API to enable simple cuda-samples (#405) 2025-07-08 10:18:30 -07:00
aiwhskruht
8d5b734c30 Add initialized check to protect zluda from cuda driver calls during shutdown (#404) 2025-07-07 11:08:09 -07:00
Andrzej Janik
ef0c4afcf9 Run unit tests on every commit (#401) 2025-07-03 16:07:00 -07:00
Violet
5cb0a9b8e8 Add support for bar.red.and.pred (#402)
Implements bar.red.and.pred and bar.red.or.pred using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers and thread count, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Violet
7bdd20f0dd Add warp-wide tests (#400) 2025-07-02 18:11:36 -07:00
Andrzej Janik
6d56fa8c34 Fix floating point min/max (#399) 2025-07-01 15:58:16 -07:00
Violet
b824424367 Read test files at runtime for development ergonomics (#395) 2025-07-01 10:31:06 -07:00
Violet
1cf345329c Make derive_parser work with all optional arguments (#397)
The current implementation using `winnow`'s `opt` does not work for optional arguments that are in the middle of the command. For example, `bar{.cta}.red.op.pred   p, a{, b}, {!}c;`. This is because `opt` is greedy and will always match `{, b}` instead of `, {!}c`. This change switches to using a custom combinator that handles this properly.
2025-06-30 18:54:31 -07:00
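The greedy-optional problem described in #397 can be reproduced without `winnow`. The sketch below is not ZLUDA's combinator: it is a std-only illustration over pre-split tokens, and every name in it (`Operands`, `parse_greedy`, `parse_backtracking`, and the helpers) is hypothetical. It assumes the operand shape `p, a{, b}, {!}c` and shows one possible fix, which is to accept the optional operand only if the rest of the list still parses.

```rust
/// Operands of a hypothetical `p, a{, b}, {!}c` list (illustrative type, not from ZLUDA).
#[derive(Debug, PartialEq)]
struct Operands<'a> {
    p: &'a str,
    a: &'a str,
    b: Option<&'a str>,
    negate_c: bool,
    c: &'a str,
}

/// Consume one identifier token (anything other than `,` or `!`).
fn ident<'a>(toks: &'a [&'a str]) -> Option<(&'a str, &'a [&'a str])> {
    match toks.split_first() {
        Some((t, rest)) if *t != "," && *t != "!" => Some((*t, rest)),
        _ => None,
    }
}

/// Consume a single `,` token.
fn comma<'a>(toks: &'a [&'a str]) -> Option<&'a [&'a str]> {
    match toks.split_first() {
        Some((t, rest)) if *t == "," => Some(rest),
        _ => None,
    }
}

/// Consume `, <ident>`.
fn comma_ident<'a>(toks: &'a [&'a str]) -> Option<(&'a str, &'a [&'a str])> {
    ident(comma(toks)?)
}

/// Consume the mandatory tail `, {!}c` and require end of input.
fn tail<'a>(toks: &'a [&'a str]) -> Option<(bool, &'a str)> {
    let toks = comma(toks)?;
    let (negate, toks) = match toks.split_first() {
        Some((t, rest)) if *t == "!" => (true, rest),
        _ => (false, toks),
    };
    let (c, rest) = ident(toks)?;
    if rest.is_empty() { Some((negate, c)) } else { None }
}

/// Greedy optional: commits to `, b` as soon as it matches and never reconsiders.
fn parse_greedy<'a>(toks: &'a [&'a str]) -> Option<Operands<'a>> {
    let (p, toks) = ident(toks)?;
    let (a, toks) = comma_ident(toks)?;
    let (b, toks) = match comma_ident(toks) {
        Some((b, rest)) => (Some(b), rest),
        None => (None, toks),
    };
    let (negate_c, c) = tail(toks)?;
    Some(Operands { p, a, b, negate_c, c })
}

/// Backtracking optional: keeps `, b` only if the tail also parses afterwards.
fn parse_backtracking<'a>(toks: &'a [&'a str]) -> Option<Operands<'a>> {
    let (p, toks) = ident(toks)?;
    let (a, toks) = comma_ident(toks)?;
    if let Some((b, rest)) = comma_ident(toks) {
        if let Some((negate_c, c)) = tail(rest) {
            return Some(Operands { p, a, b: Some(b), negate_c, c });
        }
    }
    // Fall back to the branch where `b` is absent.
    let (negate_c, c) = tail(toks)?;
    Some(Operands { p, a, b: None, negate_c, c })
}

fn main() {
    // `p, a, c`: the greedy parser takes `, c` as the optional `b` and then has no tail left.
    let toks = ["p", ",", "a", ",", "c"];
    println!("greedy:       {:?}", parse_greedy(&toks));
    println!("backtracking: {:?}", parse_backtracking(&toks));
}
```

In this sketch the greedy version fails only when `c` is not negated, because a leading `!` happens to stop the optional `, b` from matching. Whether the custom combinator in the actual change backtracks in this way is an assumption.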
aiwhskruht
d4ad17d75a Unified fatbin versions behind a single iterator. (#398) 2025-06-27 15:56:46 -07:00
Violet
80607c07db Check LLVM IR for test_ptx! with no input/output (#394) 2025-06-24 11:53:30 -07:00
Andrzej Janik
22608d7420 Bump dependencies (#392)
zip 2.6.1 was yanked and microlp 2.10 has a major bug
2025-06-23 18:04:08 -07:00
Violet
5edfeb04eb Error instead of infinite loop when parsing enum without a derive attribute in derive_parser! (#391) 2025-06-23 16:18:21 -07:00
Violet
74ff9ebf96 Remove trailing zeroes from end of ptx (#390) 2025-06-23 16:14:07 -07:00
Violet
f4cd545677 Fix bug in get_payload (#389) 2025-06-18 17:29:21 -07:00
Violet
4da3978f94 Implement cuLibraryLoadData (#388) 2025-06-18 16:05:53 -07:00
Violet
8ce70c5095 Add integrity_check implementation to ZLUDA (#387) 2025-06-17 15:00:10 -07:00
Andrzej Janik
2a374ad880 Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests (#379) 2025-06-16 19:14:16 -07:00
Violet
4d4053194a Implement runtime_callback_hooks_fn6 (#386) 2025-06-16 17:00:47 -07:00
Violet
9c5f1ed9fb Handle new attributes in cuDeviceGetAttribute (#383) 2025-06-16 13:20:04 -07:00
Andrzej Janik
f179868b8e Add automated builds (#358) 2025-06-16 09:53:18 -07:00
Violet
9773d20945 Implement cudart_interface_fn2 (#382) 2025-06-13 14:01:14 -07:00
Violet
1715830d82 Implement cuModuleGetLoadingMode (#381) 2025-06-11 15:54:48 -07:00
Violet
25a9d1c40e Implement runtime_callback_hooks_fn2 (#380) 2025-06-11 15:15:43 -07:00
Violet
62f3e63355 Implement cuGetProcAddress and cuGetProcAddress_v2 (#377) 2025-06-10 16:07:35 -07:00
Andrzej Janik
3361046760 Fix mad.wide, replace external CUDA library in test with our own (#376) 2025-06-09 21:33:18 -07:00
Andrzej Janik
c790ab45ec Redo logging to better log dark API and performance libraries (#372) 2025-06-09 15:29:14 -07:00
Andrzej Janik
5935cfec78 Work around broken AMD Adrenalin 25.5.1 driver (#366)
For reasons unknown, AMD Adrenalin 25.5.1 ships with a comgr that presents itself as version 2 but expects the ABI for version 3. Add a workaround.
2025-05-13 02:20:23 +02:00
Andrzej Janik
3d3e38aadc Fix ROCm 6.4 failures (#364)
Lazy load comgr and dispatch to different code paths based on the name of the comgr .dll/.so
2025-05-02 00:38:22 +02:00
Andrzej Janik
cc83b9f1f6 Create infrastructure for performance libraries (#363) 2025-05-01 22:37:18 +02:00
Andrzej Janik
adc4673a20 Explicitly fail compilation on ROCm 6.4 (#361)
AMD broke comgr ABI in 6.4. This is a temporary solution.
2025-04-20 17:02:05 +02:00
Joëlle van Essen
7cdab7abc2 Implement mul24 (#351) 2025-04-08 12:27:19 +02:00
Andrzej Janik
d704e92c97 Support instruction modes (denormal and rounding) on AMD GPUs (#342) 2025-03-17 21:37:26 +01:00
Joëlle van Essen
867e4728d5 LLVM unit tests (#324)
* LLVM unit tests: add assembly files

* LLVM unit tests: first attempt

* LLVM unit tests: fix - parse bitcode in context

* LLVM unit tests: use pretty_assertions for line-by-line diff

* LLVM unit tests: Write IR to file for failed test

* LLVM unit tests: just use the stack

* LLVM unit tests: use MaybeUninit

* LLVM unit tests: add mul24.ll

* LLVM unit tests: Adjustments after review

* LLVM unit tests: Include emit_llvm::Context in emit_llvm::Module

* LLVM unit tests: Fix typo

* LLVM unit tests: Context need not be pub
2025-02-19 21:21:20 +01:00
Andrzej Janik
646d746e02 Start working on mul24 2025-02-07 19:37:11 +00:00
Andrzej Janik
df5a96d935 Improve build system (#329)
Also fix Dockerfile and Windows build
2025-01-28 01:55:36 +01:00
Alexander Zaitsev
9c0747a5f7 fix: missing inherits in a release-lto profile (#319) 2025-01-03 16:58:19 +01:00
Alexander Zaitsev
fee20e54d9 feat: enable LTO and codegen-units = 1 optimization (#318) 2025-01-02 19:07:39 +01:00
Joëlle van Essen
7399132d5d Fix test in zluda_dump (#316) 2025-01-01 23:02:59 +01:00
Andrzej Janik
ecd61a8e2a Update README for version 4 (#315) 2024-12-31 17:33:59 +01:00
Joëlle van Essen
de870db1f1 Fix build error (#314) v4 2024-12-20 18:33:05 +01:00
Andrzej Janik
7ac67a89e9 Enable Geekbench 5 (#304) 2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf Fix host code and update to CUDA 12.4 (#299) 2024-12-02 00:29:57 +01:00
Rayyan Ul Haq
870fed4bb6 Update README.md (#300) 2024-11-25 00:45:09 +01:00
Andrzej Janik
970ba5aa25 Fix linking of AMD device libraries (#296)
It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've tested it only on ROCm 6.2; it might be broken on older or newer ROCm.
2024-11-02 16:07:44 +01:00