Violet
06b28cfec7
More descriptive message for unknown symbol ( #411 )
v5-preview.45
2025-07-14 15:01:38 -07:00
Violet
373d6d9e6e
Remove duplicate call to linker ( #410 )
v5-preview.44
2025-07-10 12:44:22 -07:00
Andrzej Janik
081f7d0976
Enable sccache in Rust builds, publish prerelease builds ( #408 )
v5-preview.43
2025-07-09 09:20:03 -07:00
Violet
6e27f78ae7
Add support for multiple return arguments ( #406 )
2025-07-09 08:17:15 -07:00
Violet
fa7ecb2e02
Update README.md ( #407 )
2025-07-08 17:13:53 -07:00
Andrzej Janik
059b8ca0f6
Make sure it is possible to log 32bit PhysX ( #374 )
2025-07-08 10:19:49 -07:00
aiwhskruht
9bd8125c53
Implement more CUDA driver API to enable simple cuda-samples ( #405 )
2025-07-08 10:18:30 -07:00
aiwhskruht
8d5b734c30
Add initialized check to protect zluda from cuda driver calls during shutdown ( #404 )
2025-07-07 11:08:09 -07:00
Andrzej Janik
ef0c4afcf9
Run unit tests on every commit ( #401 )
2025-07-03 16:07:00 -07:00
Violet
5cb0a9b8e8
Add support for bar.red.and.pred ( #402 )
Implements bar.red.and.pred and bar.red.or.pred, using the undocumented __ockl_wgred functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.
2025-07-03 11:56:20 -07:00
Violet
7bdd20f0dd
Add warp-wide tests ( #400 )
2025-07-02 18:11:36 -07:00
Andrzej Janik
6d56fa8c34
Fix floating point min/max ( #399 )
2025-07-01 15:58:16 -07:00
Violet
b824424367
Read test files at runtime for development ergonomics ( #395 )
2025-07-01 10:31:06 -07:00
Violet
1cf345329c
Make derive_parser work with all optional arguments ( #397 )
The current implementation using `winnow`'s `opt` does not work for optional arguments that are in the middle of the command. For example, `bar{.cta}.red.op.pred p, a{, b}, {!}c;`. This is because `opt` is greedy, and will always match `{, b}` instead of `, {!}c`. This change switches to using a custom combinator that handles this properly.
2025-06-30 18:54:31 -07:00
aiwhskruht
d4ad17d75a
Unified fatbin versions behind a single iterator. ( #398 )
2025-06-27 15:56:46 -07:00
Violet
80607c07db
Check LLVM IR for test_ptx! with no input/output ( #394 )
2025-06-24 11:53:30 -07:00
Andrzej Janik
22608d7420
Bump dependencies ( #392 )
zip 2.6.1 was yanked and microlp 2.10 has a major bug
2025-06-23 18:04:08 -07:00
Violet
5edfeb04eb
Error instead of infinite loop when parsing enum without a derive attribute in derive_parser! ( #391 )
2025-06-23 16:18:21 -07:00
Violet
74ff9ebf96
Remove trailing zeroes from end of ptx ( #390 )
2025-06-23 16:14:07 -07:00
Violet
f4cd545677
Fix bug in get_payload ( #389 )
2025-06-18 17:29:21 -07:00
Violet
4da3978f94
Implement cuLibraryLoadData ( #388 )
2025-06-18 16:05:53 -07:00
Violet
8ce70c5095
Add integrity_check implementation to ZLUDA ( #387 )
2025-06-17 15:00:10 -07:00
Andrzej Janik
2a374ad880
Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests ( #379 )
2025-06-16 19:14:16 -07:00
Violet
4d4053194a
Implement runtime_callback_hooks_fn6 ( #386 )
2025-06-16 17:00:47 -07:00
Violet
9c5f1ed9fb
Handle new attributes in cuDeviceGetAttribute ( #383 )
2025-06-16 13:20:04 -07:00
Andrzej Janik
f179868b8e
Add automated builds ( #358 )
2025-06-16 09:53:18 -07:00
Violet
9773d20945
Implement cudart_interface_fn2 ( #382 )
2025-06-13 14:01:14 -07:00
Violet
1715830d82
Implement cuModuleGetLoadingMode ( #381 )
2025-06-11 15:54:48 -07:00
Violet
25a9d1c40e
Implement runtime_callback_hooks_fn2 ( #380 )
2025-06-11 15:15:43 -07:00
Violet
62f3e63355
Implement cuGetProcAddress and cuGetProcAddress_v2 ( #377 )
2025-06-10 16:07:35 -07:00
Andrzej Janik
3361046760
Fix mad.wide, replace external CUDA library in test with our own ( #376 )
2025-06-09 21:33:18 -07:00
Andrzej Janik
c790ab45ec
Redo logging to better log dark API and performance libraries ( #372 )
2025-06-09 15:29:14 -07:00
Andrzej Janik
5935cfec78
Work around broken AMD Adrenalin 25.5.1 driver ( #366 )
For reasons unknown, AMD Adrenalin 25.5.1 ships with a comgr that presents itself as version 2 but expects the ABI for version 3. Add a workaround.
2025-05-13 02:20:23 +02:00
Andrzej Janik
3d3e38aadc
Fix ROCm 6.4 failures ( #364 )
Lazy load comgr and dispatch to different code paths based on the name of the comgr .dll/.so
2025-05-02 00:38:22 +02:00
Andrzej Janik
cc83b9f1f6
Create infrastructure for performance libraries ( #363 )
2025-05-01 22:37:18 +02:00
Andrzej Janik
adc4673a20
Explicitly fail compilation on ROCm 6.4 ( #361 )
AMD broke comgr ABI in 6.4. This is a temporary solution.
2025-04-20 17:02:05 +02:00
Joëlle van Essen
7cdab7abc2
Implement mul24 ( #351 )
2025-04-08 12:27:19 +02:00
Andrzej Janik
d704e92c97
Support instruction modes (denormal and rounding) on AMD GPUs ( #342 )
2025-03-17 21:37:26 +01:00
Joëlle van Essen
867e4728d5
LLVM unit tests ( #324 )
* LLVM unit tests: add assembly files
* LLVM unit tests: first attempt
* LLVM unit tests: fix - parse bitcode in context
* LLVM unit tests: use pretty_assertions for line-by-line diff
* LLVM unit tests: Write IR to file for failed test
* LLVM unit tests: just use the stack
* LLVM unit tests: use MaybeUninit
* LLVM unit tests: add mul24.ll
* LLVM unit tests: Adjustments after review
* LLVM unit tests: Include emit_llvm::Context in emit_llvm::Module
* LLVM unit tests: Fix typo
* LLVM unit tests: Context need not be pub
2025-02-19 21:21:20 +01:00
Andrzej Janik
646d746e02
Start working on mul24
2025-02-07 19:37:11 +00:00
Andrzej Janik
df5a96d935
Improve build system ( #329 )
Also fix Dockerfile and Windows build
2025-01-28 01:55:36 +01:00
Alexander Zaitsev
9c0747a5f7
fix: missing inherits in a release-lto profile ( #319 )
2025-01-03 16:58:19 +01:00
Alexander Zaitsev
fee20e54d9
feat: enable LTO and codegen-units = 1 optimization ( #318 )
2025-01-02 19:07:39 +01:00
Joëlle van Essen
7399132d5d
Fix test in zluda_dump ( #316 )
2025-01-01 23:02:59 +01:00
Andrzej Janik
ecd61a8e2a
Update README for version 4 ( #315 )
2024-12-31 17:33:59 +01:00
Joëlle van Essen
de870db1f1
Fix build error ( #314 )
v4
2024-12-20 18:33:05 +01:00
Andrzej Janik
7ac67a89e9
Enable Geekbench 5 ( #304 )
2024-12-10 21:48:10 +01:00
Andrzej Janik
7a6df9dcbf
Fix host code and update to CUDA 12.4 ( #299 )
2024-12-02 00:29:57 +01:00
Rayyan Ul Haq
870fed4bb6
Update README.md ( #300 )
2024-11-25 00:45:09 +01:00
Andrzej Janik
970ba5aa25
Fix linking of AMD device libraries ( #296 )
It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've tested it only on ROCm 6.2, so it might be broken on older or newer ROCm.
2024-11-02 16:07:44 +01:00