423 Commits

Author SHA1 Message Date
Andrzej Janik
c00496b92a Install curl correctly (#461) v5-preview.86 v5-preview.85 2025-07-31 19:56:37 -07:00
Andrzej Janik
7fc6caffb6 Update nightly runner's ROCm version to 6.3.4 (#460) 2025-07-31 19:46:49 -07:00
Andrzej Janik
52d3ea624c Fix nightly run (#459) v5-preview.84 2025-07-31 19:20:39 -07:00
Andrzej Janik
704a94e6f4 Fix nightly tests trigger (#458) v5-preview.83 2025-07-31 18:31:54 -07:00
Andrzej Janik
cd7e2f8e36 Force loading ZLUDA through LD_PRELOAD (#447)
Certain applications (pytorch) decide that it's a great idea to distribute the whole CUDA driver and link to it with DT_RPATH. This ignores LD_LIBRARY_PATH.
This code defeats that evil mechanism through any means necessary.
v5-preview.82
2025-07-31 18:00:13 -07:00
Violet
96ae27e9e1 Implement cublas functions needed for llm.c (#457) v5-preview.81 2025-07-31 11:08:53 -07:00
Violet
99c36092be Use FromCuda in zluda_blas (#455) v5-preview.80 2025-07-31 09:52:10 -07:00
Andrzej Janik
49aabffdcc Rename zluda_dump to zluda_trace (#456)
* Rename zluda_dump to zluda_trace

* Minor naming fixes
v5-preview.79
2025-07-31 08:07:03 -07:00
Violet
4d163a4d9b Implement cuModuleGetGlobal_v2 (#454) v5-preview.78 2025-07-30 16:34:21 -07:00
Violet
66db19a061 Move FromCuda and ZludaObject into a common crate (#452)
* Refactor FromCuda error type to be generic

* Create zluda_common crate

* Move FromCuda trait into zluda_common

* Write some doc comments

* Fix typo

* Edit comment

* Fix formatting
v5-preview.77
2025-07-30 15:53:22 -07:00
Violet
b8bcbec295 Always use Unix line endings (#453) v5-preview.76 v5-preview.75 2025-07-30 15:09:47 -07:00
Violet
21ef5f60a3 Check Rust formatting on pull requests (#451)
* Check Rust formatting on pull requests

This should help us maintain consistent style, without having unrelated style changes in pull requests from running `rustfmt`.

* cargo fmt non-generated files

* Ignore generated files
2025-07-30 14:55:09 -07:00
Violet
98b601d15a Use normalize_fn for performance libraries (#449)
The goal here is to make the performance library implementations work more like zluda.
v5-preview.74
2025-07-30 14:02:01 -07:00
Violet
c07d7678cd Format files (#450) v5-preview.73 2025-07-30 10:30:49 -07:00
Violet
481c3550fa Convert CUDA performance lib statuses to Rust result types (#444)
These changes replicate how the main library is handled. cuDNN still needs to have zluda_bindgen run and zluda_dump_common updated.
v5-preview.72
2025-07-29 14:28:14 -07:00
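The idea of turning C-style status codes into Rust result types can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: `StatusError`, `status_to_result`, `raw_call`, and `run` are hypothetical names, and the only convention assumed is that a status of 0 means success.

```rust
use std::num::NonZeroU32;

/// Hypothetical error type: wraps any non-zero status code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StatusError(NonZeroU32);

/// Converts a raw C-style status (assumed: 0 = success) into a `Result`.
fn status_to_result(status: u32) -> Result<(), StatusError> {
    match NonZeroU32::new(status) {
        None => Ok(()),
        Some(code) => Err(StatusError(code)),
    }
}

/// Hypothetical stand-in for a wrapped library call returning a raw status.
fn raw_call(should_fail: bool) -> u32 {
    if should_fail { 1 } else { 0 }
}

/// With `Result`, callers can propagate failures using `?`
/// instead of checking every status by hand.
fn run(should_fail: bool) -> Result<&'static str, StatusError> {
    status_to_result(raw_call(should_fail))?;
    Ok("done")
}
```

The design benefit is that a failing status cannot be silently dropped: the compiler warns when a `Result` is unused, and `?` keeps the success path short.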
Violet
303e4c2fb2 Update rocm_setup_build.sh (#446) v5-preview.71 2025-07-29 11:05:16 -07:00
Violet
4ffa669cce Fix Windows linkage (#445) v5-preview.70 2025-07-29 10:15:32 -07:00
Violet
d81404eb70 Add support for rocblas to zluda_bindgen (#440)
One step of several for adding cublas support
v5-preview.69
2025-07-28 15:07:22 -07:00
Andrzej Janik
8dbc7208de Try to make ZLUDA more robust on Windows (#442)
On my machine ZLUDA seems to segfault when initializing LLVM's C++ statics in Blender. Blender ships with its own C++ runtime. It seems that statically linking the C++ runtime fixes the issue. Might actually be unrelated.
Additionally, the dtor crate on Windows seems to use a slightly dodgy method, so replace it with something more straightforward.
v5-preview.68
2025-07-28 13:20:04 -07:00
Violet
f192dd317a Use implicit FromCuda for library::get_module (#439) v5-preview.67 2025-07-28 06:42:14 -07:00
Violet
8c23ef1ded Rename cuda_base to cuda_macros (#435) v5-preview.66 2025-07-25 11:09:50 -07:00
Violet
ba38da0bbc Silence unused variable warnings (#434)
I'd left these in originally because I'd assumed that these functions would need full implementations soon, but they're really annoying. I've fixed all the other compiler warnings as well.
v5-preview.65
2025-07-25 10:28:33 -07:00
Andrzej Janik
c1dda55235 Add nightly tests (#433) v5-preview.64 2025-07-24 16:14:06 -07:00
Andrzej Janik
e8e20294a6 Set newly created context as current (#431) v5-preview.63 2025-07-24 11:05:04 -07:00
Andrzej Janik
5deada8426 Add cuCtxCreate_v2 and cuCtxDestroy_v2 (#430) v5-preview.62 2025-07-23 17:33:59 -07:00
Violet
2b90fdb56c Add support for cp.async (#427)
Adds support for

* `cp.async`
* `cp.async.commit_group`
* `cp.async.wait_group`
* `cp.async.wait_all`

Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-ops.
v5-preview.61
2025-07-23 16:25:49 -07:00
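The lowering strategy described for `cp.async` can be sketched roughly as below. This is a simplified model under stated assumptions, not ZLUDA's actual compiler pass: the `PtxOp` and `LoweredOp` enums and the `lower` function are hypothetical, and operands are represented as plain strings for illustration.

```rust
/// Hypothetical, simplified view of the four `cp.async` instructions.
#[derive(Debug, Clone, PartialEq)]
enum PtxOp {
    CpAsync { dst: String, src: String, size: u32 },
    CpAsyncCommitGroup,
    CpAsyncWaitGroup(u32),
    CpAsyncWaitAll,
}

/// Hypothetical lowered form: only a plain synchronous copy is ever emitted.
#[derive(Debug, Clone, PartialEq)]
enum LoweredOp {
    Copy { dst: String, src: String, size: u32 },
}

/// Lowers one instruction. `None` means "emit nothing" (a no-op).
fn lower(op: &PtxOp) -> Option<LoweredOp> {
    match op {
        // The asynchronous copy becomes an ordinary synchronous copy.
        PtxOp::CpAsync { dst, src, size } => Some(LoweredOp::Copy {
            dst: dst.clone(),
            src: src.clone(),
            size: *size,
        }),
        // Every copy has already completed by the time these are reached,
        // so there is nothing left to commit or wait for.
        PtxOp::CpAsyncCommitGroup
        | PtxOp::CpAsyncWaitGroup(_)
        | PtxOp::CpAsyncWaitAll => None,
    }
}
```

Dropping the group instructions is safe in this model precisely because the copy is synchronous: no copy is ever outstanding when a commit or wait is encountered.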
Andrzej Janik
3746079b1a Assorted instruction fixes (#423)
This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Adds tanh (slow, due to a hardware limitation).
The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough.
v5-preview.60
2025-07-23 15:50:35 -07:00
Violet
119b635b9d Emit correct alignment for loads and stores (#429) v5-preview.59 2025-07-23 14:55:52 -07:00
Violet
a86ba3d642 Remove Type::Pointer (#428) v5-preview.58 2025-07-23 11:22:17 -07:00
Violet
27cfd50ddd Implement nanosleep.u32 (#421) v5-preview.57 2025-07-21 17:42:04 -07:00
Violet
72e2fe5b9a Remove unnecessary unsafe block (#426) v5-preview.56 2025-07-21 13:20:12 -07:00
Violet
f5712d9d5a Add parser support for hyphenated IDs in arguments (#425)
The syntax description for [`cp.async`](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async) has several elements not supported by the current parser. One such element is that the `cp-size` and `src-size` operands have hyphens in their IDs. This PR adds support for these IDs, and translates them as `cp_size` and `src_size`
v5-preview.55
2025-07-18 13:45:09 -07:00
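The renaming described in this commit (`cp-size` to `cp_size`, `src-size` to `src_size`) amounts to mapping hyphens to underscores so the operand names become valid Rust identifiers. A minimal sketch follows, assuming a hypothetical helper named `normalize_operand_id`; the real parser may perform this step differently.

```rust
/// Hypothetical helper: maps an operand ID as written in the PTX syntax
/// description (e.g. `cp-size`) to a valid Rust identifier (`cp_size`).
/// IDs without hyphens are returned unchanged.
fn normalize_operand_id(id: &str) -> String {
    id.replace('-', "_")
}
```

For example, `normalize_operand_id("src-size")` yields `src_size`, while an ID such as `dst` passes through untouched.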
Andrzej Janik
2f27c47acc Improve error recovery (#418) v5-preview.54 2025-07-17 10:02:03 -07:00
林博仁 Buo-ren Lin
0f8d4bb834 Fix typo in README.md (either) (#419) v5-preview.53 2025-07-17 09:32:41 -07:00
Violet
dc69808e54 Add support for shfl.sync.MODE.b32 (#409) v5-preview.52 v5-preview.51 2025-07-16 17:23:11 -07:00
Andrzej Janik
36f0ba9cbb Apply rounding mode in fp div (#416) 2025-07-16 17:22:59 -07:00
Violet
95d66df18e Only allow (.u32, .pred) for multiple return (#417) v5-preview.50 2025-07-16 17:03:28 -07:00
Violet
7c6b95a8e3 Allow messages for error_todo (#415) v5-preview.49 2025-07-16 15:54:40 -07:00
林博仁 Buo-ren Lin
039689253d Fix grammar errors in README.md (#414) v5-preview.48 2025-07-16 12:19:00 -07:00
林博仁 Buo-ren Lin
777392f69f Fix typo in README.md (self-contained) (#413) v5-preview.47 2025-07-16 11:41:07 -07:00
Violet
6fb09f393a Handle WARP_SZ (#412)
* Add tests for `WARP_SZ`

* Handle WARP_SZ in parser
v5-preview.46
2025-07-16 11:02:17 -07:00
Violet
06b28cfec7 More descriptive message for unknown symbol (#411) v5-preview.45 2025-07-14 15:01:38 -07:00
Violet
373d6d9e6e Remove duplicate call to linker (#410) v5-preview.44 2025-07-10 12:44:22 -07:00
Andrzej Janik
081f7d0976 Enable sccache in Rust builds, publish prerelease builds (#408) v5-preview.43 2025-07-09 09:20:03 -07:00
Violet
6e27f78ae7 Add support for multiple return arguments (#406) 2025-07-09 08:17:15 -07:00
Violet
fa7ecb2e02 Update README.md (#407) 2025-07-08 17:13:53 -07:00
Andrzej Janik
059b8ca0f6 Make sure it is possible to log 32bit PhysX (#374) 2025-07-08 10:19:49 -07:00
aiwhskruht
9bd8125c53 Implement more CUDA driver API to enable simple cuda-samples (#405) 2025-07-08 10:18:30 -07:00
aiwhskruht
8d5b734c30 Add initialized check to protect zluda from cuda driver calls during shutdown (#404) 2025-07-07 11:08:09 -07:00
Andrzej Janik
ef0c4afcf9 Run unit tests on every commit (#401) 2025-07-03 16:07:00 -07:00