Andrzej Janik
e0882694c3
Linux fix
2025-08-01 00:28:05 +00:00
Andrzej Janik
0d7438ff9f
Re-fix
2025-07-31 17:27:18 -07:00
Andrzej Janik
824b9c1719
Fix Windows build
2025-08-01 00:18:29 +00:00
Andrzej Janik
b01cc29f7c
Address review comments
2025-08-01 00:01:46 +00:00
Andrzej Janik
117cb08341
Fix build
2025-07-31 17:00:48 +00:00
Andrzej Janik
4a94043e8c
Merge commit '99c36092bea198c90fd00f6eea83c728e497ff35' into unix_preload
2025-07-31 16:56:51 +00:00
Violet
99c36092be
Use FromCuda in zluda_blas ( #455 )
v5-preview.80
2025-07-31 09:52:10 -07:00
Andrzej Janik
49aabffdcc
Rename zluda_dump to zluda_trace ( #456 )
* Rename zluda_dump to zluda_trace
* Minor naming fixes
v5-preview.79
2025-07-31 08:07:03 -07:00
Andrzej Janik
6cacea5599
Fix build
2025-07-31 00:18:10 +00:00
Violet
4d163a4d9b
Implement cuModuleGetGlobal_v2 ( #454 )
v5-preview.78
2025-07-30 16:34:21 -07:00
Andrzej Janik
4197fa003c
Merge remote-tracking branch 'origin/master' into unix_preload
# Conflicts:
# xtask/src/main.rs
# zluda_dump/src/os_unix.rs
2025-07-30 16:25:54 -07:00
Andrzej Janik
dbc7271863
Add README, fix Windows builds
2025-07-30 16:21:55 -07:00
Violet
66db19a061
Move FromCuda and ZludaObject into a common crate ( #452 )
* Refactor FromCuda error type to be generic
* Create zluda_common crate
* Move FromCuda trait into zluda_common
* Write some doc comments
* Fix typo
* Edit comment
* Fix formatting
v5-preview.77
2025-07-30 15:53:22 -07:00
Andrzej Janik
b1db397979
Preload perf libraries
2025-07-30 22:46:10 +00:00
Violet
b8bcbec295
Always use Unix line endings ( #453 )
v5-preview.76
v5-preview.75
2025-07-30 15:09:47 -07:00
Violet
21ef5f60a3
Check Rust formatting on pull requests ( #451 )
* Check Rust formatting on pull requests
This should help us maintain consistent style, without having unrelated style changes in pull requests from running `rustfmt`.
* cargo fmt non-generated files
* Ignore generated files
2025-07-30 14:55:09 -07:00
Violet
98b601d15a
Use normalize_fn for performance libraries ( #449 )
The goal here is to make the performance library implementations work more like zluda.
v5-preview.74
2025-07-30 14:02:01 -07:00
Violet
c07d7678cd
Format files ( #450 )
v5-preview.73
2025-07-30 10:30:49 -07:00
Andrzej Janik
87f74c8752
Use no-redirect dlopen in performance libraries
2025-07-29 23:46:53 +00:00
Andrzej Janik
c48fd9f1d8
Detect if we are being redirected in zluda_dump
2025-07-29 21:56:32 +00:00
Andrzej Janik
72879ef3c6
Add a shim which uses LD_PRELOAD to force loading of ZLUDA
2025-07-29 21:31:15 +00:00
Violet
481c3550fa
Convert CUDA performance lib statuses to Rust result types ( #444 )
These changes replicate how the main library is handled. cuDNN still needs zluda_bindgen to be run and zluda_dump_common to be updated.
v5-preview.72
2025-07-29 14:28:14 -07:00
Violet
303e4c2fb2
Update rocm_setup_build.sh ( #446 )
v5-preview.71
2025-07-29 11:05:16 -07:00
Violet
4ffa669cce
Fix Windows linkage ( #445 )
v5-preview.70
2025-07-29 10:15:32 -07:00
Violet
d81404eb70
Add support for rocblas to zluda_bindgen ( #440 )
One step of several for adding cublas support
v5-preview.69
2025-07-28 15:07:22 -07:00
Andrzej Janik
8dbc7208de
Try to make ZLUDA more robust on Windows ( #442 )
On my machine ZLUDA seems to segfault when initializing LLVM's C++ statics in Blender. Blender ships with a C++ runtime. It seems that compiling the C++ runtime statically fixes the issue, though it might actually be unrelated.
Additionally, the dtor crate on Windows seems to use a slightly dodgy method, so replace it with something more straightforward.
v5-preview.68
2025-07-28 13:20:04 -07:00
Violet
f192dd317a
Use implicit FromCuda for library::get_module ( #439 )
v5-preview.67
2025-07-28 06:42:14 -07:00
Violet
8c23ef1ded
Rename cuda_base to cuda_macros ( #435 )
v5-preview.66
2025-07-25 11:09:50 -07:00
Violet
ba38da0bbc
Silence unused variable warnings ( #434 )
I'd left these in originally because I'd assumed that these functions would need full implementations soon, but they're really annoying. I've fixed all the other compiler warnings as well.
v5-preview.65
2025-07-25 10:28:33 -07:00
Andrzej Janik
c1dda55235
Add nightly tests ( #433 )
v5-preview.64
2025-07-24 16:14:06 -07:00
Andrzej Janik
e8e20294a6
Set newly created context as current ( #431 )
v5-preview.63
2025-07-24 11:05:04 -07:00
Andrzej Janik
5deada8426
Add cuCtxCreate_v2 and cuCtxDestroy_v2 ( #430 )
v5-preview.62
2025-07-23 17:33:59 -07:00
Violet
2b90fdb56c
Add support for cp.async ( #427 )
Adds support for
* `cp.async`
* `cp.async.commit_group`
* `cp.async.wait_group`
* `cp.async.wait_all`
Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-ops.
v5-preview.61
2025-07-23 16:25:49 -07:00
Andrzej Janik
3746079b1a
Assorted instruction fixes ( #423 )
This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). It also adds tanh (slow, due to a hardware limitation).
The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough.
v5-preview.60
2025-07-23 15:50:35 -07:00
Violet
119b635b9d
Emit correct alignment for loads and stores ( #429 )
v5-preview.59
2025-07-23 14:55:52 -07:00
Violet
a86ba3d642
Remove Type::Pointer ( #428 )
v5-preview.58
2025-07-23 11:22:17 -07:00
Violet
27cfd50ddd
Implement nanosleep.u32 ( #421 )
v5-preview.57
2025-07-21 17:42:04 -07:00
Violet
72e2fe5b9a
Remove unnecessary unsafe block ( #426 )
v5-preview.56
2025-07-21 13:20:12 -07:00
Violet
f5712d9d5a
Add parser support for hyphenated IDs in arguments ( #425 )
The syntax description for [`cp.async`](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async) has several elements not supported by the current parser. One such element is that the `cp-size` and `src-size` operands have hyphens in their IDs. This PR adds support for these IDs, and translates them as `cp_size` and `src_size`.
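As a rough illustration of the ID translation described above, the mapping can be sketched as a plain string substitution. The function name `normalize_operand_id` is hypothetical and is not taken from the ZLUDA parser; the actual parser change may work quite differently.

```rust
/// Hypothetical helper (not from the ZLUDA sources): turns a hyphenated
/// operand ID from the PTX syntax description into an identifier that is
/// valid in Rust by replacing every hyphen with an underscore.
fn normalize_operand_id(id: &str) -> String {
    // `str::replace` returns a new String; IDs without hyphens come back unchanged.
    id.replace('-', "_")
}
```

Under this sketch, `normalize_operand_id("cp-size")` yields `cp_size` and `normalize_operand_id("src-size")` yields `src_size`, matching the names given in the commit message.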
v5-preview.55
2025-07-18 13:45:09 -07:00
Andrzej Janik
2f27c47acc
Improve error recovery ( #418 )
v5-preview.54
2025-07-17 10:02:03 -07:00
林博仁 Buo-ren Lin
0f8d4bb834
Fix typo in README.md (either) ( #419 )
v5-preview.53
2025-07-17 09:32:41 -07:00
Violet
dc69808e54
Add support for shfl.sync.MODE.b32 ( #409 )
v5-preview.52
v5-preview.51
2025-07-16 17:23:11 -07:00
Andrzej Janik
36f0ba9cbb
Apply rounding mode in fp div ( #416 )
2025-07-16 17:22:59 -07:00
Violet
95d66df18e
Only allow (.u32, .pred) for multiple return ( #417 )
v5-preview.50
2025-07-16 17:03:28 -07:00
Violet
7c6b95a8e3
Allow messages for error_todo ( #415 )
v5-preview.49
2025-07-16 15:54:40 -07:00
林博仁 Buo-ren Lin
039689253d
Fix grammar errors in README.md ( #414 )
v5-preview.48
2025-07-16 12:19:00 -07:00
林博仁 Buo-ren Lin
777392f69f
Fix typo in README.md (self-contained) ( #413 )
v5-preview.47
2025-07-16 11:41:07 -07:00
Violet
6fb09f393a
Handle WARP_SZ ( #412 )
* Add tests for `WARP_SZ`
* Handle WARP_SZ in parser
v5-preview.46
2025-07-16 11:02:17 -07:00
Violet
06b28cfec7
More descriptive message for unknown symbol ( #411 )
v5-preview.45
2025-07-14 15:01:38 -07:00
Violet
373d6d9e6e
Remove duplicate call to linker ( #410 )
v5-preview.44
2025-07-10 12:44:22 -07:00