ZLUDA

mirror of https://github.com/vosen/ZLUDA.git synced 2025-08-01 14:27:44 +03:00

Author	SHA1	Message	Date
Andrzej Janik	c00496b92a	Install curl correctly (#461 ) v5-preview.86 v5-preview.85	2025-07-31 19:56:37 -07:00
Andrzej Janik	7fc6caffb6	Update nightly runner's ROCm version to 6.3.4 (#460 )	2025-07-31 19:46:49 -07:00
Andrzej Janik	52d3ea624c	Fix nightly run (#459 ) v5-preview.84	2025-07-31 19:20:39 -07:00
Andrzej Janik	704a94e6f4	Fix nightly tests trigger (#458 ) v5-preview.83	2025-07-31 18:31:54 -07:00
Andrzej Janik	cd7e2f8e36	Force loading ZLUDA through LD_PRELOAD (#447 ) Certain applications (pytorch) decide that it's a great idea to distribute the whole CUDA driver and link to it with DT_RPATH. This ignores LD_LIBRARY_PATH. This code defeats that evil mechanism by any means necessary. v5-preview.82	2025-07-31 18:00:13 -07:00
Violet	96ae27e9e1	Implement cublas functions needed for llm.c (#457 ) v5-preview.81	2025-07-31 11:08:53 -07:00
Violet	99c36092be	Use `FromCuda` in `zluda_blas` (#455 ) v5-preview.80	2025-07-31 09:52:10 -07:00
Andrzej Janik	49aabffdcc	Rename zluda_dump to zluda_trace (#456 ) Also includes minor naming fixes. v5-preview.79	2025-07-31 08:07:03 -07:00
Violet	4d163a4d9b	Implement `cuModuleGetGlobal_v2` (#454 ) v5-preview.78	2025-07-30 16:34:21 -07:00
Violet	66db19a061	Move `FromCuda` and `ZludaObject` into a common crate (#452 ) Refactor FromCuda error type to be generic; create zluda_common crate; move FromCuda trait into zluda_common; write some doc comments; fix typo; edit comment; fix formatting. v5-preview.77	2025-07-30 15:53:22 -07:00
Violet	b8bcbec295	Always use Unix line endings (#453 ) v5-preview.76 v5-preview.75	2025-07-30 15:09:47 -07:00
Violet	21ef5f60a3	Check Rust formatting on pull requests (#451 ) This should help us maintain a consistent style and keep unrelated style changes from running `rustfmt` out of pull requests. Also runs `cargo fmt` on non-generated files and ignores generated files.	2025-07-30 14:55:09 -07:00
Violet	98b601d15a	Use `normalize_fn` for performance libraries (#449 ) The goal here is to make the performance library implementations work more like zluda. v5-preview.74	2025-07-30 14:02:01 -07:00
Violet	c07d7678cd	Format files (#450 ) v5-preview.73	2025-07-30 10:30:49 -07:00
Violet	481c3550fa	Convert CUDA performance lib statuses to Rust result types (#444 ) These changes replicate how the main library is handled. cuDNN still needs to have zluda_bindgen run and zluda_dump_common updated v5-preview.72	2025-07-29 14:28:14 -07:00
Violet	303e4c2fb2	Update rocm_setup_build.sh (#446 ) v5-preview.71	2025-07-29 11:05:16 -07:00
Violet	4ffa669cce	Fix Windows linkage (#445 ) v5-preview.70	2025-07-29 10:15:32 -07:00
Violet	d81404eb70	Add support for `rocblas` to `zluda_bindgen` (#440 ) One step of several for adding cublas support v5-preview.69	2025-07-28 15:07:22 -07:00
Andrzej Janik	8dbc7208de	Try to make ZLUDA more robust on Windows (#442 ) On my machine ZLUDA seems to segfault when initializing LLVM's C++ statics in Blender. Blender ships with its own C++ runtime. It seems that compiling the C++ runtime statically fixes the issue, though it might actually be unrelated. Additionally, the dtor crate on Windows seems to use a slightly dodgy method, so replace it with something more straightforward. v5-preview.68	2025-07-28 13:20:04 -07:00
Violet	f192dd317a	Use implicit FromCuda for library::get_module (#439 ) v5-preview.67	2025-07-28 06:42:14 -07:00
Violet	8c23ef1ded	Rename cuda_base to cuda_macros (#435 ) v5-preview.66	2025-07-25 11:09:50 -07:00
Violet	ba38da0bbc	Silence unused variable warnings (#434 ) I'd left these in originally because I'd assumed that these functions would need full implementations soon, but they're really annoying. I've fixed all the other compiler warnings as well. v5-preview.65	2025-07-25 10:28:33 -07:00
Andrzej Janik	c1dda55235	Add nightly tests (#433 ) v5-preview.64	2025-07-24 16:14:06 -07:00
Andrzej Janik	e8e20294a6	Set newly created context as current (#431 ) v5-preview.63	2025-07-24 11:05:04 -07:00
Andrzej Janik	5deada8426	Add cuCtxCreate_v2 and cuCtxDestroy_v2 (#430 ) v5-preview.62	2025-07-23 17:33:59 -07:00
Violet	2b90fdb56c	Add support for `cp.async` (#427 ) Adds support for `cp.async`, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all`. Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-ops. v5-preview.61	2025-07-23 16:25:49 -07:00
Andrzej Janik	3746079b1a	Assorted instruction fixes (#423 ) This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Adds tanh (slow, due to a hardware limitation). The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough. v5-preview.60	2025-07-23 15:50:35 -07:00
Violet	119b635b9d	Emit correct alignment for loads and stores (#429 ) v5-preview.59	2025-07-23 14:55:52 -07:00
Violet	a86ba3d642	Remove Type::Pointer (#428 ) v5-preview.58	2025-07-23 11:22:17 -07:00
Violet	27cfd50ddd	Implement `nanosleep.u32` (#421 ) v5-preview.57	2025-07-21 17:42:04 -07:00
Violet	72e2fe5b9a	Remove unnecessary unsafe block (#426 ) v5-preview.56	2025-07-21 13:20:12 -07:00
Violet	f5712d9d5a	Add parser support for hyphenated IDs in arguments (#425 ) The syntax description for [`cp.async`](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async) has several elements not supported by the current parser. One such element is that the `cp-size` and `src-size` operands have hyphens in their IDs. This PR adds support for these IDs and translates them as `cp_size` and `src_size`. v5-preview.55	2025-07-18 13:45:09 -07:00
Andrzej Janik	2f27c47acc	Improve error recovery (#418 ) v5-preview.54	2025-07-17 10:02:03 -07:00
林博仁 Buo-ren Lin	0f8d4bb834	Fix typo in README.md (either) (#419 ) v5-preview.53	2025-07-17 09:32:41 -07:00
Violet	dc69808e54	Add support for `shfl.sync.MODE.b32` (#409 ) v5-preview.52 v5-preview.51	2025-07-16 17:23:11 -07:00
Andrzej Janik	36f0ba9cbb	Apply rounding mode in fp div (#416 )	2025-07-16 17:22:59 -07:00
Violet	95d66df18e	Only allow (.u32, .pred) for multiple return (#417 ) v5-preview.50	2025-07-16 17:03:28 -07:00
Violet	7c6b95a8e3	Allow messages for error_todo (#415 ) v5-preview.49	2025-07-16 15:54:40 -07:00
林博仁 Buo-ren Lin	039689253d	Fix grammar errors in README.md (#414 ) v5-preview.48	2025-07-16 12:19:00 -07:00
林博仁 Buo-ren Lin	777392f69f	Fix typo in README.md (self-contained) (#413 ) v5-preview.47	2025-07-16 11:41:07 -07:00
Violet	6fb09f393a	Handle `WARP_SZ` (#412 ) Add tests for `WARP_SZ`; handle `WARP_SZ` in parser. v5-preview.46	2025-07-16 11:02:17 -07:00
Violet	06b28cfec7	More descriptive message for unknown symbol (#411 ) v5-preview.45	2025-07-14 15:01:38 -07:00
Violet	373d6d9e6e	Remove duplicate call to linker (#410 ) v5-preview.44	2025-07-10 12:44:22 -07:00
Andrzej Janik	081f7d0976	Enable sccache in Rust builds, publish prerelease builds (#408 ) v5-preview.43	2025-07-09 09:20:03 -07:00
Violet	6e27f78ae7	Add support for multiple return arguments (#406 )	2025-07-09 08:17:15 -07:00
Violet	fa7ecb2e02	Update README.md (#407 )	2025-07-08 17:13:53 -07:00
Andrzej Janik	059b8ca0f6	Make sure it is possible to log 32bit PhysX (#374 )	2025-07-08 10:19:49 -07:00
aiwhskruht	9bd8125c53	Implement more CUDA driver API to enable simple cuda-samples (#405 )	2025-07-08 10:18:30 -07:00
aiwhskruht	8d5b734c30	Add initialized check to protect zluda from cuda driver calls during shutdown (#404 )	2025-07-07 11:08:09 -07:00
Andrzej Janik	ef0c4afcf9	Run unit tests on every commit (#401 )	2025-07-03 16:07:00 -07:00


423 Commits