ZLUDA

mirror of https://github.com/vosen/ZLUDA.git synced 2025-07-30 13:28:10 +03:00

Author	SHA1	Message	Date
Violet	2b90fdb56c	Add support for `cp.async` (#427) Adds support for `cp.async`, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all`. Asynchronous copy operations are only supported by AMD Instinct GPUs, so for now we lower them as synchronous copy operations. Because of this, `cp.async.commit_group`, `cp.async.wait_group`, and `cp.async.wait_all` are no-ops.	2025-07-23 16:25:49 -07:00
Andrzej Janik	3746079b1a	Assorted instruction fixes (#423) This fixes transcendentals and some other buggy instructions exposed by `ptx_tests` (abs, neg). Adds tanh (slow due to a hardware limitation). The only two remaining incorrect instructions are div and sqrt with non-default rounding, but this commit is already bloated enough.	2025-07-23 15:50:35 -07:00
Violet	119b635b9d	Emit correct alignment for loads and stores (#429)	2025-07-23 14:55:52 -07:00
Violet	a86ba3d642	Remove Type::Pointer (#428)	2025-07-23 11:22:17 -07:00
Violet	27cfd50ddd	Implement `nanosleep.u32` (#421)	2025-07-21 17:42:04 -07:00
Violet	dc69808e54	Add support for `shfl.sync.MODE.b32` (#409)	2025-07-16 17:23:11 -07:00
Andrzej Janik	36f0ba9cbb	Apply rounding mode in fp div (#416)	2025-07-16 17:22:59 -07:00
Violet	95d66df18e	Only allow (.u32, .pred) for multiple return (#417)	2025-07-16 17:03:28 -07:00
Violet	7c6b95a8e3	Allow messages for error_todo (#415)	2025-07-16 15:54:40 -07:00
Violet	6fb09f393a	Handle `WARP_SZ` (#412) Adds tests for `WARP_SZ` and handles `WARP_SZ` in the parser.	2025-07-16 11:02:17 -07:00
Violet	06b28cfec7	More descriptive message for unknown symbol (#411)	2025-07-14 15:01:38 -07:00
Violet	6e27f78ae7	Add support for multiple return arguments (#406)	2025-07-09 08:17:15 -07:00
Andrzej Janik	059b8ca0f6	Make sure it is possible to log 32bit PhysX (#374)	2025-07-08 10:19:49 -07:00
Andrzej Janik	ef0c4afcf9	Run unit tests on every commit (#401)	2025-07-03 16:07:00 -07:00
Violet	5cb0a9b8e8	Add support for `bar.red.and.pred` (#402) Implements `bar.red.and.pred` and `bar.red.or.pred` using the undocumented `__ockl_wgred` functions. Doesn't yet add support for numbered barriers and threadcount, as these are not needed for llm.c.	2025-07-03 11:56:20 -07:00
Violet	7bdd20f0dd	Add warp-wide tests (#400)	2025-07-02 18:11:36 -07:00
Andrzej Janik	6d56fa8c34	Fix floating point min/max (#399)	2025-07-01 15:58:16 -07:00
Violet	b824424367	Read test files at runtime for development ergonomics (#395)	2025-07-01 10:31:06 -07:00
Violet	80607c07db	Check LLVM IR for `test_ptx!` with no input/output (#394)	2025-06-24 11:53:30 -07:00
Andrzej Janik	22608d7420	Bump dependencies (#392) zip 2.6.1 was yanked and microlp 2.10 has a major bug	2025-06-23 18:04:08 -07:00
Andrzej Janik	2a374ad880	Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests (#379)	2025-06-16 19:14:16 -07:00
Andrzej Janik	3361046760	Fix mad.wide, replace external CUDA library in test with our own (#376)	2025-06-09 21:33:18 -07:00
Andrzej Janik	3d3e38aadc	Fix ROCm 6.4 failures (#364) Lazy load comgr and dispatch to different code paths based on the name of the comgr .dll/.so	2025-05-02 00:38:22 +02:00
Joëlle van Essen	7cdab7abc2	Implement mul24 (#351)	2025-04-08 12:27:19 +02:00
Andrzej Janik	d704e92c97	Support instruction modes (denormal and rounding) on AMD GPUs (#342)	2025-03-17 21:37:26 +01:00
Joëlle van Essen	867e4728d5	LLVM unit tests (#324) Add assembly files; first attempt; fix: parse bitcode in context; use pretty_assertions for line-by-line diff; write IR to file for failed test; just use the stack; use MaybeUninit; add mul24.ll; adjustments after review; include emit_llvm::Context in emit_llvm::Module; fix typo; Context need not be pub	2025-02-19 21:21:20 +01:00
Andrzej Janik	646d746e02	Start working on mul24	2025-02-07 19:37:11 +00:00
Andrzej Janik	7ac67a89e9	Enable Geekbench 5 (#304)	2024-12-10 21:48:10 +01:00
Andrzej Janik	7a6df9dcbf	Fix host code and update to CUDA 12.4 (#299)	2024-12-02 00:29:57 +01:00
Andrzej Janik	970ba5aa25	Fix linking of AMD device libraries (#296) It's weird that it fails without `-mno-link-builtin-bitcode-postopt`. I've tested it only on ROCm 6.2, might be broken on older or newer ROCm	2024-11-02 16:07:44 +01:00
Andrzej Janik	3870a96592	Re-enable all failing PTX tests (#277) Additionally remove unused compilation paths	2024-10-16 03:15:48 +02:00
Andrzej Janik	c92abba2bb	Refactor compilation passes (#270) The overarching goal is to refactor all passes so they are module-scoped and not function-scoped. Additionally, make improvements to the most egregiously buggy/unfit passes (so the code is ready for the next major features: linking, ftz handling) and continue adding more code to the LLVM backend	2024-09-23 16:33:46 +02:00
Andrzej Janik	46def3e7e0	Connect new parser to LLVM bitcode backend (#269) This is very incomplete. Just enough code to emit LLVM bitcode and continue further development	2024-09-13 01:07:31 +02:00
Andrzej Janik	193eb29be8	PTX parser rewrite (#267) Replaces the traditional LALRPOP-based parser with a winnow-based parser to handle out-of-order instruction modifiers. Generates the instruction type and instruction visitor from a macro instead of writing them by hand. Adds a separate compilation path using the new parser that only works in tests for now	2024-09-04 15:47:42 +02:00
Andrzej Janik	971951bc9e	Improve reporting of recovered unrecognized statement/directive	2021-12-14 00:02:23 +01:00
Andrzej Janik	0ca14d740f	Better reporting of unrecognized tokens	2021-12-13 22:25:26 +01:00
Andrzej Janik	7ba1586d6c	Make all user errors recoverable	2021-12-13 17:20:06 +01:00
Andrzej Janik	816365e7df	Fix shared munging pass and fix cuModuleLoadData	2021-09-29 21:49:47 +00:00
Andrzej Janik	0172dc58e5	Redo shared memory transformation	2021-09-29 02:24:32 +02:00
Andrzej Janik	b763415006	Add CUDA tests showing problems with .shared unification	2021-09-27 00:42:10 +02:00
Andrzej Janik	c23be576e8	Finish fixing shared memory pass	2021-09-26 01:24:14 +02:00
Andrzej Janik	370c0bd09e	Start implementing .shared unification	2021-09-24 01:31:50 +02:00
Andrzej Janik	9609f86033	Fix minor bugs	2021-09-19 00:39:43 +00:00
Andrzej Janik	afe9120868	Fix linkage	2021-09-18 22:49:00 +00:00
Andrzej Janik	04a411fe22	Have an implementation for vprintf	2021-09-18 20:22:47 +00:00
Andrzej Janik	ccf3c02ac1	Minor fixes	2021-09-18 01:36:12 +00:00
Andrzej Janik	3de01b3f8b	Handle ld.volatile/st.volatile	2021-09-17 21:26:15 +00:00
Andrzej Janik	d5a4b068dd	Redo handling of sregs	2021-09-17 20:53:44 +00:00
Andrzej Janik	6ef19d6501	Add early support for more sregs	2021-09-17 18:31:12 +00:00
Andrzej Janik	5b2352723f	Implement function pointers and activemask	2021-09-17 16:24:25 +00:00


226 Commits