mirror of
https://github.com/vosen/ZLUDA.git
synced 2025-07-18 17:56:22 +03:00
Throw away useless stuff
83
doc/NOTES.md
@ -1,83 +0,0 @@
Parser generators in Rust:
--------------------------
I'm convinced nobody actually uses parser generators in Rust:
* pomelo can't generate a lexer (understandable, as it is a port of lemon and lemon can't do this either)
* pest can't do parse actions, you have to convert your parse tree to an AST manually
* lalrpop can't do comments
  * and the day I wrote the line above it can
  * reports parsing errors as byte offsets
  * if you want to skip parsing one of the alternatives, functional design gets quite awkward
* antlr4rust is untried and requires Java to build
* no library supports island grammars

What to emit?
-------------
* SPIR-V
  * Better library support, easier to emit
  * Can be optimized by IGC
  * Can't do some things (not sure what exactly yet)
    * But we can work around with inline VISA
* VISA
  * Quicker compilation

A64 vs BTS
----------
* How to force A64: `-cl-intel-greater-than-4GB-buffer-required`
* PTX made a baffling design choice: global pointers are represented as untyped 64-bit integers
* Consequently, there's no 100% certain way to know which argument is a surface and which is a scalar
  * It seems that NVIDIA realized what a horrible idea that was and now emits `cvta.to.global` as a marker for global pointers?
    * But it's only emitted in a recent release build, can't rely on it
    * Maybe debug builds emit debug metadata to detect surfaces?
      * Might add this as an optimization later
  * `cuLaunchKernel` docs say this: "The number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel's image", note the wording: _offsets_ and _sizes_ and not _types_
  * Wait, you can mark an argument as a pointer with `.ptr`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr, but it's useless because the NV compiler does not emit it
  * Potential solution: compile only during the dispatch, when the types of the arguments are known?
    * Can't do, the set of arguments passed to `cuLaunchKernel` is untyped
  * Solution: treat all arguments as untyped integers and say goodbye to BTS access

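To make the "offsets and sizes, not types" point concrete, here is a minimal Python sketch. The helper `pack_kernel_args`, the `layout` list and the address value are all made up for illustration; they only stand in for whatever the kernel image provides and are not part of any CUDA or Level Zero API. The sketch shows that a parameter buffer built from offsets and sizes alone is byte-for-byte identical whether the first argument was meant as a global pointer or as a plain 64-bit scalar:

```python
def pack_kernel_args(layout, values):
    """Pack raw argument values into one parameter buffer.

    layout: list of (offset, size) pairs, a hypothetical stand-in for the
            information recovered from the kernel image.
    values: non-negative integers, one per parameter.
    """
    buf = bytearray(max(offset + size for offset, size in layout))
    for (offset, size), value in zip(layout, values):
        # Only offset and size are used; nothing here knows about types.
        buf[offset:offset + size] = value.to_bytes(size, "little")
    return bytes(buf)


layout = [(0, 8), (8, 4)]          # one 64-bit parameter, one 32-bit parameter
buffer_address = 0x7F1234567000    # made-up device address
plain_integer = 0x7F1234567000     # a u64 scalar that happens to hold the same bits

args_as_pointer = pack_kernel_args(layout, [buffer_address, 256])
args_as_scalar = pack_kernel_args(layout, [plain_integer, 256])

# The two buffers are indistinguishable, so the callee cannot tell them apart.
print(args_as_pointer == args_as_scalar)
```

Under these assumptions there is nothing left in the buffer to drive a surface-vs-scalar decision, which is why the notes fall back to treating every argument as an untyped integer.
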
Implicit conversions
--------------------
* PTX support for implicit conversions is completely degenerate, docs say:

  _For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes_

  Which is sensible, but completely untrue. In reality ptxas compiles silly code like this:
  ```
  .param .f32 param_1
  ...
  .reg .s32 %r1;
  ld.param.b16 %r1, [param_1];
  ```
* Surprise, surprise, there are two kinds of implicit conversions at play in the example above:
  * "Relaxed type-checking rules": this is the conversion from the b16 operation type to the s32 destination register
  * Undocumented type coercion when dereferencing param_1. The PTX behaviour is to coerce **every** type. It's something to the effect of `[param_1] = *(b16*)param_1`

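The two conversions can be emulated in a few lines of Python. This is only a sketch under stated assumptions: the function name `ld_param_b16` is made up, parameter memory is assumed to be little-endian, and the widening step is assumed to zero-extend because the instruction type is a bit-size type. None of this was checked against real hardware here.

```python
import struct


def ld_param_b16(param_bytes):
    """Emulate `ld.param.b16 %r1, [param_1]` with a 32-bit destination register."""
    # Step 1, the undocumented coercion: reinterpret the start of the
    # parameter as b16, regardless of its declared type (*(b16*)param_1).
    (low16,) = struct.unpack_from("<H", param_bytes, 0)
    # Step 2, relaxed type-checking: widen the 16-bit value to the 32-bit
    # register. Zero-extension is an assumption, see the lead-in.
    return low16 & 0xFFFFFFFF


# A parameter declared as .f32 and holding 0.1 (bit pattern 0x3DCCCCCD).
param_1 = struct.pack("<f", 0.1)
r1 = ld_param_b16(param_1)
print(hex(r1))  # 0xcccd
```

The register ends up holding the low half of the float's bit pattern, which has nothing to do with the numeric value 0.1.
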
PTX grammar
-----------
* PTX grammar rules are atrocious: keywords can be freely reused as identifiers without escaping
* Modifiers can be applied to instructions in arbitrary order. We don't support that and hope we will never have to

Rust debugging
--------------
* Nothing works 100% well on vscode/Windows:
  * MSVC/lldb - always garbage (simple enums are fubar)
  * MSVC/cppvsdbg - sometimes garbage (nested enums are fubar)
  * GNU/lldb - mostly fine, but can't follow child processes
  * GNU/gdb - always garbage (I don't have the patience to manually QA rust-gdb on Windows) and doesn't quite understand file paths for breakpoints
* Nor on vscode/Linux:
  * lldb - mostly fine, but can't follow child processes
  * gdb - visualizes variables somewhat awkwardly (shows all possible variants of an enum)
* CLion could be the solution, but intellij-rust can't load this project

CUDA <-> L0
-----------
* device ~= device
* stream ~= command queue
* context ~= context (1.0+)
* graph ~= command list
* module ~= module

IGC
---
* IGC is extremely brittle and segfaults on fairly innocent code:
  * OpBitcast of pointer to uint
  * OpCopyMemory of alloca'd variable
@ -1,36 +0,0 @@
# Enumerate which cvt modifier/type combinations ptxas accepts.
import itertools
import os
import subprocess
import tempfile

types = ["u8", "u16", "u32", "u64", "s8", "s16", "s32", "s64", "f16", "f32", "f64"]
rnd = ["", ".rn", ".rni"]
ftz_all = ["", ".ftz"]
sat = ["", ".sat"]

for in_type, out_type, r, ftz, s in itertools.product(types, types, rnd, ftz_all, sat):
    with tempfile.TemporaryDirectory() as tmp_dir:
        f_name = os.path.join(tmp_dir, 'ptx')
        out_name = os.path.join(tmp_dir, 'out')
        # Write a minimal kernel containing only the cvt variant under test
        with open(f_name, 'w') as f:
            f.write(
                f"""
                .version 6.5
                .target sm_30
                .address_size 64
                .visible .entry VecAdd_kernel()
                {{
                    .reg.{in_type} r1;
                    .reg.{out_type} r2;
                    cvt{r}{ftz}{s}.{out_type}.{in_type} r2, r1;
                    ret;
                }}
                """)
        # Pass the command as a list so it also runs without a shell on non-Windows systems
        err = subprocess.run(["ptxas", f_name, "-o", out_name], capture_output=True)
        if err.returncode == 0:
            print(f"cvt{r}{ftz}{s}.{out_type}.{in_type}")
        #else:
        #    print(f"[INVALID] cvt{r}{ftz}{s}.{out_type}.{in_type}")
@ -1,31 +0,0 @@
# Check which destination register types ptxas accepts for ld.param of each operation type.
import os
import subprocess
import tempfile

types = ["b8", "b16", "b32", "b64", "u8", "u16", "u32", "u64", "s8", "s16", "s32", "s64", "f32", "f64"]

for op_type in types:
    for output_type in types:
        with tempfile.TemporaryDirectory() as tmp_dir:
            f_name = os.path.join(tmp_dir, 'ptx')
            out_name = os.path.join(tmp_dir, 'out')
            # The parameter and the load share op_type; only the register type varies
            with open(f_name, 'w') as f:
                f.write(
                    f"""
                    .version 6.5
                    .target sm_30
                    .address_size 64
                    .visible .entry VecAdd_kernel(
                        .param .{op_type} input
                    )
                    {{
                        .reg.{output_type} r1;
                        ld.param.{op_type} r1, [input];
                        ret;
                    }}
                    """)
            # Pass the command as a list so it also runs without a shell on non-Windows systems
            err = subprocess.run(["ptxas", f_name, "-o", out_name], capture_output=True)
            if err.returncode == 0:
                print(f"{op_type} {output_type}")
            else:
                print(f"[INVALID] {op_type} {output_type}")
@ -1,31 +0,0 @@
# Check which declared parameter types ptxas accepts for ld.param of each operation type.
import os
import subprocess
import tempfile

types = ["b8", "b16", "b32", "b64", "u8", "u16", "u32", "u64", "s8", "s16", "s32", "s64", "f32", "f64"]

for input_type in types:
    for op_type in types:
        with tempfile.TemporaryDirectory() as tmp_dir:
            f_name = os.path.join(tmp_dir, 'ptx')
            out_name = os.path.join(tmp_dir, 'out')
            # The register and the load share op_type; only the parameter type varies
            with open(f_name, 'w') as f:
                f.write(
                    f"""
                    .version 6.5
                    .target sm_30
                    .address_size 64
                    .visible .entry VecAdd_kernel(
                        .param .{input_type} input
                    )
                    {{
                        .reg.{op_type} r1;
                        ld.param.{op_type} r1, [input];
                        ret;
                    }}
                    """)
            # Pass the command as a list so it also runs without a shell on non-Windows systems.
            # Output is not captured here, so ptxas diagnostics go straight to the console.
            err = subprocess.run(["ptxas", f_name, "-o", out_name])
            if err.returncode == 0:
                print(f"{op_type} {input_type}")
            else:
                print(f"[INVALID] {op_type} {input_type}")