Mirror of https://git.teahaven.kr/Rust-related/luminal.git (synced 2026-06-05 09:09:47 +09:00)
`gather_elements` / `scatter_elements` / `scatter_nd` in luminal-core require concrete shape dims, so `torch.compile(model, backend=luminal_backend)` crashed the moment Dynamo handed us a SymInt for batch or seq_len. The translator now lowers all three through Expression-typed shape arithmetic and only calls luminal-core primitives that already accept Expressions. A small `dim_arith` helper keeps every shape product in canonical commutative order, so different code paths don't build syntactically different versions of the same logical dim. Verified end-to-end on Qwen3-30B-A3B across varying prompt lengths.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
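The canonical-ordering idea behind `dim_arith` can be illustrated with a minimal Python sketch. This is not luminal's implementation: the function name `canonical_product` and the representation of symbolic dims as plain strings are assumptions made purely for illustration. The sketch folds concrete factors into one constant and sorts the symbolic factors, so two code paths that multiply the same dims in a different order end up with an identical result.

```python
def canonical_product(factors):
    """Return a canonical form for a product of shape dims.

    Hypothetical illustration, not luminal's `dim_arith` API.
    Each factor is either an int (concrete dim) or a str (symbolic dim name).
    The result is (constant, sorted tuple of symbol names).
    """
    constant = 1
    symbols = []
    for factor in factors:
        if isinstance(factor, int):
            constant *= factor      # fold all concrete dims into one constant
        else:
            symbols.append(factor)  # keep symbolic dims by name
    # Sorting gives one fixed order regardless of how the caller listed the dims.
    return constant, tuple(sorted(symbols))


# Two code paths that build the same logical dim in different orders.
path_a = canonical_product(["batch", 8, "seq_len"])
path_b = canonical_product(["seq_len", "batch", 2, 4])
print(path_a == path_b)
```

Because both paths reduce to the same tuple, an equality check on the canonical form is enough to recognize them as the same dim.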
luminal_python
PyTorch torch.compile integration for Luminal.
CUDA Tests
The Python CUDA CI job builds the Rust extension with the CUDA feature and runs the non-slow pytest suite:
cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m "not slow"
Slow tests are strictly opt-in. They cover large or pretrained model tests, full-width architecture compiles, Whisper end-to-end cases, and any other case that takes a long time or needs a large GPU or a Hugging Face cache.
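One common way to get this kind of opt-in split in pytest is a registered `slow` marker. The sketch below is an assumption about how such a marker could be registered in a `conftest.py`. The repository may declare it elsewhere (for example in its pytest configuration), and the description string is illustrative.

```python
# conftest.py (illustrative sketch, not copied from the repository)

def pytest_configure(config):
    """Register the `slow` marker so `-m slow` and `-m "not slow"` can select on it."""
    config.addinivalue_line(
        "markers",
        "slow: large/pretrained models, full-width compiles, Whisper end-to-end",
    )
```

With a marker like this in place, a test opts in by being decorated with `@pytest.mark.slow`. Passing `-m "not slow"` deselects it, and `-m slow` selects only the marked tests.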
Run the full Python CUDA suite, including slow tests:
cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s
Run only the slow Python CUDA tests:
cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m slow
The helper script follows the same convention:
cd crates/luminal_python
./run_tests_cuda.sh # non-slow CUDA suite
./run_tests_cuda.sh --slow-only # only slow CUDA tests
./run_tests_cuda.sh --include-slow # full CUDA suite, including slow tests
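The relationship between the helper-script flags and the raw pytest invocations can be sketched in Python. The script's actual implementation is not shown here, so the function `pytest_command` and the flag-to-marker mapping are assumptions inferred from the commands above. In particular, `--include-slow` is assumed to mean "no marker filter".

```python
import shlex

# Environment used by every CUDA invocation shown above.
CUDA_ENV = {
    "RUST_BACKTRACE": "1",
    "LUMINAL_TEST_DEVICE": "cuda",
    "MATURIN_PEP517_ARGS": "--features cuda --profile release",
    "CUDARC_CUDA_VERSION": "12080",
}

# Common prefix of all three pytest commands.
BASE_CMD = ["uv", "run", "--group", "dev", "python", "-m", "pytest", "tests/", "-v", "-s"]


def pytest_command(flag=None):
    """Map a helper-script flag to a pytest argv (hypothetical mapping)."""
    if flag is None:
        return BASE_CMD + ["-m", "not slow"]  # default: non-slow suite
    if flag == "--slow-only":
        return BASE_CMD + ["-m", "slow"]      # only tests marked slow
    if flag == "--include-slow":
        return list(BASE_CMD)                 # assumed: no filter, run everything
    raise ValueError(f"unknown flag: {flag}")


# Print the shell-quoted command for each mode.
for flag in (None, "--slow-only", "--include-slow"):
    print(shlex.join(pytest_command(flag)))
```

To actually run one of these, the argv could be passed to `subprocess.run` with `env={**os.environ, **CUDA_ENV}` and `cwd="crates/luminal_python"`.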
The GitHub/Modal entrypoint uses the same marker split:
cd crates/luminal_python
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s -m "not slow"
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s