tucker-luminal 4cd47ffa45 luminal_python: dynamic-shape gather/scatter in the PT2 translator (#334)
`gather_elements` / `scatter_elements` / `scatter_nd` in luminal-core
require concrete shape dims, so `torch.compile(model, backend=luminal_backend)`
crashed the moment Dynamo handed us a SymInt for batch or seq_len.

The translator now lowers all three through Expression-typed shape
arithmetic and calls only the luminal-core primitives that already
accept Expressions. A small `dim_arith` helper keeps every shape
product in canonical commutative order, so different code paths cannot
build syntactically different versions of the same logical dim.
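
The `dim_arith` helper itself is not shown here. As a rough illustration
of the canonical-ordering idea only, here is a minimal Python sketch. The
name `canonical_product` and the use of plain strings for symbolic dims
are assumptions made for this example, not the translator's actual code
or the luminal-core Expression type.

```python
from typing import Tuple, Union

# Hypothetical stand-in for a dim: a concrete size or a symbol name
# such as "batch" or "seq_len".
Dim = Union[int, str]


def canonical_product(*factors: Dim) -> Tuple[int, Tuple[str, ...]]:
    """Multiply dims and return (constant, sorted symbol names).

    Folding all integers into one constant and sorting the symbols
    gives every ordering of the same factors the same representation.
    """
    const = 1
    symbols = []
    for factor in factors:
        if isinstance(factor, int):
            const *= factor
        else:
            symbols.append(factor)
    return const, tuple(sorted(symbols))


# Two code paths that multiply the same dims in a different order
# end up with one and the same key.
a = canonical_product("seq_len", 4, "batch")
b = canonical_product("batch", "seq_len", 4)
print(a == b)
```

With a normal form like this, comparing two shape products is a plain
equality check instead of a symbolic-equivalence proof.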

Verified end-to-end on Qwen3-30B-A3B across varying prompt lengths.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 16:43:38 -05:00

luminal_python

PyTorch torch.compile integration for Luminal.

CUDA Tests

The Python CUDA CI job builds the Rust extension with the CUDA feature and runs the non-slow pytest suite:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m "not slow"

The slow tests are explicit opt-in. They include large/pretrained model tests, full-width architecture compiles, Whisper end-to-end cases, and other cases that can take a long time or need a large GPU / Hugging Face cache.
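
The `-m slow` / `-m "not slow"` split relies on a pytest marker named `slow`. How this repository registers that marker is not shown here; one common way, sketched below as an assumption, is a `pytest_configure` hook in `conftest.py`. The description string is illustrative only.

```python
# conftest.py (sketch): register the `slow` marker so that `-m slow` and
# `-m "not slow"` select tests without an unknown-marker warning.
# Assumption: the real project may register the marker elsewhere,
# for example in pyproject.toml.

SLOW_MARKER = "slow: long-running, large-model, or large-GPU test"


def pytest_configure(config):
    # `addinivalue_line` appends one line to an ini option at startup;
    # "markers" is the option that lists registered markers.
    config.addinivalue_line("markers", SLOW_MARKER)
```

A test then joins the slow set by carrying the `@pytest.mark.slow` decorator. The `-m "not slow"` run deselects it and the `-m slow` run selects only such tests.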

Run the full Python CUDA suite, including slow tests:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s

Run only the slow Python CUDA tests:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m slow

The helper script follows the same convention:

cd crates/luminal_python
./run_tests_cuda.sh              # non-slow CUDA suite
./run_tests_cuda.sh --slow-only  # only slow CUDA tests
./run_tests_cuda.sh --include-slow  # non-slow and slow CUDA tests

The GitHub/Modal entrypoint uses the same marker split:

cd crates/luminal_python
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s -m "not slow"
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s