mirror of https://git.teahaven.kr/Rust-related/luminal.git synced 2026-06-05 09:09:47 +09:00

Files

tucker-luminal 4cd47ffa45 luminal_python: dynamic-shape gather/scatter in the PT2 translator (#334)

`gather_elements` / `scatter_elements` / `scatter_nd` in luminal-core
require concrete shape dims, so `torch.compile(model, backend=luminal_backend)`
crashed the moment Dynamo handed us a SymInt for batch or seq_len.

The translator now lowers all three through Expression-typed shape
arithmetic and calls only luminal-core primitives that already accept
Expressions. A small `dim_arith` helper keeps every shape product in
canonical commutative order, so different code paths don't build
syntactically different versions of the same logical dim.
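
To illustrate what "canonical commutative order" buys, here is a minimal sketch in plain Python. It is not the actual `dim_arith` helper, whose code is not shown here: the `canonical_product` function and the choice to model symbolic dims as strings are assumptions made purely for illustration. The idea is that folding constants together and sorting the symbolic factors makes `batch * seq_len * 4` and `4 * seq_len * batch` compare equal.

```python
from typing import Tuple, Union

# Hypothetical stand-in for a shape dim: an int is a concrete size,
# a str names a symbolic dim such as "batch" or "seq_len".
Dim = Union[int, str]


def canonical_product(*dims: Dim) -> Tuple[int, Tuple[str, ...]]:
    """Return the product of dims as (folded constant, sorted symbol names).

    Multiplication is commutative, so sorting the symbolic factors and
    folding the integer factors gives one representation per logical dim.
    """
    const = 1
    symbols = []
    for d in dims:
        if isinstance(d, int):
            const *= d          # fold concrete sizes into a single constant
        else:
            symbols.append(d)   # keep symbolic factors for sorting
    return const, tuple(sorted(symbols))


# Two code paths that build the same logical dim in a different order
# end up with an identical representation.
a = canonical_product("batch", "seq_len", 4)
b = canonical_product(4, "seq_len", "batch")
assert a == b
```

Without such a normalization step, two structurally different but mathematically equal expressions could fail a simple equality check between shapes.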

Verified end-to-end on Qwen3-30B-A3B across varying prompt lengths.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-26 16:43:38 -05:00

examples

Flux 2 Dev (#304)

2026-05-16 14:52:35 -04:00

rust

luminal_python: dynamic-shape gather/scatter in the PT2 translator (#334)

2026-05-26 16:43:38 -05:00

src/luminal

Dtype i64 f64 first class (#323)

2026-05-23 00:35:46 -04:00

tests

luminal_python: dynamic-shape gather/scatter in the PT2 translator (#334)

2026-05-26 16:43:38 -05:00

.gitignore

ignore venv?

2026-03-25 03:37:37 +00:00

CLAUDE.md

Remove ONNX pipeline and make PT2/FX the sole export path

2026-04-07 20:00:51 +00:00

LessonsLearned.md

Better scalar support: tests + 12 fixes (LUM-474) (#300)

2026-05-13 20:16:30 -04:00

modal_pytest_runner.py

Reduce default profiling trials to 3 (#299)

2026-05-06 13:04:57 -04:00

pyproject.toml

luminal_python + cuda_lite: unblock Qwen3-MoE compile path (#301)

2026-05-11 12:34:52 -07:00

README.md

luminal_python + cuda_lite: unblock Qwen3-MoE compile path (#301)

2026-05-11 12:34:52 -07:00

run_all_tests.sh

Dtype i64 f64 first class (#323)

2026-05-23 00:35:46 -04:00

run_test.sh

Dtype i64 f64 first class (#323)

2026-05-23 00:35:46 -04:00

run_tests_cuda.sh

luminal_python + cuda_lite: unblock Qwen3-MoE compile path (#301)

2026-05-11 12:34:52 -07:00

README.md

luminal_python

PyTorch torch.compile integration for Luminal.

CUDA Tests

The Python CUDA CI job builds the Rust extension with the CUDA feature and runs the non-slow pytest suite:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m "not slow"

Slow tests are strictly opt-in. They include large or pretrained model tests, full-width architecture compiles, Whisper end-to-end cases, and other cases that take a long time or need a large GPU or a populated Hugging Face cache.

Run the full Python CUDA suite, including slow tests:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s

Run only the slow Python CUDA tests:

cd crates/luminal_python
RUST_BACKTRACE=1 \
LUMINAL_TEST_DEVICE=cuda \
MATURIN_PEP517_ARGS="--features cuda --profile release" \
CUDARC_CUDA_VERSION=12080 \
uv run --group dev python -m pytest tests/ -v -s -m slow

The helper script follows the same convention:

cd crates/luminal_python
./run_tests_cuda.sh              # non-slow CUDA suite
./run_tests_cuda.sh --slow-only  # only slow CUDA tests
./run_tests_cuda.sh --include-slow  # non-slow and slow CUDA tests
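
The relationship between the helper-script flags and the pytest marker expressions used above can be sketched as a small lookup. This is only an illustration: the internals of `run_tests_cuda.sh` are not shown here, the `marker_args` function is hypothetical, and the mapping of `--include-slow` to "no marker filter" is an assumption based on the full-suite command earlier in this README.

```python
from typing import List


def marker_args(flag: str = "") -> List[str]:
    """Map a helper-script flag to the pytest marker arguments it implies.

    Hypothetical helper for illustration; not part of the repository.
    """
    if flag == "":
        # Default: run everything except tests marked slow.
        return ["-m", "not slow"]
    if flag == "--slow-only":
        # Run only tests marked slow.
        return ["-m", "slow"]
    if flag == "--include-slow":
        # Assumed: no marker filter, so slow and non-slow tests both run.
        return []
    raise ValueError(f"unknown flag: {flag!r}")


# Build the pytest argument list for the default (non-slow) run.
pytest_args = ["tests/", "-v", "-s", *marker_args()]
```

Keeping the marker expression in one place like this means the local script and the CI entrypoint cannot drift apart on what counts as a slow test.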

The GitHub/Modal entrypoint uses the same marker split:

cd crates/luminal_python
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s -m "not slow"
modal run modal_pytest_runner.py --gpu A100 --timeout 7200 tests/ -v -s