Author: Tucker Morgan
SHA1: 1dfd0804a8
Date: 2026-04-20 21:27:27 +00:00
Message: update README to reflect current state
- drop stale jafioti issue link
- replace "search partially merged" with current default flow
- remove false autograd claim
- update examples list (gemma/qwen/moe/paged_llama)
- point high-level ops at src/frontend/ instead of hl_ops
- add PyTorch torch.compile getting-started block

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -45,6 +45,18 @@ cd ./examples/llama
cargo run --release
```
**PyTorch models via `torch.compile`**
Any PyTorch model can be run through Luminal by swapping the backend:
```python
import torch
from luminal import luminal_backend
model_compiled = torch.compile(model, backend=luminal_backend)
output = model_compiled(x)
```
See `crates/luminal_python/` for the PT2-based bridge.
## Features
### Speed
@@ -75,7 +87,7 @@ The current ML ecosystem is too fragmented, and the solution isn't another layer
 ### Validated against Pytorch
-Correctness matters. We write as much tests as possible to cover all ops and verify they work the same as an equivalent Pytorch implementation. ([Improvements needed!](https://github.com/jafioti/luminal/issues/20))
+Correctness matters. We write as much tests as possible to cover all ops and verify they work the same as an equivalent Pytorch implementation.
 ## Ideology
@@ -102,12 +114,12 @@ Now we can do:
 ## Where are we?
-- Search is partially merged. We are between 1.0 and 2.0 (search), which will be completed within the next month or so.
+- Search is the default execution path — compile via `build_search_space` and `search` (see the Usage example above).
 - Metal and Cuda are supported for running models on Macs and Nvidia GPUs respectively, in both full and half precision.
-- Full training support with graph-based autograd.
-- Llama 3, Phi 3, Whisper and Yolo v8 are implemented in `examples/`. See instructions above for running.
+- Llama 3, Gemma, Qwen (incl. MoE variants), and a paged-attention Llama are implemented in `examples/`. See instructions above for running.
 - We have a small library of NN modules in `luminal_nn`, including transformers.
-- A significant amount of high-level ops are implemented in `hl_ops`. We are aiming to match the most used ~80% of the pytorch api.
+- A large surface of high-level ops lives in `src/frontend/` aiming to match the most used ~80% of the PyTorch api.
+- PyTorch models can be run through luminal via `torch.compile` — see `crates/luminal_python/`.
 Some things on the roadmap: