update README to reflect current state

- drop stale jafioti issue link
- replace "search partially merged" with current default flow
- remove false autograd claim
- update examples list (gemma/qwen/moe/paged_llama)
- point high-level ops at src/frontend/ instead of hl_ops
- add PyTorch torch.compile getting-started block

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:49:47 +09:00 · 2026-04-20 21:27:27 +00:00
1 changed files with 17 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -45,6 +45,18 @@ cd ./examples/llama
 cargo run --release
 ```

+**PyTorch models via `torch.compile`**
+
+Any PyTorch model can be run through Luminal by swapping the backend:
+```python
+import torch
+from luminal import luminal_backend
+
+model_compiled = torch.compile(model, backend=luminal_backend)
+output = model_compiled(x)
+```
+See `crates/luminal_python/` for the PT2-based bridge.
+
 ## Features

 ### Speed
@@ -75,7 +87,7 @@ The current ML ecosystem is too fragmented, and the solution isn't another layer

 ### Validated against Pytorch

-Correctness matters. We write as much tests as possible to cover all ops and verify they work the same as an equivalent Pytorch implementation. ([Improvements needed!](https://github.com/jafioti/luminal/issues/20))
+Correctness matters. We write as many tests as possible to cover all ops and verify they behave the same as an equivalent PyTorch implementation.

 ## Ideology

@@ -102,12 +114,12 @@ Now we can do:

 ## Where are we?

-- Search is partially merged. We are between 1.0 and 2.0 (search), which will be completed within the next month or so.
+- Search is the default execution path — compile via `build_search_space` and `search` (see the Usage example above).
 - Metal and Cuda are supported for running models on Macs and Nvidia GPUs respectively, in both full and half precision.
-- Full training support with graph-based autograd.
-- Llama 3, Phi 3, Whisper and Yolo v8 are implemented in `examples/`. See instructions above for running.
+- Llama 3, Gemma, Qwen (incl. MoE variants), and a paged-attention Llama are implemented in `examples/`. See instructions above for running.
 - We have a small library of NN modules in `luminal_nn`, including transformers.
-- A significant amount of high-level ops are implemented in `hl_ops`. We are aiming to match the most used ~80% of the pytorch api.
+- A large surface of high-level ops lives in `src/frontend/`, aiming to match the most used ~80% of the PyTorch API.
+- PyTorch models can be run through Luminal via `torch.compile`; see `crates/luminal_python/`.

 Some things on the roadmap: