luminal_python: WIP workaround for dynamo "L not defined" on gemma3

Set torch._dynamo.config.automatic_dynamic_shapes = False at package
import time.

With the default (True), dynamo's frame-evaluation cache promotes a
varying dim to dynamic on the second compiled call and emits a
`_guards_fn` submodule whose source closes over `L` (the dynamo locals
namespace). When our backend re-exports the FX graph, the closure's free
`L` reference doesn't resolve and we panic with
NameError: name 'L' is not defined
during aot_export_joint_with_descriptors. gemma3-4b's StaticCache call
pattern triggers it deterministically (every search budget, every iter);
llama-8b, qwen3-4b, qwen3-moe on the same backend do not.

Disabling automatic_dynamic_shapes forces a fresh-static-trace recompile
on each shape mismatch instead of the L-referencing dynamic-shape path.

Cost / why this is WIP, not a fix:

The bench loop calls compiled() with cache_position=[1], [2], [3]… each
iter. The shape is constant ([1]) but the value varies. With
automatic_dynamic_shapes=False, dynamo recompiles per cache_position
*value*, i.e. one full luminal compile per token in the prompt. A
search-iters=1 gemma3 smoke takes ~2 hr CPU and pegs at 200 GB host RSS
instead of a clean ~30 s. Functional but not shippable as the
steady-state path.

Better long-term routes (not in this commit):

- mark cache_position as a static address / specialise it at trace time
  so dynamo doesn't see value variation.
- handle the L-referencing guards module in pt2.py (inject the expected
  namespace before aot_export, or strip the guards submodule when
  re-exporting).
- reuse the SymInt specialisation already in pt2.py (previous commit)
  and keep automatic_dynamic_shapes=True so the dim becomes a clean
  symbolic that pt2.py can resolve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
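Note on the failure mode: the unresolved `L` reference described above can be reproduced in miniature with plain Python, which may help when evaluating the "inject the expected namespace" route. This is a toy analogy only. It does not use dynamo, and `GUARD_SRC`, `load_guard` and `call_guard` are hypothetical names invented for the sketch, not anything in pt2.py or torch.

```python
# Toy illustration only: mimics the failure mode in plain Python.
# Nothing here is dynamo's real generated code; all names are hypothetical.

GUARD_SRC = """
def guards_fn(expected):
    # `L` is a free name: it has to be supplied by the globals this source
    # is executed in, like a generated guard that expects a locals namespace.
    return L["cache_position"] == expected
"""


def load_guard(namespace):
    """Exec the guard source with `namespace` as globals; return the function."""
    scope = dict(namespace)
    exec(GUARD_SRC, scope)
    return scope["guards_fn"]


def call_guard(namespace, expected):
    """Return the guard result, or the NameError message if `L` is missing."""
    try:
        return load_guard(namespace)(expected)
    except NameError as err:
        return str(err)


# Re-executing the source without the namespace it was written against fails
# at call time, when the free name is looked up.
missing = call_guard({}, 1)

# Injecting the expected name before re-executing resolves the reference.
injected = call_guard({"L": {"cache_position": 1}}, 1)

print(missing)
print(injected)
```

The definition itself succeeds in both cases; the lookup of `L` only happens when the function body runs, which is consistent with the error surfacing late (during export) rather than when the guard source is first emitted. Whether the real `_guards_fn` can be fixed the same way is an open question for the pt2.py route.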
2026-06-01 21:49:47 +09:00 · 2026-05-01 22:15:17 +00:00
parent d21f55ed78
commit 3a3cd04958
1 changed file with 13 additions and 0 deletions
--- a/crates/luminal_python/src/luminal/__init__.py
+++ b/crates/luminal_python/src/luminal/__init__.py
@@ -1,5 +1,7 @@
 """Luminal Python bindings - PyTorch backend using Luminal."""

+import torch._dynamo
+
 # Import Python components
 # Register DynamicCache pytree serialization once at import time
 from .cache_utils import _register_cache_serialization
@@ -11,6 +13,17 @@ from .main import luminal_backend, register_backend

 _register_cache_serialization()

+# Disable dynamo's automatic-dynamic-shape promotion. On the second compiled
+# call dynamo otherwise promotes any dim that varied to dynamic and emits a
+# `_guards_fn` submodule that closes over `L` (the dynamo locals namespace).
+# When our backend re-exports the FX graph via `torch.export`, that closure's
+# free `L` reference doesn't resolve and we get
+#   `NameError: name 'L' is not defined` during aot_export_joint_with_descriptors.
+# Gemma3's StaticCache call pattern triggers it deterministically; llama / qwen
+# don't. Forcing recompile-on-shape-change keeps every call on a static graph
+# the backend can actually translate.
+torch._dynamo.config.automatic_dynamic_shapes = False
+
 # Re-export everything for clean package interface
 __all__ = [
    "CompiledModel",