luminal_python: WIP workaround for dynamo "L not defined" on gemma3

Set torch._dynamo.config.automatic_dynamic_shapes = False at package
import time. With the default (True), dynamo's frame-evaluation cache
promotes a varying dim to dynamic on the second compiled call and
emits a `_guards_fn` submodule whose source closes over `L` (the
dynamo locals namespace). When our backend re-exports the FX graph,
the closure's free `L` reference doesn't resolve and the export fails with
  NameError: name 'L' is not defined
during aot_export_joint_with_descriptors.
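
A minimal plain-Python sketch of that failure mode, assuming the guard
source is rebuilt in a namespace that lacks `L`. `GUARD_SRC` and
`build_guard` are hypothetical names for illustration only; this is
not dynamo's actual generated code.

```python
# Hypothetical stand-in for generated guard source; not dynamo's real output.
GUARD_SRC = '''
def guards_fn():
    # `L` is a free name: it must exist in the namespace the source is built in.
    return len(L["cache_position"]) == 1
'''


def build_guard(namespace):
    """Exec the guard source in a copy of `namespace` and return the function."""
    scope = dict(namespace)
    exec(GUARD_SRC, scope)
    return scope["guards_fn"]


# Rebuilt without `L`, calling the guard raises the same kind of NameError.
try:
    build_guard({})()
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Rebuilt with `L` supplied, the same source evaluates normally.
guard_result = build_guard({"L": {"cache_position": [3]}})()
```

The source only resolves `L` when the function is called, so the error
surfaces at the point the rebuilt guard runs, not when it is defined.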

gemma3-4b's StaticCache call pattern triggers it deterministically
(every search budget, every iter); llama-8b, qwen3-4b, qwen3-moe on
the same backend do not. Disabling automatic_dynamic_shapes forces
a fresh-static-trace recompile on each shape mismatch instead of the
L-referencing dynamic-shape path.

Cost / why this is WIP, not a fix:
The bench loop calls compiled() with cache_position=[1], [2], [3]…
each iter. The shape is constant ([1]) but the value varies. With
automatic_dynamic_shapes=False, dynamo recompiles per cache_position
*value* — i.e. one full luminal compile per token in the prompt.
A search-iters=1 gemma3 smoke test takes ~2 hr of CPU time and pegs
host RSS at 200 GB, versus ~30 s for a clean run. Functional, but not
shippable as the steady-state path.
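
The compile count can be illustrated with a toy cache model. This is a
sketch under the assumption that each distinct guard key costs one
compile; `count_compiles` and the key functions are hypothetical and do
not model dynamo's real guard machinery or any recompile limit.

```python
def count_compiles(cache_positions, guard_key):
    """Count cache misses for a compile cache keyed by guard_key(position)."""
    cache = set()
    compiles = 0
    for pos in cache_positions:
        key = guard_key(pos)
        if key not in cache:
            cache.add(key)
            compiles += 1
    return compiles


# One cache_position per iteration: [1], [2], [3], ...
positions = [[i] for i in range(1, 9)]

# Keyed on shape only: every call has length 1, so one compile serves all.
shape_guarded = count_compiles(positions, guard_key=lambda p: (len(p),))

# Keyed on value: every call misses, so there is one compile per token.
value_guarded = count_compiles(positions, guard_key=lambda p: tuple(p))
```

With eight positions the shape-keyed cache compiles once and the
value-keyed cache compiles eight times, which is the per-token cost
described above.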

Better long-term routes (not in this commit):
- mark cache_position as a static address / specialise it at trace
  time so dynamo doesn't see value variation.
- handle the L-referencing guards module in pt2.py (inject the
  expected namespace before aot_export, or strip the guards submodule
  when re-exporting).
- reuse the SymInt specialisation already in pt2.py (previous commit)
  and keep automatic_dynamic_shapes=True so the dim becomes a clean
  symbolic dimension that pt2.py can resolve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tucker
2026-05-01 22:15:17 +00:00
parent d21f55ed78
commit 3a3cd04958

@@ -1,5 +1,7 @@
"""Luminal Python bindings - PyTorch backend using Luminal."""
import torch._dynamo
# Import Python components
# Register DynamicCache pytree serialization once at import time
from .cache_utils import _register_cache_serialization
@@ -11,6 +13,17 @@ from .main import luminal_backend, register_backend
_register_cache_serialization()
# Disable dynamo's automatic-dynamic-shape promotion. On the second compiled
# call dynamo otherwise promotes any dim that varied to dynamic and emits a
# `_guards_fn` submodule that closes over `L` (the dynamo locals namespace).
# When our backend re-exports the FX graph via `torch.export`, that closure's
# free `L` reference doesn't resolve and we get
# `NameError: name 'L' is not defined` during aot_export_joint_with_descriptors.
# Gemma3's StaticCache call pattern triggers it deterministically; llama / qwen
# don't. Forcing recompile-on-shape-change keeps every call on a static graph
# the backend can actually translate.
torch._dynamo.config.automatic_dynamic_shapes = False
# Re-export everything for clean package interface
__all__ = [
"CompiledModel",