Default Branch

75e4e6be0a · Simplify example mains and trim CUDA profiling output (#339) · Updated 2026-05-30 12:37:13 +09:00

Branches

62e86f9dc5 · Reuse cuBLASLt prepares across matching graph ops · Updated 2026-06-01 09:25:30 +09:00

0
1

9677428521 · cuda_lite + PT2 translator: bf16 correctness fixes surfaced by gemma4 · Updated 2026-06-01 08:58:48 +09:00

0
1

a45629cece · cublaslt: gate bf16/f16 cast-F32 fusion on ?beta = 0.0 · Updated 2026-05-30 08:07:57 +09:00

1
1

caa036dca8 · upgraded to cuda 13.3 · Updated 2026-05-28 07:37:46 +09:00

1
1

d000452167 · graph: release search-time e-graphs + bump profile_timeout default to 5s · Updated 2026-05-28 05:38:01 +09:00

1
2

02c58c4e9f · Fail loudly on unfulfilled writeback contracts · Updated 2026-05-23 03:50:59 +09:00

14
12

55867e9830 · wip · Updated 2026-05-22 08:51:04 +09:00

27
2

f3907e0ddd · Skip per-prompt KV cache clears in rust examples · Updated 2026-05-22 02:19:26 +09:00

5
4

c41ede0e5b · luminal_python: vanilla-PyTorch DLRMv1 fast paths + batched FFI + input cache · Updated 2026-05-22 01:18:31 +09:00

5
1

bfcc41040e · dlrm --mega: route the megakernel through luminal's runtime, not around it · Updated 2026-05-22 00:18:59 +09:00

7
8

6a0c6321f4 · luminal_python: DLRM vanilla-PyTorch fast paths + DCE + batched FFI · Updated 2026-05-21 09:29:37 +09:00

11
1

486eaf7255 · DLRM: PairwiseDot v2, StackedEmb block-bundle, per-kernel timing infra · Updated 2026-05-20 08:16:21 +09:00

14
5

68f45644e7 · luminal_python: in-graph aten.copy_.default for mutation writeback · Updated 2026-05-20 02:04:54 +09:00

17
33

9311c59b4c · Extract example_common crate to dedup rust example boilerplate · Updated 2026-05-19 07:23:18 +09:00

15
2

0305ed9c39 · Add rsqrt CUDA rewrite and square pow specialization · Updated 2026-05-19 03:57:49 +09:00

15
1

a4bda06d64 · tests for interface specification · Updated 2026-05-15 06:24:56 +09:00

22
1

0b1e09cf23 · Merge remote-tracking branch 'origin/main' into codex/rust-stdio-benchmark · Updated 2026-05-15 02:01:19 +09:00

22
2

d6b0eb0ec1 · Add recommender model compile coverage · Updated 2026-05-14 06:40:15 +09:00

24
1

f85995c2a2 · Use parallel launch for embed kernel · Updated 2026-05-14 06:24:54 +09:00

24
2

818608ad36 · Fix PT2 passthrough input output ID collision · Updated 2026-05-13 05:13:18 +09:00

29
25