docs: log bond_pair stabilization smoke and full eval.

Record stabilization knobs and mean_rmsd_100=2.606118 for the 320-epoch calibration run.

Made-with: Cursor
This commit is contained in:
demian3b
2026-04-17 00:35:12 +09:00
parent ff0149f02c
commit dac3a163ae

View File

@@ -100,3 +100,4 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- 2026-04-16: Strategy S4 micro-tuning #4 increased tail penalty (`tail-risk-weight=0.25`, quantile `0.9`) and regressed sharply to `mean_rmsd_100=2.601258`; indicates over-penalization risk.
- 2026-04-16: Strategy S4 micro-tuning #5 changed seed (`seed=2`, `tail-risk-weight=0.2`, quantile `0.9`) and encountered prolonged fallback-to-1000 behavior with `mean_rmsd_100=2.709563`; S4 hit 5-run cap with no best update.
- 2026-04-16: Structural torsion head (`--torsion-head bond_pair`, GCN only): translation/rotation still use full-graph mean-pooled trunk+time; each torsion `k` runs the **same GCN weights** on the **movable-side induced subgraph** (mask only selects nodes/edges for that subgraph—mask values are not fed as features), mean-pools that subgraph, concatenates with global pooled context, `LayerNorm`, then a small MLP to one scalar. Replaced the prior mask-as-feature design. One calibration run (`epochs=320`, geodesic+residual) reached `mean_rmsd_100=2.598530` with long `train_mse=1000` plateaus; worse than best `2.388103`, likely dominated by multi-forward cost + same geodesic instability rather than readout alone.
- 2026-04-16: bond_pair stabilization pass: subgraph batch `add_self_loops`, post-pool `LayerNorm` on subgraph embedding, small Xavier init on torsion MLP, `torch.nan_to_num` + optional output clamp (`--subgraph-torsion-clip`), and Adam param-group with `--subgraph-lr-scale` (default `0.3`) for `sub_convs`/torsion head vs main `--lr`. Smoke (48ep) avoided `1000` train spikes; full run (`epochs=320`, `lr=5.5e-4`, `subgraph_lr_scale=0.25`, `clip=6`, `eval-runs=100`) reached `mean_rmsd_100=2.606118` (still above best `2.388103`) but training telemetry stayed below the non-finite fallback wall in logged epochs.