Harden geodesic training against non-finite failures.

Add NaN/Inf guards to the geodesic loss and RMSD evaluation, document the best-update merge policy in `GUIDELINES.md`, and record stabilization attempt notes ahead of the mandatory integration into main.

Made-with: Cursor
demian3b
2026-04-16 22:36:57 +09:00
parent 3f04a380d8
commit 12b2fac462
4 changed files with 20 additions and 6 deletions

@@ -81,3 +81,6 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- 2026-04-16: Attempt Z (geodesic stable rerun): same setup as V (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) improved further to `mean_rmsd_100=2.426296` (current best in this branch, better than anchor `2.461592`).
- 2026-04-16: Added train-loss-only early stopping controls (`--early-stop-patience`, `--early-stop-min-delta`, `--early-stop-check-every`, `--early-stop-warmup`) with `stop_reason`/`stop_epoch` reporting in logs and `reports/latest_eval.json`; objective-metric stopping remains disabled.
- 2026-04-16: Attempt AA (merge prep rerun on CUDA): repeated geodesic best-practice config (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) and measured `mean_rmsd_100=2.429895` (`num_runs=100`), still improving over main anchor `2.461592`.
- 2026-04-16: Branch `attempt/geodesic-stability-next`: stress-tested geodesic+residual variants; best observed metric reached `mean_rmsd_100=2.388103` (`--rotation-loss geodesic --gcn-residual --epochs=280 --batch-size=24 --lr=7e-4 --seed=1`), with occasional NaN instability in nearby runs.
- 2026-04-16: Stabilization-only update: added non-finite guards/clamps in geodesic loss, Kabsch RMSD, and training loss fallback to reduce NaN-caused crashes during long geodesic sweeps.
- 2026-04-16: Policy update in `GUIDELINES.md`: when a branch obtains a strict best `mean_rmsd_100`, integration into `main` is mandatory before continuing new branch experiments.
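
The stabilization entry above mentions non-finite guards and clamps in the geodesic loss and a training-loss fallback. The actual code is not shown in this hunk, so the following is only a minimal, framework-free sketch of the general idea, assuming 3x3 rotation matrices given as nested lists. All names here (`EPS`, `geodesic_angle`, `safe_mean_loss`) are hypothetical and not taken from the repository. In PyTorch the analogous building blocks would be `torch.clamp` and `torch.isfinite`.

```python
import math

EPS = 1e-6  # hypothetical clamp margin; the repository's value is not shown


def trace_of_relative_rotation(r_pred, r_true):
    """Return trace(r_pred^T @ r_true) for two 3x3 nested-list matrices."""
    # trace(A^T B) equals the sum of element-wise products of A and B.
    return sum(r_pred[i][j] * r_true[i][j] for i in range(3) for j in range(3))


def geodesic_angle(r_pred, r_true, eps=EPS):
    """Geodesic distance in radians, or None if the inputs are non-finite."""
    cos_theta = (trace_of_relative_rotation(r_pred, r_true) - 1.0) / 2.0
    if not math.isfinite(cos_theta):
        return None  # guard: let the caller skip this sample
    # Clamp strictly inside (-1, 1): acos is undefined outside [-1, 1], and
    # its derivative is unbounded at the endpoints, which is a common source
    # of Inf/NaN gradients when the same formula runs under autograd.
    cos_theta = max(-1.0 + eps, min(1.0 - eps, cos_theta))
    return math.acos(cos_theta)


def safe_mean_loss(per_sample_losses, fallback=0.0):
    """Average only the finite losses; return `fallback` if none are usable."""
    finite = [x for x in per_sample_losses
              if x is not None and math.isfinite(x)]
    if not finite:
        return fallback
    return sum(finite) / len(finite)
```

The clamp trades a small bias near 0 and π (an identical pair of rotations reports a tiny non-zero angle) for bounded values and gradients. Whether a skipped batch should fall back to zero, as assumed here, or abort the run is a policy choice this sketch does not settle.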