Enforce full attempt logging and latest-eval checkpoint flow.

Require attempt-log updates on train.py commits, save per-run checkpoint as latest_eval_best_model, and let pre-commit promote improved runs to best_model while refreshing best trajectories. Also improved mean_rmsd_100 to 2.461592.

Made-with: Cursor
demian3b
2026-04-16 17:44:09 +09:00
parent 3ddae9d815
commit d125e7ca81
6 changed files with 31 additions and 12 deletions

@@ -15,6 +15,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
## Repository policy
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- Attempt log is mandatory for both successful and failed trials.
- Flow-matching training time must stay random (middle-time supervision is mandatory).
- Commits touching `train.py` must include:
  - `reports/latest_eval.json`
@@ -33,6 +34,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- `GUIDELINES.md`: operating rules and workflow.
- `BEST_PRACTICE.json`: current best-known metric and config.
- `reports/latest_eval.json`: most recent measured metric.
- `artifacts/latest_eval_best_model.pt`: checkpoint from latest run that produced `latest_eval`.
- `artifacts/best_model.pt`: best checkpoint from latest improved run.
- `reports/trajectories/`: 6 regenerated trajectories from current best model.
- `scripts/precommit_performance_gate.py`: pre-commit guard for train-related commits.
@@ -50,3 +52,11 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- 2026-04-16: Enforced random-time flow-matching rule (no fixed training time), saved best checkpoint to git-tracked artifact path, and improved metric to `mean_rmsd_100=2.519821` with `gcn hidden=512 layers=8 batch=96`.
- 2026-04-16: Added a general multi-layer diagnosis principle to `GUIDELINES.md` so experiments are judged with quantitative + qualitative + structural evidence, not metric-only optimization.
- 2026-04-16: Tried weighted objective to counter weak rotation/torsion motion (`w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8`) and improved to `mean_rmsd_100=2.505556`.
- 2026-04-16: Failed attempt B (longer, lower-lr weighted run) reached `mean_rmsd_100=2.531661`; reverted artifacts to current best.
- 2026-04-16: Failed attempt C (torsion-heavy weights, `time_power=1.2`) reached `mean_rmsd_100=2.564594`; no commit.
- 2026-04-16: Failed attempt D (deeper GCN config) reached `mean_rmsd_100=2.739573`; no commit.
- 2026-04-16: Failed attempt E (`w_center=0.75, w_omega=2.1, w_torsion=3.2, lr=9e-4`) reached `mean_rmsd_100=2.535795`; no commit.
- 2026-04-16: Failed attempt F (balanced weights `w_center=0.9, w_omega=1.8, w_torsion=2.6`) reached `mean_rmsd_100=2.522751`; no commit.
- 2026-04-16: Failed attempt G (`accum=3` for stability) reached `mean_rmsd_100=2.561071`; no commit.
- 2026-04-16: Policy update: every attempt (success/failure) must be logged; checkpoint flow changed to `artifacts/latest_eval_best_model.pt` per run, while pre-commit promotes improved runs to `artifacts/best_model.pt`.
- 2026-04-16: Improved attempt H (same weighted config, `seed=1`) reached `mean_rmsd_100=2.461592` (improved from `2.505556`).
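
The checkpoint flow described in the policy entry above could be sketched roughly as follows. This is not the contents of `scripts/precommit_performance_gate.py`; it is a minimal illustration that assumes both JSON reports expose a top-level `mean_rmsd_100` key and that lower values are better (consistent with the log, where `2.461592` counts as an improvement over `2.505556`). The function names and the `required` list passed by the caller are hypothetical.

```python
import json
import shutil
from pathlib import Path


def load_metric(path, key="mean_rmsd_100"):
    """Read one metric from a JSON report (a flat {key: value} layout is assumed)."""
    return float(json.loads(Path(path).read_text())[key])


def missing_required(staged, required, trigger="train.py"):
    """Return the required paths absent from a commit that touches `trigger`."""
    staged = set(staged)
    if trigger not in staged:
        return []  # the rule only applies to train-related commits
    return [p for p in required if p not in staged]


def promote_if_improved(latest_report, best_report, latest_ckpt, best_ckpt):
    """Copy the latest checkpoint over the best one when RMSD strictly decreases."""
    latest = load_metric(latest_report)
    best = load_metric(best_report)
    if latest < best:  # lower mean_rmsd_100 is treated as better (assumption)
        shutil.copyfile(latest_ckpt, best_ckpt)
        return True
    return False
```

A hook built along these lines would reject the commit when `missing_required(...)` returns a non-empty list, and otherwise call `promote_if_improved("reports/latest_eval.json", "BEST_PRACTICE.json", "artifacts/latest_eval_best_model.pt", "artifacts/best_model.pt")`. Updating `BEST_PRACTICE.json` and regenerating `reports/trajectories/` after a promotion are left out of the sketch.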