Gate mean-RMSD checks on main only; document branch workflow.

Run flow-matching constraints whenever train.py is staged; apply strict mean_rmsd_100 and best-artifact updates only on the main branch. Update GUIDELINES and README for branch-per-attempt commits and cherry-pick re-integration.

Made-with: Cursor
demian3b
2026-04-16 18:10:47 +09:00
parent ba1c1a3892
commit a029801e00
4 changed files with 43 additions and 23 deletions

@@ -16,14 +16,11 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
 - Every attempt must update this README (append a short entry in `## Attempt Log`).
 - Attempt log is mandatory for both successful and failed trials.
+- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** on that branch (typically include `train.py`, `reports/latest_eval.json`, and README log for that run). Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
+- **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json` and best artifacts). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
 - Flow-matching training time must stay random (middle-time supervision is mandatory).
 - Independent attempts must be research-level changes (architecture/training strategy/loss design). Pure hyperparameter-only runs are not counted as standalone attempts.
-- When failures accumulate without beating `BEST_PRACTICE.json`, follow `GUIDELINES.md` rollback rules (partial or full reset to anchor); log rollback in the attempt log; do not use `mean_rmsd_100` (or equivalent) as a training-time early-stopping signal.
-- Commits touching `train.py` must include:
-- `reports/latest_eval.json`
-- strictly better `mean_rmsd_100` compared to previous best (enforced by pre-commit).
-- `BEST_PRACTICE.json` is auto-updated and staged by pre-commit.
-- best checkpoint and trajectory artifacts are auto-regenerated by pre-commit.
+- When failures accumulate, **re-evaluate branch commits** and integrate with **cherry-pick** (or selective revert / path restore), not wholesale rollback unless explicitly justified. Do not use `mean_rmsd_100` (or equivalent) as a training-time early-stopping signal.
 ## Evaluation target
@@ -39,7 +36,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
 - `artifacts/latest_eval_best_model.pt`: checkpoint from latest run that produced `latest_eval`.
 - `artifacts/best_model.pt`: best checkpoint from latest improved run.
 - `reports/trajectories/`: 6 regenerated trajectories from current best model.
-- `scripts/precommit_performance_gate.py`: pre-commit guard for train-related commits.
+- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and best-artifact refresh only on `main`**.
 ## Attempt Log
@@ -69,3 +66,4 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
 - 2026-04-16: Failed attempt M (research-level: decoupled centered-coordinate architecture only, no terminal auxiliary term) reached `mean_rmsd_100=2.479326`; close to best but no commit.
 - 2026-04-16: Failed attempt N (training-strategy: added configurable early stopping with large max-epoch budget, patience/min-delta/check cadence controls) ran to max epoch with ongoing improvements (`stop_reason=max_epochs`) and reached `mean_rmsd_100=2.764940`; no commit.
 - 2026-04-16: Rollback (per `GUIDELINES.md`): restored `train.py`, `reports/latest_eval.json`, and `artifacts/latest_eval_best_model.pt` to last committed baseline after attempts K-N; `mean_rmsd_100` anchor unchanged at `2.461592` (`BEST_PRACTICE.json`). Objective-aligned early stopping remains disallowed for training control.
+- 2026-04-16: Policy update: experiments run on **feature branches** with **one commit per attempt**; mean-RMSD pre-commit gate applies only on **`main`** (merge/cherry-pick integration). Re-triage failed stacks via **cherry-pick** / selective drops, not default full-tree rollback.
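
The branch-dependent gating described in this commit can be illustrated with a small sketch. This is not the code in `scripts/precommit_performance_gate.py`, which is not shown here; the function `decide_gate`, the `GateDecision` type, and all parameter names are hypothetical, and the sketch assumes the caller has already determined the current branch, the staged paths, and the two `mean_rmsd_100` values. The README-log check and the artifact refresh are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateDecision:
    """Outcome of the hypothetical pre-commit decision."""
    flow_check_ran: bool   # flow-matching constraint check was applied
    rmsd_gate_ran: bool    # strict mean_rmsd_100 comparison was applied
    passed: bool           # commit may proceed
    reason: str


def decide_gate(
    branch: str,
    staged: Sequence[str],
    new_rmsd: Optional[float],
    best_rmsd: Optional[float],
    flow_check_ok: bool = True,
) -> GateDecision:
    """Decide whether a commit passes, following the policy in the README diff.

    Assumed inputs (all hypothetical names): `branch` is the current branch,
    `staged` the staged paths, `new_rmsd` the value from the staged
    reports/latest_eval.json, `best_rmsd` the anchor from BEST_PRACTICE.json,
    and `flow_check_ok` the result of the flow-matching constraint check.
    """
    if "train.py" not in staged:
        return GateDecision(False, False, True, "train.py not staged; nothing to check")

    # The flow-matching check runs on every branch when train.py is staged.
    if not flow_check_ok:
        return GateDecision(True, False, False, "flow-matching constraint check failed")

    # Feature branches skip the mean-RMSD improvement rule.
    if branch != "main":
        return GateDecision(True, False, True, "feature branch: mean-RMSD gate skipped")

    if "reports/latest_eval.json" not in staged:
        return GateDecision(True, True, False, "reports/latest_eval.json must be staged on main")
    if new_rmsd is None or best_rmsd is None:
        return GateDecision(True, True, False, "missing mean_rmsd_100 value")

    # Strict improvement: equal scores do not pass.
    if not new_rmsd < best_rmsd:
        return GateDecision(True, True, False, "mean_rmsd_100 is not strictly better than best")
    return GateDecision(True, True, True, "mean_rmsd_100 improved on main")


if __name__ == "__main__":
    staged_paths = ["train.py", "reports/latest_eval.json"]
    # Attempt M's score against the logged anchor, on a branch and on main.
    print(decide_gate("attempt-m", staged_paths, 2.479326, 2.461592))
    print(decide_gate("main", staged_paths, 2.479326, 2.461592))
```

Under these assumptions, attempt M's score (`2.479326`) could be committed on a feature branch, but the same commit would be rejected on `main` because it does not beat the `2.461592` anchor.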