GUIDELINES
Purpose
Make overfitting robust and measurable, targeting mean_rmsd_100 <= 1.0.
Workflow
1. Branch per line of work: create a branch (e.g. `attempt/<topic>`) before changing `train.py` for a new experiment.
2. Modify code/config, run training, write `reports/latest_eval.json`.
3. Append one line to the `README.md` attempt log for every attempt (success and failure).
4. Never commit `train.py` and `README.md` in the same git commit. After a training run: (a) commit code/eval artifacts (`train.py`, `reports/latest_eval.json`, checkpoints, etc.) without `README.md`; then (b) make a docs-only commit that touches only `README.md` (attempt log line). Pre-commit enforces this split so `README.md` can be cherry-picked to `main` without dragging `train.py` along. Feature-branch commits are not blocked by the mean-RMSD performance gate.
5. When a branch is ready to land: merge (or cherry-pick) into `main`. The performance gate and `BEST_PRACTICE.json` / best-artifact refresh run only on `main` when `train.py` is part of the commit.
6. The `README.md` attempt log must also live on `main`: if you only merged code later or abandoned a `train.py` merge, still bring new `## Attempt Log` lines onto `main` soon after (a docs-only commit is fine; stage only `README.md` so the mean-RMSD gate does not run). Cherry-pick the README hunk from the branch or copy the lines; do not leave the canonical log only on a feature branch.
7. Mandatory best-update integration: if any feature-branch attempt records a strictly better `mean_rmsd_100` than the current `main` anchor, treat it as merge-ready work. Merge/cherry-pick it into `main` promptly (do not keep a known best only on a feature branch), then continue new experiments from a fresh branch off updated `main`.
8. Per-attempt logging+commit is mandatory: every experiment run must immediately append its result to `README.md`, then record it in git before the next run, but as separate commits per rule 4: code commit first, `README.md`-only commit second (same attempt, two commits minimum when both files change).
9. SDF outputs stay out of git: `*.sdf` is ignored; regenerate trajectories locally instead of committing structure files.
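The two-commit split in the workflow above can be checked mechanically. The following is a minimal sketch of such a check, not the repository's actual hook: the function names `staged_files` and `split_violations` are hypothetical, and it assumes `train.py` and `README.md` sit at the repository root.

```python
import subprocess
from typing import Iterable, List

CODE_FILE = "train.py"   # code file named in the workflow rules
LOG_FILE = "README.md"   # file holding the attempt log


def staged_files() -> List[str]:
    """List paths currently staged in the git index (repo-root relative)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def split_violations(paths: Iterable[str]) -> List[str]:
    """Return a list of problems with a staged file set; empty means OK."""
    staged = set(paths)
    problems = []
    if CODE_FILE in staged and LOG_FILE in staged:
        problems.append(
            f"{CODE_FILE} and {LOG_FILE} are staged together; "
            "commit code first, then a docs-only commit"
        )
    return problems
```

A hook script could call `split_violations(staged_files())` and exit non-zero when the list is non-empty. Keeping the check as a pure function over a list of paths makes it testable without a git repository.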
Training budget and stopping
- Do not shrink the epoch budget by default while the learning curve is still improving.
- If wall-clock is tight, use explicit early stopping on training-side signals only (e.g. plateauing training loss), with a large max-epoch cap and patience.
- Do not use the final gated metric (`mean_rmsd_100`) or any equivalent “mini-test” of the same objective during training to decide when to stop. That peeks at the evaluation target and is leakage / cheating in this single-sample overfit setting.
- Do not introduce a held-out RMSD split for stopping; the reported metric is the quality gate, not a training control signal.
- Record in the attempt note how training ended (e.g. `max_epochs`, `early_stop` on train loss only).
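As an illustration of stopping on training-side signals only, here is a minimal sketch of a patience-based stopper. The class name `TrainLossEarlyStopper` and the default values for `patience`, `min_delta`, and `max_epochs` are assumptions for this example; only the training loss is ever passed in, and the `reason` strings mirror the `max_epochs` / `early_stop` labels mentioned above.

```python
from typing import Optional


class TrainLossEarlyStopper:
    """Stop when training loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 200, min_delta: float = 1e-5,
                 max_epochs: int = 100_000) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as progress
        self.max_epochs = max_epochs  # large hard cap on total epochs
        self.best = float("inf")
        self.bad_epochs = 0
        self.epoch = 0
        self.reason: Optional[str] = None

    def update(self, train_loss: float) -> bool:
        """Record one epoch's training loss; return True if training should stop."""
        self.epoch += 1
        if train_loss < self.best - self.min_delta:
            self.best = train_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            self.reason = "early_stop"   # plateau on training loss
        elif self.epoch >= self.max_epochs:
            self.reason = "max_epochs"   # budget exhausted
        return self.reason is not None
```

After the loop ends, `stopper.reason` can be copied into the attempt note so the log records how training ended.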
Rollback and re-integration (not “nuke everything”)
- Anchor: `BEST_PRACTICE.json` plus the last `main` commit that passed the merge-time gate define the production story. Feature branches are scratch space.
- Prefer selective undo: when attempts pile up, re-read the branch history commit-by-commit, decide what actually helped, and drop only what is useless (revert single commits, `git restore` specific paths, or reset a branch tip while keeping good commits reachable).
- Cherry-pick integration: to land work on `main` without merging a messy branch wholesale, create a fresh branch from `main` and cherry-pick only the commits you still believe in; resolve conflicts; run eval; merge to `main` when the gate passes.
- Log honestly: append a short note when you abandon a direction (what was dropped and why), without erasing earlier attempt log lines.
What Counts As An Independent Attempt
- Independent attempts must change a research-level concept, such as:
  - model architecture/backbone/head design;
  - objective/loss formulation;
  - training strategy (curriculum, teacher forcing style, optimization regime);
  - representation or rollout/evaluation coupling logic.
- Pure hyperparameter sweeps (LR, batch size, seed, minor weight nudges) are not treated as standalone attempts.
- Hyperparameter changes are allowed only as supporting details within a larger conceptual change.
Micro-tuning cap per strategy
- For each new strategy (new research-level concept), micro-tuning is capped at 5 runs.
- Micro-tuning includes LR/seed/batch/clip/time-power/weight nudges that do not change the core concept.
- After 5 micro-tuning runs for that strategy, stop tuning it and either:
  - promote the best result from that strategy, or
  - declare the strategy exhausted in `README.md` and move to a new independent strategy.
- Do not reset this counter by branching or renaming; count is per strategy idea.
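One way to keep the cap honest is to count runs under a stable strategy name instead of a branch name. The sketch below is illustrative only: `MicroTuneBudget` and its methods are hypothetical helpers, and the strategy key is whatever label you use for the idea in the attempt log.

```python
from collections import defaultdict

MICRO_TUNE_CAP = 5  # cap stated in the rule above


class MicroTuneBudget:
    """Count micro-tuning runs per strategy idea, independent of branch names."""

    def __init__(self, cap: int = MICRO_TUNE_CAP) -> None:
        self.cap = cap
        self.runs = defaultdict(int)  # strategy name -> runs used so far

    def remaining(self, strategy: str) -> int:
        """Number of micro-tuning runs still allowed for this strategy."""
        return max(0, self.cap - self.runs[strategy])

    def record_run(self, strategy: str) -> None:
        """Register one micro-tuning run; refuse once the cap is reached."""
        if self.runs[strategy] >= self.cap:
            raise RuntimeError(
                f"strategy {strategy!r} is out of micro-tuning runs: "
                "promote its best result or declare it exhausted"
            )
        self.runs[strategy] += 1
```

Because the counter is keyed by the idea, renaming a branch does not reset it as long as the same strategy label is reused.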
Non-negotiable flow-matching rule
- Time conditioning in training must be sampled randomly for every sample (middle-time flow supervision).
- Do not replace training time with fixed constants.
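A minimal sketch of what per-sample random time looks like follows. It assumes a uniform draw of `t` on [0, 1) and a straight-line interpolation path between a source point `x0` and a target point `x1`; the guidelines only mandate that `t` is random per sample, so the path, the distribution, and the function names here are illustrative assumptions rather than the contents of `train.py`.

```python
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def sample_training_pair(x0: Sequence[float], x1: Sequence[float],
                         rng: random.Random) -> Tuple[float, Vec, Vec]:
    """Draw a fresh t for this sample and build (t, x_t, velocity target)."""
    t = rng.random()  # new random time in [0, 1) for every sample
    # Assumed linear path: x_t = (1 - t) * x0 + t * x1
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Velocity of the assumed linear path: d x_t / d t = x1 - x0
    v_target = [b - a for a, b in zip(x0, x1)]
    return t, x_t, v_target


def sample_batch(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
                 seed: int) -> List[Tuple[float, Vec, Vec]]:
    """Each element of the batch gets its own independently drawn t."""
    rng = random.Random(seed)  # explicit seed keeps runs reproducible
    return [sample_training_pair(x0, x1, rng) for x0, x1 in pairs]
```

The point of the sketch is the call to `rng.random()` inside the per-sample function: replacing it with a constant such as `t = 0.5` is exactly what the rule forbids.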
Required report format
`reports/latest_eval.json` must include:
- `mean_rmsd_100` (float, lower is better)
- `num_runs` (int, must be 100)
- `timestamp_utc`
- `command`
- `notes`
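The required fields can be checked before committing a report. This is a sketch under stated assumptions: `report_problems` is a hypothetical helper, and because the guidelines give no types for `timestamp_utc`, `command`, or `notes`, it only checks that those keys are present.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("mean_rmsd_100", "num_runs", "timestamp_utc", "command", "notes")


def report_problems(report: Dict[str, Any]) -> List[str]:
    """Return problems found in a parsed latest_eval.json; empty means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in report]

    if "mean_rmsd_100" in report and not isinstance(report["mean_rmsd_100"], float):
        # Strict reading of "float": a JSON value like 1 parses as int and fails.
        problems.append("mean_rmsd_100 must be a float")

    if "num_runs" in report:
        runs = report["num_runs"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(runs, bool) or not isinstance(runs, int) or runs != 100:
            problems.append("num_runs must be the int 100")

    return problems
```

Typical use would be `report_problems(json.load(open("reports/latest_eval.json")))` with the standard `json` module, failing the run if the returned list is non-empty.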
Repro notes
- Keep seed explicit in commands.
- Keep sample path explicit.
- Prefer additive experiments (do not silently remove prior working options).
Multi-layer diagnosis mindset
- Do not optimize only a scalar metric; analyze behavior from multiple views each attempt.
- Use trajectory inspection as one analysis axis, not a fixed prescription.
- Combine at least two kinds of evidence when judging a strategy:
  - quantitative metrics (RMSD, train/eval gap, stability);
  - qualitative dynamics (trajectory patterns, mode collapse, unrealistic motion);
  - structural diagnostics (e.g., internal-distance change, geometry consistency).
- Treat metric improvement without believable dynamics (or vice versa) as incomplete progress.
- Example signal: if motion appears translation-dominant with weak internal change, investigate rotation/torsion learning capacity and loss balance.
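The translation-dominant example above can be turned into two numbers per trajectory. The sketch below assumes coordinates are given as lists of `(x, y, z)` tuples for the first and last frame, with at least two atoms; the function names are hypothetical and no thresholds are implied.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Unweighted mean position of all atoms."""
    n = len(coords)
    return (
        sum(p[0] for p in coords) / n,
        sum(p[1] for p in coords) / n,
        sum(p[2] for p in coords) / n,
    )


def centroid_shift(start: Sequence[Point], end: Sequence[Point]) -> float:
    """Distance the centroid moved between frames (overall translation)."""
    return math.dist(centroid(start), centroid(end))


def mean_internal_distance_change(start: Sequence[Point],
                                  end: Sequence[Point]) -> float:
    """Mean absolute change over all pairwise atom-atom distances."""
    pairs = list(combinations(range(len(start)), 2))
    total = sum(
        abs(math.dist(end[i], end[j]) - math.dist(start[i], start[j]))
        for i, j in pairs
    )
    return total / len(pairs)
```

A large `centroid_shift` with a near-zero `mean_internal_distance_change` matches the translation-dominant pattern described above. Rigid rotation also leaves internal distances unchanged, so this pair of numbers is one axis of evidence and should be read alongside trajectory inspection.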
Safety
- On `main`, if pre-commit blocks a `train.py` change due to no RMSD improvement, either improve the model and re-evaluate, or keep iterating on a feature branch and merge/cherry-pick only when ready.
- On feature branches, you may commit freely without the mean-RMSD gate; the flow-matching token check still runs whenever `train.py` is staged (same as on `main`).