chore: enforce train vs README split, ignore SDFs, drop tracked trajectories.

Add pre-commit guard against staging train.py with README.md, document the two-commit workflow, gitignore *.sdf, and remove trajectory SDFs from the index so logs stay small.

Made-with: Cursor
3
.gitignore
vendored
@@ -1,2 +1,5 @@
 __pycache__/
 *.pyc
+
+# Structure outputs / trajectories (large; keep out of git history)
+*.sdf
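An ignore rule only affects untracked files: anything already in the index stays tracked until it is removed, which is why this commit also drops the trajectory SDFs. A minimal sketch of finding such leftovers, assuming the text output of `git ls-files` is passed in; the helper name `tracked_sdf_files` and the sample paths are hypothetical and not part of this commit:

```python
from __future__ import annotations


def tracked_sdf_files(ls_files_output: str) -> list[str]:
    """Return the tracked paths ending in .sdf from `git ls-files` output."""
    paths = [line.strip() for line in ls_files_output.splitlines() if line.strip()]
    # `*.sdf` has no slash, so it applies at any directory depth;
    # a plain suffix test on the full path is enough for this sketch.
    return sorted(p for p in paths if p.endswith(".sdf"))


if __name__ == "__main__":
    # Hypothetical listing; in a real repo, feed in the output of `git ls-files`.
    sample = "train.py\nreports/trajectories/traj_0.sdf\nREADME.md\n"
    for path in tracked_sdf_files(sample):
        # Each path printed here would still need `git rm --cached <path>`.
        print(path)
```

Any path this reports is still tracked despite the new ignore rule and has to be removed from the index separately.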
@@ -1,6 +1,12 @@
 repos:
   - repo: local
     hooks:
+      - id: no-train-readme-same-commit
+        name: forbid train.py + README.md in one commit
+        entry: python scripts/precommit_no_train_readme_mix.py
+        language: system
+        pass_filenames: false
+        stages: [pre-commit]
       - id: train-performance-gate
         name: train.py gate (flow all branches; RMSD gate main only)
         entry: python scripts/precommit_performance_gate.py
@@ -9,11 +9,12 @@ Make overfitting robust and measurable, targeting `mean_rmsd_100 <= 1.0`.
 1. **Branch per line of work**: create a branch (e.g. `attempt/<topic>`) before changing `train.py` for a new experiment.
 2. Modify code/config, run training, write `reports/latest_eval.json`.
 3. Append one line to `README.md` attempt log for every attempt (success and failure).
-4. **Commit each attempt on the branch** (include `train.py`, `reports/latest_eval.json`, and README log when you touched training). Feature-branch commits are not blocked by the mean-RMSD performance gate.
+4. **Never commit `train.py` and `README.md` in the same git commit.** After a training run: (a) commit code/eval artifacts (`train.py`, `reports/latest_eval.json`, checkpoints, etc.) without `README.md`; then (b) make a **docs-only** commit that touches **only** `README.md` (attempt log line). Pre-commit enforces this split so `README.md` can be cherry-picked to `main` without dragging `train.py` along. Feature-branch commits are not blocked by the mean-RMSD performance gate.
 5. When a branch is ready to land: **merge (or cherry-pick) into `main`**. The performance gate and `BEST_PRACTICE.json` / best-artifact refresh run only on **`main`** when `train.py` is part of the commit.
 6. **`README.md` attempt log must also live on `main`**: if you only merged code later or abandoned a `train.py` merge, still **bring new `## Attempt Log` lines onto `main`** soon after (docs-only commit is fine—stage **only** `README.md` so the mean-RMSD gate does not run). Cherry-pick the README hunk from the branch or copy the lines; do not leave the canonical log only on a feature branch.
 7. **Mandatory best-update integration**: if any feature-branch attempt records a strictly better `mean_rmsd_100` than the current `main` anchor, treat it as merge-ready work. Merge/cherry-pick it into `main` promptly (do not keep a known best only on a feature branch), then continue new experiments from a fresh branch off updated `main`.
-8. **Per-attempt logging+commit is mandatory**: every experiment run must immediately (a) append its result to `README.md` and then (b) create a branch commit for that attempt before starting the next run. Do not batch multiple uncommitted runs.
+8. **Per-attempt logging+commit is mandatory**: every experiment run must immediately append its result to `README.md`, then record it in git **before** the next run—but as **separate commits** from rule 4: code commit first, `README.md`-only commit second (same attempt, two commits minimum when both files change).
+9. **SDF outputs stay out of git**: `*.sdf` is ignored; regenerate trajectories locally instead of committing structure files.
 
 ## Training budget and stopping
 
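The two-commit rule in steps 4 and 8 amounts to partitioning the changed files of one attempt into a code/eval group and a `README.md`-only group, committed in that order. A minimal sketch under that reading; the helper name `plan_commits` is hypothetical and does not exist in the repository:

```python
from __future__ import annotations


def plan_commits(changed: list[str]) -> list[list[str]]:
    """Split one attempt's changed paths into ordered commit groups.

    The first group holds everything except README.md (train.py, eval
    artifacts, ...); the second holds README.md alone. Empty groups are
    dropped, so a docs-only change yields a single commit.
    """
    code = [p for p in changed if p != "README.md"]
    docs = [p for p in changed if p == "README.md"]
    return [group for group in (code, docs) if group]


if __name__ == "__main__":
    attempt = ["train.py", "reports/latest_eval.json", "README.md"]
    for number, group in enumerate(plan_commits(attempt), start=1):
        print(f"commit {number}: {group}")
```

Staging each group separately keeps the README-only commit free of `train.py`, which is what makes it safe to cherry-pick onto `main`.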
@@ -16,7 +16,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
 
 - Every attempt must update this README (append a short entry in `## Attempt Log`).
 - Attempt log is mandatory for both successful and failed trials.
-- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** on that branch (typically include `train.py`, `reports/latest_eval.json`, and README log for that run). Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
+- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** as **two commits** when both change: (1) `train.py` plus eval artifacts (`reports/latest_eval.json`, checkpoints, …) **without** `README.md`; (2) a **docs-only** commit with **only** `README.md` (attempt log). Pre-commit blocks staging `train.py` and `README.md` together. Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
 - **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json` and best artifacts). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
 - **`## Attempt Log` on `main`**: new log lines written on a feature branch must be **replicated on `main`** (docs-only `README.md` commit if `train.py` is not landing yet). See `GUIDELINES.md` workflow step 6.
 - Flow-matching training time must stay random (middle-time supervision is mandatory).
@@ -36,7 +36,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
 - `reports/latest_eval.json`: most recent measured metric.
 - `artifacts/latest_eval_best_model.pt`: checkpoint from latest run that produced `latest_eval`.
 - `artifacts/best_model.pt`: best checkpoint from latest improved run.
-- `reports/trajectories/`: 6 regenerated trajectories from current best model.
+- `reports/trajectories/`: trajectory SDFs are **gitignored** (`*.sdf`); regenerate locally after training when needed.
 - `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and best-artifact refresh only on `main`**.
 
 ## Attempt Log
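The README describes the performance gate as branch-dependent: a flow-matching check whenever `train.py` is staged, and the mean-RMSD gate plus best-artifact refresh only on `main`. The body of `scripts/precommit_performance_gate.py` is not part of this diff, so the following is only a sketch of that decision as described; the function name and the check labels are hypothetical:

```python
from __future__ import annotations


def checks_to_run(branch: str, staged: set[str]) -> list[str]:
    """Return the (hypothetically named) checks implied by the README text."""
    if "train.py" not in staged:
        # A docs-only commit, e.g. README.md alone, triggers nothing.
        return []
    checks = ["flow_matching_token_check"]  # runs on any branch
    if branch == "main":
        # The RMSD gate and artifact refresh are restricted to main.
        checks += ["mean_rmsd_gate", "best_artifact_refresh"]
    return checks


if __name__ == "__main__":
    print(checks_to_run("attempt/example", {"train.py"}))
    print(checks_to_run("main", {"train.py", "reports/latest_eval.json"}))
    print(checks_to_run("main", {"README.md"}))
```

Under this reading, a `README.md`-only commit on `main` runs no gate at all, which matches the docs-only workflow the guidelines rely on.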
27
scripts/precommit_no_train_readme_mix.py
Normal file
@@ -0,0 +1,27 @@
+#!/usr/bin/env python3
+"""Reject commits that stage train.py and README.md together (cherry-pick hygiene)."""
+from __future__ import annotations
+
+import subprocess
+import sys
+
+
+def main() -> int:
+    out = subprocess.check_output(
+        ["git", "diff", "--cached", "--name-only"],
+        text=True,
+    )
+    names = {line.strip() for line in out.splitlines() if line.strip()}
+    if "train.py" in names and "README.md" in names:
+        print(
+            "pre-commit: do not commit train.py and README.md in the same commit.\n"
+            "Use two commits: (1) train.py plus any eval artifacts you intend to land; "
+            "(2) README.md attempt-log line only (docs-only commit for easy cherry-pick to main).",
+            file=sys.stderr,
+        )
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
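The hook's decision depends only on the list of staged names, so it can be exercised without a git repository by pulling the comparison into a pure function. A minimal sketch of that refactor; the function name `mixes_train_and_readme` is hypothetical and is not part of the committed script:

```python
from __future__ import annotations


def mixes_train_and_readme(staged_names_output: str) -> bool:
    """Return True if both train.py and README.md appear in the staged list.

    `staged_names_output` is the text printed by
    `git diff --cached --name-only`, one path per line.
    """
    names = {line.strip() for line in staged_names_output.splitlines() if line.strip()}
    # Like the hook, this compares exact repo-root paths, so a nested
    # file such as docs/README.md does not count.
    return "train.py" in names and "README.md" in names


if __name__ == "__main__":
    print(mixes_train_and_readme("train.py\nREADME.md\n"))
    print(mixes_train_and_readme("train.py\nreports/latest_eval.json\n"))
```

Because the match is on exact names, only the repository-root `train.py` and `README.md` trigger the guard; whether nested files should also count is a design choice the commit does not address.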