chore: enforce train vs README split, ignore SDFs, drop tracked trajectories.

Add pre-commit guard against staging train.py with README.md, document the two-commit workflow, gitignore *.sdf, and remove trajectory SDFs from the index so logs stay small.

Made-with: Cursor
demian3b
2026-04-16 23:59:19 +09:00
parent 9e221c62a6
commit 8e4e38e851
5 changed files with 41 additions and 4 deletions

.gitignore

@@ -1,2 +1,5 @@
 __pycache__/
 *.pyc
+# Structure outputs / trajectories (large; keep out of git history)
+*.sdf
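
Ignoring `*.sdf` only affects untracked files; anything already in the index stays tracked until it is removed from it, which is what the "drop tracked trajectories" part of this commit does. A minimal sketch of that clean-up, assuming it runs from the repository root. The helper names `select_sdf` and `untrack_sdf` are hypothetical and not part of this commit:

```python
from __future__ import annotations

import subprocess


def select_sdf(paths: list[str]) -> list[str]:
    """Return the paths ending in .sdf, sorted for stable output."""
    return sorted(p for p in paths if p.endswith(".sdf"))


def untrack_sdf() -> list[str]:
    """Drop tracked .sdf files from the index; working-tree copies are kept."""
    # -z gives NUL-separated paths, which is safe for unusual file names.
    tracked = subprocess.check_output(["git", "ls-files", "-z"], text=True)
    targets = select_sdf([p for p in tracked.split("\0") if p])
    if targets:
        # --cached removes the files from the index only, not from disk.
        subprocess.check_call(["git", "rm", "--cached", "--quiet", "--", *targets])
    return targets
```

Only `select_sdf` is pure; `untrack_sdf` modifies the index, so it is best tried on a scratch clone first.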

.pre-commit-config.yaml

@@ -1,6 +1,12 @@
 repos:
   - repo: local
     hooks:
+      - id: no-train-readme-same-commit
+        name: forbid train.py + README.md in one commit
+        entry: python scripts/precommit_no_train_readme_mix.py
+        language: system
+        pass_filenames: false
+        stages: [pre-commit]
       - id: train-performance-gate
         name: train.py gate (flow all branches; RMSD gate main only)
         entry: python scripts/precommit_performance_gate.py

GUIDELINES.md

@@ -9,11 +9,12 @@ Make overfitting robust and measurable, targeting `mean_rmsd_100 <= 1.0`.
1. **Branch per line of work**: create a branch (e.g. `attempt/<topic>`) before changing `train.py` for a new experiment.
2. Modify code/config, run training, write `reports/latest_eval.json`.
3. Append one line to `README.md` attempt log for every attempt (success and failure).
-4. **Commit each attempt on the branch** (include `train.py`, `reports/latest_eval.json`, and README log when you touched training). Feature-branch commits are not blocked by the mean-RMSD performance gate.
+4. **Never commit `train.py` and `README.md` in the same git commit.** After a training run: (a) commit code/eval artifacts (`train.py`, `reports/latest_eval.json`, checkpoints, etc.) without `README.md`; then (b) make a **docs-only** commit that touches **only** `README.md` (attempt log line). Pre-commit enforces this split so `README.md` can be cherry-picked to `main` without dragging `train.py` along. Feature-branch commits are not blocked by the mean-RMSD performance gate.
5. When a branch is ready to land: **merge (or cherry-pick) into `main`**. The performance gate and `BEST_PRACTICE.json` / best-artifact refresh run only on **`main`** when `train.py` is part of the commit.
6. **`README.md` attempt log must also live on `main`**: if you only merged code later or abandoned a `train.py` merge, still **bring new `## Attempt Log` lines onto `main`** soon after (docs-only commit is fine—stage **only** `README.md` so the mean-RMSD gate does not run). Cherry-pick the README hunk from the branch or copy the lines; do not leave the canonical log only on a feature branch.
7. **Mandatory best-update integration**: if any feature-branch attempt records a strictly better `mean_rmsd_100` than the current `main` anchor, treat it as merge-ready work. Merge/cherry-pick it into `main` promptly (do not keep a known best only on a feature branch), then continue new experiments from a fresh branch off updated `main`.
-8. **Per-attempt logging+commit is mandatory**: every experiment run must immediately (a) append its result to `README.md` and then (b) create a branch commit for that attempt before starting the next run. Do not batch multiple uncommitted runs.
+8. **Per-attempt logging+commit is mandatory**: every experiment run must immediately append its result to `README.md`, then record it in git **before** the next run, but as **separate commits** from rule 4: code commit first, `README.md`-only commit second (same attempt, two commits minimum when both files change).
+9. **SDF outputs stay out of git**: `*.sdf` is ignored; regenerate trajectories locally instead of committing structure files.
## Training budget and stopping
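
The two-commit rule in steps 4 and 8 can be sketched as a small planning helper. This is an illustration only, assuming paths are given relative to the repository root; `plan_commits` is a hypothetical name, and the sketch is slightly stricter than the hook, since it always isolates `README.md` even when `train.py` is not among the changes:

```python
from __future__ import annotations


def plan_commits(changed: list[str]) -> list[list[str]]:
    """Split changed paths into a code commit and a README-only commit."""
    docs = [p for p in changed if p == "README.md"]
    code = [p for p in changed if p != "README.md"]
    # Code and eval artifacts go first; the docs-only commit follows.
    return [group for group in (code, docs) if group]
```

For an attempt that touched `train.py`, `reports/latest_eval.json`, and `README.md`, this yields two groups in commit order, and the second group contains only `README.md`, which is what keeps the log line easy to cherry-pick to `main`.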

README.md

@@ -16,7 +16,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- Attempt log is mandatory for both successful and failed trials.
-- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** on that branch (typically include `train.py`, `reports/latest_eval.json`, and README log for that run). Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
+- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** as **two commits** when both change: (1) `train.py` plus eval artifacts (`reports/latest_eval.json`, checkpoints, …) **without** `README.md`; (2) a **docs-only** commit with **only** `README.md` (attempt log). Pre-commit blocks staging `train.py` and `README.md` together. Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
- **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json` and best artifacts). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
- **`## Attempt Log` on `main`**: new log lines written on a feature branch must be **replicated on `main`** (docs-only `README.md` commit if `train.py` is not landing yet). See `GUIDELINES.md` workflow step 6.
- Flow-matching training time must stay random (middle-time supervision is mandatory).
@@ -36,7 +36,7 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- `reports/latest_eval.json`: most recent measured metric.
- `artifacts/latest_eval_best_model.pt`: checkpoint from latest run that produced `latest_eval`.
- `artifacts/best_model.pt`: best checkpoint from latest improved run.
-- `reports/trajectories/`: 6 regenerated trajectories from current best model.
+- `reports/trajectories/`: trajectory SDFs are **gitignored** (`*.sdf`); regenerate locally after training when needed.
- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and best-artifact refresh only on `main`**.
## Attempt Log
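
How the hooks described in this README combine can be summarised in one decision function. This is a rough sketch based only on the wording above, not on the contents of `scripts/precommit_performance_gate.py`; the function name `checks_for` and the check labels are hypothetical:

```python
from __future__ import annotations


def checks_for(branch: str, staged: set[str]) -> list[str]:
    """List the checks that would act on a commit, per the README wording."""
    checks: list[str] = []
    if "train.py" in staged and "README.md" in staged:
        # The split hook rejects this combination on every branch.
        checks.append("reject train.py + README.md mix")
    if "train.py" in staged:
        # Flow-matching token check applies on any branch.
        checks.append("flow-matching token check")
        if branch == "main":
            # Mean-RMSD gate and best-artifact refresh are main-only.
            checks.append("mean-RMSD gate")
    return checks
```

A docs-only commit that stages only `README.md` returns an empty list here, which matches the statement that such a commit does not run the mean-RMSD gate.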

scripts/precommit_no_train_readme_mix.py

@@ -0,0 +1,27 @@
#!/usr/bin/env python3
"""Reject commits that stage train.py and README.md together (cherry-pick hygiene)."""
from __future__ import annotations

import subprocess
import sys


def main() -> int:
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only"],
        text=True,
    )
    names = {line.strip() for line in out.splitlines() if line.strip()}
    if "train.py" in names and "README.md" in names:
        print(
            "pre-commit: do not commit train.py and README.md in the same commit.\n"
            "Use two commits: (1) train.py plus any eval artifacts you intend to land; "
            "(2) README.md attempt-log line only (docs-only commit for easy cherry-pick to main).",
            file=sys.stderr,
        )
        return 1
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
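
The staged-file check in this script can be exercised without a git repository if the set comparison is pulled out into a pure function. A minimal sketch under that assumption; `FORBIDDEN_PAIR` and `mixes_train_and_readme` are hypothetical names and are not part of the committed script:

```python
from __future__ import annotations

# The two root-level paths that must never be staged together.
FORBIDDEN_PAIR = frozenset({"train.py", "README.md"})


def mixes_train_and_readme(staged: set[str]) -> bool:
    """True when both files of the forbidden pair are staged together."""
    return FORBIDDEN_PAIR <= staged
```

As in the script, paths are compared exactly as `git diff --cached --name-only` prints them (relative to the repository root), so a nested file such as `docs/README.md` does not count as `README.md`.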