
demian3b 8e4e38e851 chore: enforce train vs README split, ignore SDFs, drop tracked trajectories.

Add pre-commit guard against staging train.py with README.md, document the two-commit workflow, gitignore *.sdf, and remove trajectory SDFs from the index so logs stay small.

Made-with: Cursor

2026-04-16 23:59:19 +09:00


GUIDELINES

Purpose

Make overfitting robust and measurable, targeting mean_rmsd_100 <= 1.0.

Workflow

1. Branch per line of work: create a branch (e.g. attempt/<topic>) before changing train.py for a new experiment.
2. Modify code/config, run training, write reports/latest_eval.json.
3. Append one line to the README.md attempt log for every attempt (success or failure).
4. Never commit train.py and README.md in the same git commit. After a training run: (a) commit code/eval artifacts (train.py, reports/latest_eval.json, checkpoints, etc.) without README.md; then (b) make a docs-only commit that touches only README.md (attempt log line). Pre-commit enforces this split so README.md can be cherry-picked to main without dragging train.py along. Feature-branch commits are not blocked by the mean-RMSD performance gate.
5. When a branch is ready to land: merge (or cherry-pick) into main. The performance gate and the BEST_PRACTICE.json / best-artifact refresh run only on main, and only when train.py is part of the commit.
6. The README.md attempt log must also live on main: if you merged the code only later or abandoned a train.py merge, still bring new ## Attempt Log lines onto main soon after. A docs-only commit is fine; stage only README.md so the mean-RMSD gate does not run. Cherry-pick the README hunk from the branch or copy the lines; do not leave the canonical log only on a feature branch.
7. Mandatory best-update integration: if any feature-branch attempt records a strictly better mean_rmsd_100 than the current main anchor, treat it as merge-ready work. Merge/cherry-pick it into main promptly (do not keep a known best only on a feature branch), then continue new experiments from a fresh branch off updated main.
8. Per-attempt logging and commit are mandatory: every experiment run must immediately append its result to README.md and record it in git before the next run, using the separate commits required by rule 4: code commit first, README.md-only commit second (same attempt, at least two commits when both files change).
9. SDF outputs stay out of git: *.sdf is ignored; regenerate trajectories locally instead of committing structure files.
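
The train.py / README.md split can be illustrated with a small check over the staged file list. This is a minimal sketch, not the repository's actual pre-commit hook: the function names `staged_paths`, `split_violations`, and `main` are hypothetical, and it assumes train.py and README.md sit at the repository root.

```python
import subprocess
import sys


def staged_paths():
    """Return the paths staged for the next commit (must run inside a git repo)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def split_violations(paths):
    """Return a list of messages describing violations; an empty list means OK."""
    staged = set(paths)
    problems = []
    # Assumption: both files live at the repository root under these exact names.
    if "train.py" in staged and "README.md" in staged:
        problems.append(
            "train.py and README.md are staged together; "
            "commit code first, then README.md in a docs-only commit"
        )
    return problems


def main():
    """Hypothetical hook entry point: exit non-zero to block the commit."""
    problems = split_violations(staged_paths())
    for message in problems:
        print(message, file=sys.stderr)
    return 1 if problems else 0
```

A hook script would call `sys.exit(main())`; keeping the decision in `split_violations` makes the rule testable without a git repository.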

Training budget and stopping

Do not shrink the epoch budget by default while the learning curve is still improving.
If wall-clock is tight, use explicit early stopping on training-side signals only (e.g. plateauing training loss), with a large max-epoch cap and patience.
Do not use the final gated metric (mean_rmsd_100) or any equivalent “mini-test” of the same objective during training to decide when to stop. That peeks at the evaluation target and is leakage / cheating in this single-sample overfit setting.
Do not introduce a held-out RMSD split for stopping; the reported metric is the quality gate, not a training control signal.
Record in the attempt note how training ended (e.g. max_epochs, early_stop on train loss only).
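
The stopping rules above can be sketched as a patience counter driven only by training loss. This is an illustrative sketch under stated assumptions: `TrainLossEarlyStopper`, `train_with_budget`, and the default `patience` and `min_delta` values are hypothetical, and `loss_for_epoch` stands in for one real training epoch. Note that the evaluation metric never enters the loop.

```python
from dataclasses import dataclass


@dataclass
class TrainLossEarlyStopper:
    """Stops when training loss has not improved for `patience` epochs."""

    patience: int = 200        # assumed default; tune to the wall-clock budget
    min_delta: float = 1e-5    # smallest decrease that counts as improvement
    best: float = float("inf")
    stale_epochs: int = 0

    def update(self, train_loss):
        """Record one epoch's training loss; return True if training should stop."""
        if train_loss < self.best - self.min_delta:
            self.best = train_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def train_with_budget(loss_for_epoch, max_epochs, stopper):
    """Run up to max_epochs and return how training ended, for the attempt note.

    loss_for_epoch is a hypothetical callable (epoch index -> training loss).
    """
    for epoch in range(max_epochs):
        train_loss = loss_for_epoch(epoch)
        if stopper.update(train_loss):
            return f"early_stop on train loss only (epoch {epoch + 1})"
    return "max_epochs"
```

The returned string is the kind of ending description the attempt note asks for.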

Rollback and re-integration (not “nuke everything”)

Anchor: BEST_PRACTICE.json plus the last main commit that passed the merge-time gate define the production story. Feature branches are scratch space.
Prefer selective undo: when attempts pile up, re-read the branch history commit-by-commit, decide what actually helped, and drop only what is useless (revert single commits, git restore specific paths, or reset a branch tip while keeping good commits reachable).
Cherry-pick integration: to land work on main without merging a messy branch wholesale, create a fresh branch from main and cherry-pick only the commits you still believe in; resolve conflicts; run eval; merge to main when the gate passes.
Log honestly: append a short note when you abandon a direction (what was dropped and why), without erasing earlier attempt log lines.

What Counts As An Independent Attempt

Independent attempts must change a research-level concept, such as:
- model architecture/backbone/head design;
- objective/loss formulation;
- training strategy (curriculum, teacher forcing style, optimization regime);
- representation or rollout/evaluation coupling logic.
Pure hyperparameter sweeps (LR, batch size, seed, minor weight nudges) are not treated as standalone attempts.
Hyperparameter changes are allowed only as supporting details within a larger conceptual change.

Micro-tuning cap per strategy

For each new strategy (new research-level concept), micro-tuning is capped at 5 runs.
Micro-tuning includes LR/seed/batch/clip/time-power/weight nudges that do not change the core concept.
After 5 micro-tuning runs for that strategy, stop tuning it and either:
- promote the best result from that strategy, or
- declare the strategy exhausted in README.md and move to a new independent strategy.
Do not reset this counter by branching or renaming; count is per strategy idea.
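
One way to keep the cap honest is to count runs per strategy idea rather than per branch. The following is a minimal sketch; `MicroTuningBudget` and the strategy labels are hypothetical, and in practice the count would need to be persisted (for example alongside the attempt log) rather than held in memory.

```python
from collections import Counter

MICRO_TUNING_CAP = 5  # cap stated in the guidelines


class MicroTuningBudget:
    """Tracks micro-tuning runs keyed by strategy idea, not by branch name."""

    def __init__(self):
        self._runs = Counter()

    def record_run(self, strategy):
        """Register one micro-tuning run and return how many runs remain."""
        if self._runs[strategy] >= MICRO_TUNING_CAP:
            raise RuntimeError(
                f"strategy '{strategy}' has used all {MICRO_TUNING_CAP} "
                "micro-tuning runs; promote its best result or declare it exhausted"
            )
        self._runs[strategy] += 1
        return MICRO_TUNING_CAP - self._runs[strategy]
```

The key should be a stable label for the idea itself, so that renaming or re-branching does not reset the count.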

Non-negotiable flow-matching rule

Time conditioning in training must be sampled randomly for every sample (middle-time flow supervision).
Do not replace training time with fixed constants.
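
As an illustration of per-sample random time, the sketch below draws a fresh t from U[0, 1) for every training sample. The linear path x_t = (1 - t) x0 + t x1 and the velocity target x1 - x0 are assumptions (a common flow-matching choice), not something the guidelines specify, and all function names are hypothetical.

```python
import random


def sample_times(batch_size, rng):
    """Draw an independent time in [0, 1) for every sample in the batch."""
    return [rng.random() for _ in range(batch_size)]


def interpolate(x0, x1, t):
    """Point on the assumed linear path between x0 and x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def make_training_batch(x0_batch, x1_batch, rng):
    """Build (times, x_t, velocity targets) with a fresh random time per sample."""
    times = sample_times(len(x0_batch), rng)
    x_t = [interpolate(x0, x1, t) for x0, x1, t in zip(x0_batch, x1_batch, times)]
    # Assumed target for a linear path: constant velocity x1 - x0.
    targets = [[b - a for a, b in zip(x0, x1)] for x0, x1 in zip(x0_batch, x1_batch)]
    return times, x_t, targets
```

What the rule forbids is replacing `sample_times` with a constant such as `[0.5] * batch_size`.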

Required report format

reports/latest_eval.json must include:

- mean_rmsd_100 (float, lower is better)
- num_runs (int, must be 100)
- timestamp_utc
- command
- notes
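
A report with these fields could be built and checked as follows. This is a sketch under assumptions: the helper names are hypothetical, mean_rmsd_100 is taken to be the mean RMSD over the 100 runs, and the ISO 8601 timestamp format is a choice made here, since the guidelines do not fix one.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_KEYS = ("mean_rmsd_100", "num_runs", "timestamp_utc", "command", "notes")


def build_report(rmsds, command, notes):
    """Assemble the report dict from per-run RMSD values (exactly 100 expected)."""
    if len(rmsds) != 100:
        raise ValueError(f"expected 100 runs, got {len(rmsds)}")
    return {
        "mean_rmsd_100": float(sum(rmsds) / len(rmsds)),
        "num_runs": len(rmsds),
        # Assumption: ISO 8601 in UTC; the source does not specify a format.
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "notes": notes,
    }


def validate_report(report):
    """Return a list of problems; an empty list means the report is well-formed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in report]
    if "mean_rmsd_100" in report and not isinstance(report["mean_rmsd_100"], float):
        problems.append("mean_rmsd_100 must be a float")
    if "num_runs" in report and report["num_runs"] != 100:
        problems.append("num_runs must be 100")
    return problems


def write_report(report, path="reports/latest_eval.json"):
    """Write the report as JSON, creating the reports/ directory if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(report, indent=2) + "\n", encoding="utf-8")
```

Storing the exact command, including the seed and sample path, in `command` keeps the run reproducible from the report alone.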

Repro notes

Keep seed explicit in commands.
Keep sample path explicit.
Prefer additive experiments (do not silently remove prior working options).

Multi-layer diagnosis mindset

Do not optimize only a scalar metric; analyze behavior from multiple views each attempt.
Use trajectory inspection as one analysis axis, not a fixed prescription.
Combine at least two kinds of evidence when judging a strategy:
- quantitative metrics (RMSD, train/eval gap, stability);
- qualitative dynamics (trajectory patterns, mode collapse, unrealistic motion);
- structural diagnostics (e.g., internal-distance change, geometry consistency).
Treat metric improvement without believable dynamics (or vice versa) as incomplete progress.
Example signal: if motion appears translation-dominant with weak internal change, investigate rotation/torsion learning capacity and loss balance.
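
The translation-dominant example can be turned into a quick structural diagnostic that compares centroid displacement with the change in internal pairwise distances. This is one possible sketch, not a prescribed metric; `motion_summary` and its output keys are hypothetical, and coordinates are assumed to be (x, y, z) tuples for the same atoms in the same order.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(point[i] for point in coords) / n for i in range(3))


def pairwise_distances(coords):
    """All unique inter-point distances, in a fixed order."""
    return [
        math.dist(a, b)
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    ]


def motion_summary(start, end):
    """Compare rigid translation with internal geometry change between two frames."""
    shift = math.dist(centroid(start), centroid(end))
    d_start = pairwise_distances(start)
    d_end = pairwise_distances(end)
    # RMS change of internal distances; near zero means the shape did not change.
    internal = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(d_start, d_end)) / len(d_start)
    )
    return {"centroid_shift": shift, "internal_distance_rms_change": internal}
```

A large `centroid_shift` paired with a near-zero `internal_distance_rms_change` is the translation-dominant pattern described above. Internal distances are also invariant to rigid rotation, so this check alone does not separate rotation from translation.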

Safety

On main, if pre-commit blocks a train.py change due to no RMSD improvement, either improve the model and re-evaluate, or keep iterating on a feature branch and merge/cherry-pick only when ready.
On feature branches, you may commit freely without the mean-RMSD gate; the flow-matching token check still runs whenever train.py is staged (same as on main).
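
The two safety rules amount to a small routing decision over which checks run. The sketch below captures only that routing: the check names are hypothetical labels, the contents of the flow-matching token check are not modelled, and treating the gate as a strict improvement over the current best is an assumption based on the "strictly better" wording in the workflow.

```python
def checks_to_run(branch, staged):
    """Return the set of pre-commit checks that apply to this commit."""
    checks = set()
    if "train.py" in set(staged):
        # Runs on every branch whenever train.py is staged.
        checks.add("flow_matching_token_check")
        if branch == "main":
            # The performance gate applies only on main.
            checks.add("mean_rmsd_gate")
    return checks


def passes_rmsd_gate(candidate_mean_rmsd, best_mean_rmsd):
    """Assumed gate rule: the candidate must be strictly better (lower)."""
    return candidate_mean_rmsd < best_mean_rmsd
```

Under this sketch, a README.md-only commit on main triggers neither check, which matches the docs-only commit path described in the workflow.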
