ai-rfm/GUIDELINES.md
demian3b b8c440d654 chore: stop tracking *.pt/*.sdf; purge from history; align hooks and docs.
git-filter-repo removed blobs; origin must be re-added. Pre-commit refreshes
BEST_PRACTICE.json and trajectory manifest only (checkpoints stay local).

Made-with: Cursor
2026-04-17 14:01:06 +09:00


GUIDELINES

Purpose

Make overfitting robust and measurable, targeting mean_rmsd_100 <= 1.0.

Workflow

  1. Branch per line of work: create a branch (e.g. attempt/<topic>) before changing train.py for a new experiment.
  2. Modify code/config, run training, write reports/latest_eval.json.
  3. Append one line to the README.md attempt log for every attempt (success or failure).
  4. Never commit train.py and README.md in the same git commit. After a training run: (a) commit code/eval artifacts (train.py, reports/latest_eval.json, etc.) without README.md (PyTorch *.pt checkpoints are local-only, gitignored); then (b) make a docs-only commit that touches only README.md (attempt log line). Pre-commit enforces this split so README.md can be cherry-picked to main without dragging train.py along. Feature-branch commits are not blocked by the mean-RMSD performance gate.
  5. When a branch is ready to land: merge (or cherry-pick) into main. The performance gate and BEST_PRACTICE.json refresh run only on main when train.py is part of the commit (checkpoints are not committed).
  6. The README.md attempt log must also live on main: if you merged code only later, or abandoned a train.py merge, still bring the new ## Attempt Log lines onto main soon after. A docs-only commit is fine: stage only README.md so the mean-RMSD gate does not run. Cherry-pick the README hunk from the branch or copy the lines; do not leave the canonical log only on a feature branch.
  7. Mandatory best-update integration: if any feature-branch attempt records a strictly better mean_rmsd_100 than the current main anchor, treat it as merge-ready work. Merge/cherry-pick it into main promptly (do not keep a known best only on a feature branch), then continue new experiments from a fresh branch off updated main.
  8. Per-attempt logging+commit is mandatory: every experiment run must immediately append its result to README.md and be recorded in git before the next run. Follow the split from rule 4: code commit first, README.md-only commit second (same attempt, two commits minimum when both files change).
  9. SDF outputs stay out of git: *.sdf is ignored; regenerate trajectories locally instead of committing structure files.
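
The commit split in rule 4 can be sketched as a check over the list of staged paths. This is a minimal illustration, not the repository's actual hook: the function name `check_commit_split` is hypothetical, and the real pre-commit may collect the staged paths differently (for example from `git diff --cached --name-only`).

```python
def check_commit_split(staged_paths):
    """Return an error message if the staged set violates rule 4, else None.

    Hypothetical helper: train.py and README.md must never share a commit.
    The actual hook's logic may differ.
    """
    staged = set(staged_paths)
    if "train.py" in staged and "README.md" in staged:
        return "train.py and README.md must be committed separately"
    return None


# (a) code/eval commit: allowed
print(check_commit_split(["train.py", "reports/latest_eval.json"]))
# (b) docs-only commit: allowed
print(check_commit_split(["README.md"]))
# mixed commit: rejected
print(check_commit_split(["train.py", "README.md"]))
```

Keeping the check on the staged set alone (rather than on file contents) is what lets the README.md commit be cherry-picked to main independently.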

Training budget and stopping

  • Do not shrink the epoch budget by default while the learning curve is still improving.
  • If wall-clock is tight, use explicit early stopping on training-side signals only (e.g. plateauing training loss), with a large max-epoch cap and patience.
  • Do not use the final gated metric (mean_rmsd_100) or any equivalent “mini-test” of the same objective during training to decide when to stop. That peeks at the evaluation target and is leakage / cheating in this single-sample overfit setting.
  • Do not introduce a held-out RMSD split for stopping; the reported metric is the quality gate, not a training control signal.
  • Record in the attempt note how training ended (e.g. max_epochs, early_stop on train loss only).
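
Early stopping on training-side signals only can be sketched as follows. The class and function names, the default `patience`, and `min_delta` are assumptions made for illustration; the point is that the stopper only ever sees training loss and never mean_rmsd_100.

```python
class TrainLossEarlyStopper:
    """Stop when training loss has not improved by min_delta for `patience` epochs.

    Hypothetical helper: it receives training loss only, never the gated metric.
    """

    def __init__(self, patience=50, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_epochs = 0

    def should_stop(self, train_loss):
        if train_loss < self.best - self.min_delta:
            self.best = train_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def run(train_one_epoch, max_epochs, stopper):
    """Train up to max_epochs; return (epochs_run, how_training_ended)."""
    for epoch in range(1, max_epochs + 1):
        loss = train_one_epoch(epoch)  # callable returning this epoch's training loss
        if stopper.should_stop(loss):
            return epoch, "early_stop on train loss only"
    return max_epochs, "max_epochs"


# Toy loss curve that plateaus at 0.1; a large cap plus patience decides the stop.
fake_epoch = lambda epoch: max(1.0 / epoch, 0.1)
print(run(fake_epoch, max_epochs=1000, stopper=TrainLossEarlyStopper(patience=3)))
```

The second element of the returned tuple is the kind of string that belongs in the attempt note.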

Rollback and re-integration (not “nuke everything”)

  • Anchor: BEST_PRACTICE.json plus the last main commit that passed the merge-time gate define the production story. Feature branches are scratch space.
  • Prefer selective undo: when attempts pile up, re-read the branch history commit-by-commit, decide what actually helped, and drop only what is useless (revert single commits, git restore specific paths, or reset a branch tip while keeping good commits reachable).
  • Cherry-pick integration: to land work on main without merging a messy branch wholesale, create a fresh branch from main and cherry-pick only the commits you still believe in; resolve conflicts; run eval; merge to main when the gate passes.
  • Log honestly: append a short note when you abandon a direction (what was dropped and why), without erasing earlier attempt log lines.

What Counts As An Independent Attempt

  • Independent attempts must change a research-level concept, such as:
    • model architecture/backbone/head design;
    • objective/loss formulation;
    • training strategy (curriculum, teacher forcing style, optimization regime);
    • representation or rollout/evaluation coupling logic.
  • Pure hyperparameter sweeps (LR, batch size, seed, minor weight nudges) are not treated as standalone attempts.
  • Hyperparameter changes are allowed only as supporting details within a larger conceptual change.

Micro-tuning cap per strategy

  • For each new strategy (new research-level concept), micro-tuning is capped at 5 runs.
  • Micro-tuning includes LR/seed/batch/clip/time-power/weight nudges that do not change the core concept.
  • After 5 micro-tuning runs for that strategy, stop tuning it and either:
    • promote the best result from that strategy, or
    • declare the strategy exhausted in README.md and move to a new independent strategy.
  • Do not reset this counter by branching or renaming; count is per strategy idea.
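
One way to keep the cap honest is to count runs per strategy idea rather than per branch. The sketch below is illustrative only; `MicroTuningBudget` and the strategy identifiers are hypothetical and not part of the repository.

```python
MAX_MICRO_TUNING_RUNS = 5  # cap stated in this section


class MicroTuningBudget:
    """Count micro-tuning runs per strategy idea (not per branch name)."""

    def __init__(self):
        self._runs = {}  # strategy id -> number of micro-tuning runs used

    def record_run(self, strategy_id):
        """Register one micro-tuning run; return how many runs remain."""
        used = self._runs.get(strategy_id, 0)
        if used >= MAX_MICRO_TUNING_RUNS:
            raise RuntimeError(
                f"{strategy_id}: micro-tuning cap reached; promote the best "
                "result or declare the strategy exhausted in README.md"
            )
        self._runs[strategy_id] = used + 1
        return MAX_MICRO_TUNING_RUNS - (used + 1)


budget = MicroTuningBudget()
for _ in range(5):
    print(budget.record_run("strategy-a"))  # "strategy-a" is a placeholder id
```

Because the key is the strategy idea, renaming or re-branching does not reset the count.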

Non-negotiable flow-matching rule

  • Time conditioning in training must be drawn at random for every sample (middle-time flow supervision).
  • Do not replace training time with fixed constants.
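
A minimal sketch of per-sample random time follows, written in plain Python so it runs without PyTorch. The uniform distribution on [0, 1) and the straight-line interpolation between a source point `x0` and a target point `x1` are assumptions; the rule itself only requires that the time be random per sample and never a fixed constant. In a PyTorch training loop the same idea would typically be a call such as `torch.rand(batch_size)`.

```python
import random


def sample_training_times(batch_size, rng):
    """Draw an independent t in [0, 1) for every sample in the batch."""
    return [rng.random() for _ in range(batch_size)]


def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (source) to x1 (target).

    The linear path is an assumed choice, used here only for illustration.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


rng = random.Random(0)  # explicit seed, in line with the repro notes
times = sample_training_times(4, rng)
x0 = [0.0, 0.0, 0.0]
x1 = [1.0, 2.0, 3.0]
batch_xt = [interpolate(x0, x1, t) for t in times]
for t, xt in zip(times, batch_xt):
    print(round(t, 3), [round(v, 3) for v in xt])
```

The anti-pattern the rule forbids would be passing the same constant (say `t = 0.5`) for every sample.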

Required report format

reports/latest_eval.json must include:

  • mean_rmsd_100 (float, lower is better)
  • num_runs (int, must be 100)
  • timestamp_utc
  • command
  • notes
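
The required fields can be checked with a small validator before committing the report. This is a sketch under assumptions: `validate_report` is a hypothetical helper, the ISO 8601 timestamp format is a guess (the guideline does not fix one), and the numeric value in the example is a placeholder, not a real result.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("mean_rmsd_100", "num_runs", "timestamp_utc", "command", "notes")


def validate_report(report):
    """Return a list of problems; an empty list means the report is well-formed."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in report]
    if "mean_rmsd_100" in report and not isinstance(report["mean_rmsd_100"], float):
        errors.append("mean_rmsd_100 must be a float")
    if "num_runs" in report:
        runs = report["num_runs"]
        if type(runs) is not int or runs != 100:
            errors.append("num_runs must be the integer 100")
    return errors


example = {
    "mean_rmsd_100": 1.25,  # placeholder value for illustration only
    "num_runs": 100,
    "timestamp_utc": datetime.now(timezone.utc).isoformat(),  # assumed format
    "command": "<exact command line used>",
    "notes": "ended at max_epochs",
}
print(json.dumps(example, indent=2))
print(validate_report(example))
```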

Repro notes

  • Keep seed explicit in commands.
  • Keep sample path explicit.
  • Prefer additive experiments (do not silently remove prior working options).

Multi-layer diagnosis mindset

  • Do not optimize only a scalar metric; analyze behavior from multiple views for each attempt.
  • Use trajectory inspection as one analysis axis, not a fixed prescription.
  • Combine at least two kinds of evidence when judging a strategy:
    • quantitative metrics (RMSD, train/eval gap, stability);
    • qualitative dynamics (trajectory patterns, mode collapse, unrealistic motion);
    • structural diagnostics (e.g., internal-distance change, geometry consistency).
  • Treat metric improvement without believable dynamics (or vice versa) as incomplete progress.
  • Example signal: if motion appears translation-dominant with weak internal change, investigate rotation/torsion learning capacity and loss balance.
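
The translation-versus-internal-change signal mentioned above can be quantified with a rough structural diagnostic. The sketch below is one possible measure, not the project's own tooling: `motion_summary` is a hypothetical helper that compares centroid displacement with the mean absolute change in pairwise distances between two frames (each a list of at least two 3D points).

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def pairwise_distances(coords):
    """All unique point-to-point distances, in a fixed order."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def motion_summary(start, end):
    """Return (centroid displacement, mean absolute internal-distance change)."""
    translation = math.dist(centroid(start), centroid(end))
    d0 = pairwise_distances(start)
    d1 = pairwise_distances(end)
    internal = sum(abs(a - b) for a, b in zip(d0, d1)) / len(d0)
    return translation, internal


# Toy frames: the second is the first shifted rigidly along x.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
end = [(x + 2.0, y, z) for x, y, z in start]
translation, internal = motion_summary(start, end)
print(translation, internal)
```

A large first value with a near-zero second value is the translation-dominant pattern; note that this measure does not separate rigid rotation from translation-free motion, so it is only one view among several.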

Safety

  • On main, if pre-commit blocks a train.py change due to no RMSD improvement, either improve the model and re-evaluate, or keep iterating on a feature branch and merge/cherry-pick only when ready.
  • On feature branches, you may commit freely without the mean-RMSD gate; the flow-matching token check still runs whenever train.py is staged (same as on main).
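
The branch-dependent behavior of the mean-RMSD gate can be summarized in a few lines. This is a sketch of the rules as stated in this document, not the actual hook: the function names are hypothetical, and how the anchor value is read from BEST_PRACTICE.json is left out because its schema is not described here.

```python
def gate_applies(branch, staged_paths):
    """The mean-RMSD gate runs only on main, and only when train.py is staged."""
    return branch == "main" and "train.py" in staged_paths


def gate_passes(branch, staged_paths, new_rmsd, anchor_rmsd):
    """Return True if the commit may proceed as far as the mean-RMSD gate is concerned.

    Assumes "improvement" means strictly lower mean_rmsd_100 than the anchor.
    """
    if not gate_applies(branch, staged_paths):
        return True
    return new_rmsd < anchor_rmsd


# Placeholder numbers for illustration only.
print(gate_passes("attempt/topic", ["train.py"], new_rmsd=2.0, anchor_rmsd=1.5))
print(gate_passes("main", ["README.md"], new_rmsd=2.0, anchor_rmsd=1.5))
print(gate_passes("main", ["train.py"], new_rmsd=2.0, anchor_rmsd=1.5))
print(gate_passes("main", ["train.py"], new_rmsd=1.2, anchor_rmsd=1.5))
```

The flow-matching token check is separate from this gate and is not modeled here; per the rule above it runs whenever train.py is staged, on any branch.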