chore: stop tracking *.pt/*.sdf; purge from history; align hooks and docs.

git-filter-repo removed the *.pt/*.sdf blobs and dropped the origin remote,
which must be re-added before pushing. Pre-commit now refreshes only
BEST_PRACTICE.json and the trajectory manifest (checkpoints stay local).

Made-with: Cursor
demian3b
2026-04-17 14:01:06 +09:00
parent 4d858c07fb
commit b8c440d654
4 changed files with 11 additions and 11 deletions
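
As a rough illustration of the "checkpoints stay local" policy in the commit message, the sketch below flags staged paths that match the patterns this commit stops tracking. The function name `find_local_only_paths`, the `LOCAL_ONLY_PATTERNS` tuple, and the sample file names are hypothetical; the repository's actual hook logic is not shown in this commit and may work differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns this commit stops tracking. The tuple itself is an illustrative
# assumption, not a copy of the repository's .gitignore or hook config.
LOCAL_ONLY_PATTERNS = ("*.pt", "*.sdf")


def find_local_only_paths(staged_paths):
    """Return the staged paths whose file name matches a local-only pattern."""
    flagged = []
    for path in staged_paths:
        # Match on the file name only, so nested directories are covered too.
        name = PurePosixPath(path).name
        if any(fnmatchcase(name, pattern) for pattern in LOCAL_ONLY_PATTERNS):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Hypothetical staged file list; "sample.sdf" is a made-up file name.
    staged = [
        "train.py",
        "reports/latest_eval.json",
        "artifacts/best_model.pt",
        "reports/trajectories/sample.sdf",
    ]
    for path in find_local_only_paths(staged):
        print(f"should stay local: {path}")
```

A hook built along these lines could exit non-zero when the returned list is non-empty, although with the patterns in `.gitignore` such files would normally never be staged in the first place.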

@@ -16,8 +16,8 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- Attempt log is mandatory for both successful and failed trials.
-- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** as **two commits** when both change: (1) `train.py` plus eval artifacts (`reports/latest_eval.json`, checkpoints, …) **without** `README.md`; (2) a **docs-only** commit with **only** `README.md` (attempt log). Pre-commit blocks staging `train.py` and `README.md` together. Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
-- **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json` and best artifacts). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
+- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** as **two commits** when both change: (1) `train.py` plus eval artifacts (`reports/latest_eval.json`, …) **without** `README.md` (PyTorch `*.pt` checkpoints are **not** tracked; local only); (2) a **docs-only** commit with **only** `README.md` (attempt log). Pre-commit blocks staging `train.py` and `README.md` together. Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
+- **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json`; checkpoints remain local). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
- **`## Attempt Log` on `main`**: new log lines written on a feature branch must be **replicated on `main`** (docs-only `README.md` commit if `train.py` is not landing yet). See `GUIDELINES.md` workflow step 6.
- Flow-matching training time must stay random (middle-time supervision is mandatory).
- Independent attempts must be research-level changes (architecture/training strategy/loss design). Pure hyperparameter-only runs are not counted as standalone attempts.
@@ -34,10 +34,9 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- `GUIDELINES.md`: operating rules and workflow.
- `BEST_PRACTICE.json`: current best-known metric and config.
- `reports/latest_eval.json`: most recent measured metric.
-- `artifacts/latest_eval_best_model.pt`: checkpoint from latest run that produced `latest_eval`.
-- `artifacts/best_model.pt`: best checkpoint from latest improved run.
-- `reports/trajectories/`: trajectory SDFs are **gitignored** (`*.sdf`); regenerate locally after training when needed.
-- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and best-artifact refresh only on `main`**.
+- `artifacts/*.pt`: checkpoints are **gitignored**; written locally by `train.py` / hooks (`latest_eval_best_model.pt`, `best_model.pt`).
+- `reports/trajectories/`: trajectory SDFs are **gitignored** (`*.sdf`); regenerate locally (`python scripts/update_best_artifacts.py` after training when needed).
+- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and `BEST_PRACTICE.json` refresh only on `main`** (does not stage `.pt` / `.sdf`).
## Attempt Log
@@ -107,3 +106,4 @@ This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching
- 2026-04-17: Fast-forward merged `attempt/post-main-doc-sync` into `main`; committed `BEST_PRACTICE.json` + `artifacts/best_model.pt` sync to anchor `mean_rmsd_100=2.350750` (command matches no-EMA residual-geodesic run).
- 2026-04-17: Branch `attempt/graph-readout-geodesic` (off updated `main`): identical budget vs anchor but `--graph-readout attention` reached `mean_rmsd_100=2.716856` with early `train_mse=1000` wall; worse than anchor `2.350750`; branch not for merging to `main`.
- 2026-04-17: Fixed `scripts/update_best_artifacts.py` to build `RFMModel` with the same flags as training (notably `gcn_residual`); best-practice trajectories had looked static because the forward pass did not match the checkpoint. Checkpoints now store RFM metadata; old runs infer from `BEST_PRACTICE.json` command when keys are absent.
+- 2026-04-17: Removed all `*.pt` and `*.sdf` from git history (`git-filter-repo`); added `*.pt` to `.gitignore`; pre-commit no longer stages checkpoints. `git-filter-repo` drops `origin`; re-add with `git remote add origin <url>` before push. Pushes to existing remotes need **`git push --force-with-lease`** once because history was rewritten.
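
The history rewrite described in the last log entry can be sketched as a dry run that only assembles and prints the git commands, without executing anything. The helper name `build_purge_commands` is hypothetical, and the exact invocation the author used is not recorded in this commit; the sketch assumes `git filter-repo` was called with its `--invert-paths` and `--path-glob` options, and it leaves `<url>` as the unfilled placeholder from the log entry.

```python
import shlex


def build_purge_commands(remote_url, patterns=("*.pt", "*.sdf")):
    """Build (but do not run) the git commands for the purge workflow."""
    # --invert-paths keeps everything EXCEPT the paths matched by --path-glob.
    filter_cmd = ["git", "filter-repo", "--invert-paths"]
    for pattern in patterns:
        filter_cmd += ["--path-glob", pattern]
    return [
        filter_cmd,
        # git-filter-repo drops the origin remote, so it is re-added here.
        ["git", "remote", "add", "origin", remote_url],
        # History was rewritten, so a plain push would be rejected.
        # Remote and branch arguments are omitted, as in the log entry.
        ["git", "push", "--force-with-lease"],
    ]


if __name__ == "__main__":
    # "<url>" is the placeholder from the log entry, not a real remote.
    for cmd in build_purge_commands("<url>"):
        print(shlex.join(cmd))
```

Printing the commands first gives a chance to review them before running anything, which is worth doing because rewriting history is destructive; note that `git filter-repo` refuses to run outside a fresh clone unless `--force` is given.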