ai-rfm/README.md
demian3b b8c440d654 chore: stop tracking *.pt/*.sdf; purge from history; align hooks and docs.
git-filter-repo removed blobs; origin must be re-added. Pre-commit refreshes
BEST_PRACTICE.json and trajectory manifest only (checkpoints stay local).

Made-with: Cursor
2026-04-17 14:01:06 +09:00

# ai_rfm
RFM overfitting sandbox for a single ligand sample, with hard quality gates.
## Environment first (UV, cu126 only)
1. Ensure Python 3.10 is available.
2. Install env and deps:
   - `uv sync`
3. Install git hooks:
   - `uv run pre-commit install`
This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching PyG wheels.
## Repository policy
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- Attempt log is mandatory for both successful and failed trials.
- **Branch-first attempts**: do training experiments on a **feature branch**; **commit each attempt** as **two commits** when both change: (1) `train.py` plus eval artifacts (`reports/latest_eval.json`, …) **without** `README.md` (PyTorch `*.pt` checkpoints are **not** tracked—local only); (2) a **docs-only** commit with **only** `README.md` (attempt log). Pre-commit blocks staging `train.py` and `README.md` together. Pre-commit does **not** enforce the mean-RMSD improvement rule on feature branches.
- **Main is the gate**: merging or committing to **`main`** with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, auto-update of `BEST_PRACTICE.json`; checkpoints remain local). Land work via merge or **cherry-pick** of the commits you still trust after re-evaluation.
- **`## Attempt Log` on `main`**: new log lines written on a feature branch must be **replicated on `main`** (docs-only `README.md` commit if `train.py` is not landing yet). See `GUIDELINES.md` workflow step 6.
- Flow-matching training time must stay random (middle-time supervision is mandatory).
- Independent attempts must be research-level changes (architecture/training strategy/loss design). Pure hyperparameter-only runs are not counted as standalone attempts.
- When failures accumulate, **re-evaluate branch commits** and integrate with **cherry-pick** (or selective revert / path restore)—not wholesale rollback unless explicitly justified. Do not use `mean_rmsd_100` (or equivalent) as a training-time early-stopping signal.
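The commit-splitting and branch rules above can be sketched as a small staged-file check. This is a minimal illustration, not the repository's hook: `check_staging` and its return keys are hypothetical names, and it models only two rules quoted above (no `train.py` and `README.md` in the same commit; the mean-RMSD gate runs only on `main` when `train.py` is staged).

```python
from typing import Dict, Iterable

TRAIN_FILE = "train.py"
README_FILE = "README.md"
GATED_BRANCH = "main"


def check_staging(staged: Iterable[str], branch: str) -> Dict[str, bool]:
    """Report which policy checks apply to a set of staged paths.

    Hypothetical helper: the real logic lives in the pre-commit hook and
    may check more than this (for example the flow-matching token check).
    """
    staged_set = set(staged)
    has_train = TRAIN_FILE in staged_set
    has_readme = README_FILE in staged_set
    return {
        # Code/eval artifacts and the attempt log go in separate commits.
        "blocked": has_train and has_readme,
        # The strict mean-RMSD gate is only enforced on main.
        "run_rmsd_gate": has_train and branch == GATED_BRANCH,
    }


if __name__ == "__main__":
    print(check_staging(["train.py", "reports/latest_eval.json"], "attempt/example"))
    print(check_staging(["train.py", "README.md"], "attempt/example"))
```

On a feature branch the first call passes with no RMSD gate, while the second is flagged as blocked because both files are staged together.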
## Evaluation target
- Metric: mean RMSD over 100 runs (`batchsize=100` style aggregated evaluation).
- Success criterion: `mean_rmsd_100 <= 1.0`.
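As a rough illustration of the metric, the sketch below averages a per-run RMSD over a list of predicted poses and compares the mean with the `1.0` threshold. It is a simplified stand-in under stated assumptions: the function names are hypothetical, and it uses a plain coordinate RMSD with no alignment step, whereas the attempt log mentions a Kabsch RMSD in the actual code.

```python
import math
from typing import Sequence

Coords = Sequence[Sequence[float]]  # one pose: a list of (x, y, z) atoms


def rmsd(pred: Coords, ref: Coords) -> float:
    """Plain RMSD between two poses with the same atom ordering (no alignment)."""
    if len(pred) != len(ref) or len(ref) == 0:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq_sum = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(sq_sum / len(ref))


def mean_rmsd(runs: Sequence[Coords], ref: Coords) -> float:
    """Average the per-run RMSD over all rollouts (100 in this repository)."""
    if len(runs) == 0:
        raise ValueError("need at least one run")
    return sum(rmsd(run, ref) for run in runs) / len(runs)


def meets_target(value: float, threshold: float = 1.0) -> bool:
    """Success criterion from this README: mean_rmsd_100 <= 1.0."""
    return value <= threshold
```

With this criterion, every value recorded in the attempt log so far (all above `2.3`) would still fail `meets_target`.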
## Key files
- `train.py`: training/evaluation entry point.
- `GUIDELINES.md`: operating rules and workflow.
- `BEST_PRACTICE.json`: current best-known metric and config.
- `reports/latest_eval.json`: most recent measured metric.
- `artifacts/*.pt`: checkpoints are **gitignored**; written locally by `train.py` / hooks (`latest_eval_best_model.pt`, `best_model.pt`).
- `reports/trajectories/`: trajectory SDFs are **gitignored** (`*.sdf`); regenerate locally (`python scripts/update_best_artifacts.py` after training when needed).
- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; **mean-RMSD gate and `BEST_PRACTICE.json` refresh only on `main`** (does not stage `.pt` / `.sdf`).
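The `main`-only gate described for `scripts/precommit_performance_gate.py` amounts to a strict comparison between the latest measured metric and the stored best. The sketch below shows that comparison only. It assumes both JSON files expose a top-level `mean_rmsd_100` key, which this README does not confirm, and the helper names are hypothetical.

```python
import json
import math
from pathlib import Path

METRIC_KEY = "mean_rmsd_100"  # assumed top-level key in both JSON files


def load_metric(path: Path) -> float:
    """Read the metric from a JSON report (the file layout is an assumption)."""
    with path.open() as handle:
        return float(json.load(handle)[METRIC_KEY])


def is_strict_improvement(latest: float, best: float) -> bool:
    """Lower RMSD is better; ties and non-finite values do not pass."""
    return math.isfinite(latest) and latest < best


def gate_passes(latest_path: Path, best_path: Path) -> bool:
    """Return True if the latest eval should replace the stored best."""
    latest = load_metric(latest_path)
    if not best_path.exists():
        # No stored best yet: accept any finite measurement (assumption).
        return math.isfinite(latest)
    return is_strict_improvement(latest, load_metric(best_path))


if __name__ == "__main__":
    ok = gate_passes(Path("reports/latest_eval.json"), Path("BEST_PRACTICE.json"))
    print("gate passed" if ok else "gate failed")
```

Rejecting ties keeps a rerun that merely reproduces the anchor from overwriting `BEST_PRACTICE.json`.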
## Attempt Log
- 2026-04-16: Bootstrapped docs/environment policy and cu126 UV config. Added best-practice/performance gating scaffolding before the next training run.
- 2026-04-16: Updated `train.py` to use final test metric as source of truth (`mean_rmsd_100` from 100 rollout predictions) and removed train-loss based best checkpoint tracking. Current measured `mean_rmsd_100=2.593694`.
- 2026-04-16: Updated evaluation to always use the best training checkpoint, then run 100 random initializations to time=1 and store the final RMSD mean in `reports/latest_eval.json`.
- 2026-04-16: Re-ran with best-checkpoint evaluation path active; current `mean_rmsd_100=2.582932` (improved from `2.593694`), artifacts synced to `BEST_PRACTICE.json`.
- 2026-04-16: Moved `BEST_PRACTICE.json` updates out of `train.py`; pre-commit now auto-generates/stages best report from `reports/latest_eval.json` when an improved train.py commit is made.
- 2026-04-16: Re-ran after pre-commit auto-best refactor; current `mean_rmsd_100=2.570120` (improved from `2.582932`).
- 2026-04-16: Added model-type support (`gcn`/`mlp`) and time-sampling control; best current run is `gcn hidden=512 layers=8 batch=96` with `mean_rmsd_100=2.523552`.
- 2026-04-16: Added pre-commit artifact refresh: on best update it now stages `BEST_PRACTICE.json`, `artifacts/best_model.pt`, and regenerates 6 trajectory visualizations in `reports/trajectories/`.
- 2026-04-16: Enforced random-time flow-matching rule (no fixed training time), saved best checkpoint to git-tracked artifact path, and improved metric to `mean_rmsd_100=2.519821` with `gcn hidden=512 layers=8 batch=96`.
- 2026-04-16: Added a general multi-layer diagnosis principle to `GUIDELINES.md` so experiments are judged with quantitative + qualitative + structural evidence, not metric-only optimization.
- 2026-04-16: Tried weighted objective to counter weak rotation/torsion motion (`w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8`) and improved to `mean_rmsd_100=2.505556`.
- 2026-04-16: Failed attempt B (longer, lower-lr weighted run) reached `mean_rmsd_100=2.531661`; reverted artifacts to current best.
- 2026-04-16: Failed attempt C (torsion-heavy weights, `time_power=1.2`) reached `mean_rmsd_100=2.564594`; no commit.
- 2026-04-16: Failed attempt D (deeper GCN config) reached `mean_rmsd_100=2.739573`; no commit.
- 2026-04-16: Failed attempt E (`w_center=0.75, w_omega=2.1, w_torsion=3.2, lr=9e-4`) reached `mean_rmsd_100=2.535795`; no commit.
- 2026-04-16: Failed attempt F (balanced weights `w_center=0.9, w_omega=1.8, w_torsion=2.6`) reached `mean_rmsd_100=2.522751`; no commit.
- 2026-04-16: Failed attempt G (`accum=3` for stability) reached `mean_rmsd_100=2.561071`; no commit.
- 2026-04-16: Policy update: every attempt (success/failure) must be logged; checkpoint flow changed to `artifacts/latest_eval_best_model.pt` per run, while pre-commit promotes improved runs to `artifacts/best_model.pt`.
- 2026-04-16: Improved attempt H (same weighted config, `seed=1`) reached `mean_rmsd_100=2.461592` (improved from `2.505556`).
- 2026-04-16: Failed attempt I (same weighted config, `seed=2`) reached `mean_rmsd_100=2.590216`; no commit.
- 2026-04-16: Failed attempt J (same weighted config, `seed=3`) reached `mean_rmsd_100=2.554448`; no commit.
- 2026-04-16: Failed attempt K (research-level: added terminal-consistency auxiliary loss from `x_t` to `x_1`) reached `mean_rmsd_100=2.722863`; no commit.
- 2026-04-16: Failed attempt L (research-level: decoupled architecture with centered-coordinate trunk + separate translation head, with terminal auxiliary term) reached `mean_rmsd_100=2.637292`; no commit.
- 2026-04-16: Failed attempt M (research-level: decoupled centered-coordinate architecture only, no terminal auxiliary term) reached `mean_rmsd_100=2.479326`; close to best but no commit.
- 2026-04-16: Failed attempt N (training-strategy: added configurable early stopping with large max-epoch budget, patience/min-delta/check cadence controls) ran to max epoch with ongoing improvements (`stop_reason=max_epochs`) and reached `mean_rmsd_100=2.764940`; no commit.
- 2026-04-16: Rollback (per `GUIDELINES.md`): restored `train.py`, `reports/latest_eval.json`, and `artifacts/latest_eval_best_model.pt` to last committed baseline after attempts K through N; `mean_rmsd_100` anchor unchanged at `2.461592` (`BEST_PRACTICE.json`). Objective-aligned early stopping remains disallowed for training control.
- 2026-04-16: Policy update: experiments run on **feature branches** with **one commit per attempt**; mean-RMSD pre-commit gate applies only on **`main`** (merge/cherry-pick integration). Re-triage failed stacks via **cherry-pick** / selective drops, not default full-tree rollback.
- 2026-04-16: Branch `attempt/gat-wrapped-torsion` (single commit batching three evals): Failed O — `gat` + `--torsion-wrapped-loss`, `mean_rmsd_100=2.691410`. Failed P — `gcn` + `--torsion-wrapped-loss`, `2.657594`. Failed Q — `gcn` + `--gcn-residual` (best on branch `2.514058`); all above main best `2.461592` — no merge to `main`.
- 2026-04-16: Branch `attempt/default-wrapped-clean-deps`: Made wrapped torsion loss default (`--torsion-wrapped-loss` via `BooleanOptionalAction`, default on) and added displacement-domain objective option. Dependency cleanup removed unused packages from `pyproject.toml`. Validation run (`python train.py --sdf reports/trajectories/trajectory_00.sdf --epochs 200 --batch-size 32 --eval-runs 100 --model-type gcn --hidden 256 --gcn-layers 6 --loss-domain displacement --seed 1`) reached `mean_rmsd_100=2.528226` (no improvement vs best `2.461592`), so branch not ready to merge.
- 2026-04-16: Branch `attempt/default-wrapped-clean-deps` update: removed `--torsion-wrapped-loss` CLI toggle and enforced wrapped torsion loss always-on in code. Failed R — stronger baseline (`sample.sdf`, `gcn hidden=512 layers=8`, displacement loss, `epochs=800`, `seed=1`) reached `mean_rmsd_100=2.512292`.
- 2026-04-16: Failed S — weighted config (`w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8`, `epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.507794` (better than R, still above best `2.461592`).
- 2026-04-16: Failed T — same weighted config with time-bias (`time_power=1.3`) reached `mean_rmsd_100=2.517704`; no branch promotion.
- 2026-04-16: Attempt U (recommended #1, residual GCN): `--gcn-residual` with weighted displacement setup (`epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.463247` (close, but above best `2.461592`).
- 2026-04-16: Attempt V (recommended #2, SO(3) geodesic rotation loss): initial full-budget run was too slow, then reduced-budget run (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) improved to `mean_rmsd_100=2.429729` (new branch best at the time).
- 2026-04-16: Attempt W (recommended #3, split heads + normalization): `--channel-layernorm --head-mlp-layers 2` with weighted displacement setup (`epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.634111` (degraded).
- 2026-04-16: Attempt X (geodesic refinement, longer budget): `--rotation-loss geodesic --epochs=400 --batch-size=24 --seed=2` showed NaN instability and reached `mean_rmsd_100=2.552385`.
- 2026-04-16: Attempt Y (geodesic seed sweep): `--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=3` diverged to NaN early and reached `mean_rmsd_100=2.591940`.
- 2026-04-16: Attempt Z (geodesic stable rerun): same setup as V (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) improved further to `mean_rmsd_100=2.426296` (current best in this branch, better than anchor `2.461592`).
- 2026-04-16: Added train-loss-only early stopping controls (`--early-stop-patience`, `--early-stop-min-delta`, `--early-stop-check-every`, `--early-stop-warmup`) with `stop_reason`/`stop_epoch` reporting in logs and `reports/latest_eval.json`; objective-metric stopping remains disabled.
- 2026-04-16: Attempt AA (merge prep rerun on CUDA): repeated geodesic best-practice config (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) and measured `mean_rmsd_100=2.429895` (`num_runs=100`), still improving over main anchor `2.461592`.
- 2026-04-16: Branch `attempt/geodesic-stability-next`: stress-tested geodesic+residual variants; best observed metric reached `mean_rmsd_100=2.388103` (`--rotation-loss geodesic --gcn-residual --epochs=280 --batch-size=24 --lr=7e-4 --seed=1`), with occasional NaN instability in nearby runs.
- 2026-04-16: Stabilization-only update: added non-finite guards/clamps in geodesic loss, Kabsch RMSD, and training loss fallback to reduce NaN-caused crashes during long geodesic sweeps.
- 2026-04-16: Policy update in `GUIDELINES.md`: when a branch obtains a strict best `mean_rmsd_100`, integration into `main` is mandatory before continuing new branch experiments.
- 2026-04-16: Hook policy update: `train-performance-gate` now runs at both commit-time and `post-merge`, and enforces main-branch merge-time validation/refresh when merged diff includes `train.py`.
- 2026-04-16: Attempt AB (trajectory-instability hypothesis): added `--omega-max-norm` clipping to stabilize geodesic+residual rotation outputs and reduce NaN-prone spikes; run with `--omega-max-norm 3.0` reached `mean_rmsd_100=2.436618` (more stable but worse than branch best `2.388103`).
- 2026-04-16: Strategy S1 (hybrid rotation loss, capped at 5 micro-tuning runs) completed: alpha sweep (`0.7/0.5/0.3`) then lr/seed tuning on best alpha; best S1 result was `mean_rmsd_100=2.439254` (no new best), strategy marked exhausted.
- 2026-04-16: Strategy S2 (rotation-weight curriculum, capped at 5 micro-tuning runs) completed: best run used `--rotation-weight-start 1.0 --rotation-weight-warmup-epochs 120` and reached `mean_rmsd_100=2.417450` (no new best), strategy marked exhausted.
- 2026-04-16: Multi-GPU parallel sweep (GPU0/1/2) around residual-geodesic schedules produced `2.394431`, `2.420601`, and `2.450024`; no update over branch best `2.388103`.
- 2026-04-16: Follow-up parallel sweep (GPU0/1/2) with direct best-axis reruns produced `2.430481`, `2.412036`, and `2.720380`; observed heavy seed sensitivity and intermittent fallback-to-1000 behavior on unstable seeds.
- 2026-04-16: Continued parallel sweep with rotation curriculum variants (`start=0.85/0.95` and lower-lr schedule) produced `2.450391`, `2.457748`, and `2.426384`; no improvement over branch best `2.388103`.
- 2026-04-16: Deep schedule parallel sweep (`epochs=320~380`, `start=1.0` with warmup variants, multi-seed) produced `2.464117`, `2.410706`, and `2.419527`; still worse than branch best and showed late-epoch fallback instability in some runs.
- 2026-04-16: Post-reset attempt on `attempt/s3-tail-risk-next` (trajectory-tail-risk focus) using residual-geodesic with clipped omega and scheduled rotation weight (`lr=6.8e-4`, `grad_clip=0.7`, `start=1.0`, warmup `120`) reached `mean_rmsd_100=2.464730`; no improvement.
- 2026-04-16: Restarted branch `attempt/s3-restart-after-doc-sync` and ran immediate S3 continuation (`lr=6.0e-4`, `grad_clip=0.7`, geodesic+residual, `omega_max_norm=5.0`, warmup `120`), obtaining `mean_rmsd_100=2.474573`; no improvement over best `2.388103`.
- 2026-04-16: Strategy S4 start (tail-risk suppressor): added upper-quantile tail penalty in training loss and ran first trial (`tail-risk-weight=0.2`, `tail-risk-quantile=0.85`, `lr=6.8e-4`, geodesic+residual), yielding `mean_rmsd_100=2.466082`; no improvement over best `2.388103`.
- 2026-04-16: Strategy S4 micro-tuning #2 lowered tail penalty (`tail-risk-weight=0.1`, quantile `0.85`, `lr=6.4e-4`) to reduce over-regularization, but result was `mean_rmsd_100=2.476267`; no improvement.
- 2026-04-16: Strategy S4 micro-tuning #3 softened tail coverage (`tail-risk-quantile=0.9`, `tail-risk-weight=0.2`, `lr=6.8e-4`) and improved to `mean_rmsd_100=2.440570`, but still worse than best `2.388103`.
- 2026-04-16: Strategy S4 micro-tuning #4 increased tail penalty (`tail-risk-weight=0.25`, quantile `0.9`) and regressed sharply to `mean_rmsd_100=2.601258`; indicates over-penalization risk.
- 2026-04-16: Strategy S4 micro-tuning #5 changed seed (`seed=2`, `tail-risk-weight=0.2`, quantile `0.9`) and encountered prolonged fallback-to-1000 behavior with `mean_rmsd_100=2.709563`; S4 hit 5-run cap with no best update.
- 2026-04-16: Structural torsion head (`--torsion-head bond_pair`, GCN only): translation/rotation still use full-graph mean-pooled trunk+time; each torsion `k` runs the **same GCN weights** on the **movable-side induced subgraph** (mask only selects nodes/edges for that subgraph—mask values are not fed as features), mean-pools that subgraph, concatenates with global pooled context, `LayerNorm`, then a small MLP to one scalar. Replaced the prior mask-as-feature design. One calibration run (`epochs=320`, geodesic+residual) reached `mean_rmsd_100=2.598530` with long `train_mse=1000` plateaus; worse than best `2.388103`, likely dominated by multi-forward cost + same geodesic instability rather than readout alone.
- 2026-04-16: bond_pair stabilization pass: subgraph batch `add_self_loops`, post-pool `LayerNorm` on subgraph embedding, small Xavier init on torsion MLP, `torch.nan_to_num` + optional output clamp (`--subgraph-torsion-clip`), and Adam param-group with `--subgraph-lr-scale` (default `0.3`) for `sub_convs`/torsion head vs main `--lr`. Smoke (48ep) avoided `1000` train spikes; full run (`epochs=320`, `lr=5.5e-4`, `subgraph_lr_scale=0.25`, `clip=6`, `eval-runs=100`) reached `mean_rmsd_100=2.606118` (still above best `2.388103`) but training telemetry stayed below the non-finite fallback wall in logged epochs.
- 2026-04-16: bond_pair subgraph **full-graph trunk fuse**: subgraph GCN first-layer input is `concat(local_coords, full-graph node h)` (same-step trunk embedding, state-dependent—not static chemistry). Full run (`epochs=320`, same stab hyperparams as prior) yielded `mean_rmsd_100=2.620511` vs `2.606118` without fuse—slightly worse, so next axis is likely rotation/geodesic coupling or subgraph depth, not more static atom features.
- 2026-04-16: **Post-fuse strategy sweep (automation)**: implemented `--torsion-sub-gcn-layers` (subgraph depth ≤ trunk) and dedicated `sub_gcn_skip` for the subgraph stack; added `scripts/run_bond_pair_strategy_sweep.sh` to run six queued experiments in order: (E1) `rotation-loss=hybrid` α=0.5, (E2) `rotation-loss=mse`, (E3) `--torsion-sub-gcn-layers 4`, (E4) `--subgraph-lr-scale 0.4`, (E5) `--subgraph-lr-scale 0.15`, (E6) main `--lr 4.5e-4`. All use `bond_pair`, `gcn-residual`, `omega_max_norm=5`, stab clip `6`, and default `subgraph_lr_scale=0.25` except E4/E5/E6 as noted. Run locally: `bash scripts/run_bond_pair_strategy_sweep.sh /path/to/sample.sdf` (output: `reports/bond_pair_strategy_sweep.tsv`); append each `mean_rmsd_100` line to this log after the run (this agent session could not complete long GPU sweeps end-to-end).
- 2026-04-17: GPU via project `.venv`: A/B on residual-geodesic (`gcn` hidden=512 layers=8, epochs=280, batch=24, lr=7e-4, weighted displacement, seed=1, eval-runs=100). Run A with `--ema-decay 0.999` reached `mean_rmsd_100=2.554015` (late epochs showed `train_mse=1000` fallback). Run B without EMA reached `mean_rmsd_100=2.350750`; `reports/latest_eval.json` and `artifacts/latest_eval_best_model.pt` left from Run B. On this pair, EMA best-train checkpoint hurt final RMSD vs online weights; no claim vs historical branch best `2.388103` (different wall-clock / step budget semantics).
- 2026-04-17: Fast-forward merged `attempt/post-main-doc-sync` into `main`; committed `BEST_PRACTICE.json` + `artifacts/best_model.pt` sync to anchor `mean_rmsd_100=2.350750` (command matches no-EMA residual-geodesic run).
- 2026-04-17: Branch `attempt/graph-readout-geodesic` (off updated `main`): identical budget vs anchor but `--graph-readout attention` reached `mean_rmsd_100=2.716856` with early `train_mse=1000` wall; worse than anchor `2.350750`; branch not for merging to `main`.
- 2026-04-17: Fixed `scripts/update_best_artifacts.py` to build `RFMModel` with the same flags as training (notably `gcn_residual`); best-practice trajectories had looked static because the forward pass did not match the checkpoint. Checkpoints now store RFM metadata; old runs infer from `BEST_PRACTICE.json` command when keys are absent.
- 2026-04-17: Removed all `*.pt` and `*.sdf` from git history (`git-filter-repo`); added `*.pt` to `.gitignore`; pre-commit no longer stages checkpoints. `git-filter-repo` drops `origin`—re-add with `git remote add origin <url>` before push. Pushes to existing remotes need **`git push --force-with-lease`** once because history was rewritten.
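
Several entries in the log above refer to a wrapped torsion loss and an SO(3) geodesic rotation loss with clamps against non-finite values. The sketch below shows the standard form of both ideas in plain Python. It is not taken from `train.py`: the function names are hypothetical, and the real implementation may represent rotations and guard non-finite values differently.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # 3x3 rotation matrix as nested sequences


def wrap_angle(delta: float) -> float:
    """Map an angle difference in radians into [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def wrapped_torsion_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared torsion error measured along the shortest arc.

    Without wrapping, 3.0 rad vs -3.0 rad would count as a 6.0 rad error
    even though the two angles are only about 0.28 rad apart on the circle.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("need equally sized, non-empty torsion lists")
    diffs = [wrap_angle(p - t) for p, t in zip(pred, target)]
    return sum(d * d for d in diffs) / len(diffs)


def geodesic_angle(r_pred: Matrix, r_true: Matrix) -> float:
    """Rotation angle of r_pred^T r_true, the geodesic distance on SO(3)."""
    # trace(A^T B) equals the sum of elementwise products of A and B.
    trace = sum(r_pred[i][j] * r_true[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    # Clamp so rounding error cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(geodesic_angle(quarter_turn_z, identity))  # angle in radians
    print(wrapped_torsion_loss([3.0], [-3.0]))
```

The clamp before `acos` is one simple guard of the kind the stabilization entry describes; in an autograd setting the gradient of `acos` also grows without bound near ±1, which may be one reason the geodesic runs were prone to NaNs.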