
demian3b d436444f7c Log S4 micro-tuning attempt 3.

Record quantile-softened tail-risk run and commit resulting evaluation artifacts before the next attempt.

Made-with: Cursor

2026-04-16 23:38:08 +09:00

- reports: Log S4 micro-tuning attempt 3. (2026-04-16 23:38:08 +09:00)
- scripts: Record per-run logging policy and sync non-train artifacts. (2026-04-16 23:28:56 +09:00)
- .gitignore: Track best checkpoint and auto-refresh trajectory artifacts. (2026-04-16 17:23:06 +09:00)
- .pre-commit-config.yaml: Update policies and attempt logs on main. (2026-04-16 23:21:27 +09:00)
- .python-version: Initialize ai_rfm documentation and cu126 uv scaffolding. (2026-04-16 16:52:57 +09:00)
- BEST_PRACTICE.json: Enforce full attempt logging and latest-eval checkpoint flow. (2026-04-16 17:44:09 +09:00)
- GUIDELINES.md: Record per-run logging policy and sync non-train artifacts. (2026-04-16 23:28:56 +09:00)
- pyproject.toml: Improve geodesic training stability and update best eval baseline. (2026-04-16 22:20:36 +09:00)
- README.md: Log S4 micro-tuning attempt 3. (2026-04-16 23:38:08 +09:00)
- train.py: Start S4 tail-risk strategy and log first trial. (2026-04-16 23:35:29 +09:00)


ai_rfm

RFM overfitting sandbox for a single ligand sample, with hard quality gates.

Environment first (UV, cu126 only)

Ensure Python 3.10 is available.
Install env and deps:
- uv sync
Install git hooks:
- uv run pre-commit install

This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching PyG wheels.

Repository policy

Every attempt must update this README (append a short entry in ## Attempt Log).
Attempt log is mandatory for both successful and failed trials.
Branch-first attempts: run training experiments on a feature branch and commit each attempt there (typically including train.py, reports/latest_eval.json, and the README log entry for that run). Pre-commit does not enforce the mean-RMSD improvement rule on feature branches.
Main is the gate: merging or committing to main with train.py staged triggers the performance gate. The gate requires a strictly better mean_rmsd_100, a staged reports/latest_eval.json, and a README log entry. It then auto-updates BEST_PRACTICE.json and the best artifacts. Land work by merging or cherry-picking only the commits you still trust after re-evaluation.
Keep the ## Attempt Log in sync on main: log lines written on a feature branch must be replicated on main (as a docs-only README.md commit if train.py is not landing yet). See GUIDELINES.md workflow step 6.
Flow-matching training time must stay random: t is sampled randomly during training so that intermediate times are supervised (middle-time supervision is mandatory; no fixed training time).
Independent attempts must be research-level changes (architecture, training strategy, or loss design). Runs that only change hyperparameters do not count as standalone attempts.
When failures accumulate, re-evaluate branch commits and integrate with cherry-pick (or selective revert / path restore). Do not do a wholesale rollback unless it is explicitly justified. Do not use mean_rmsd_100 (or an equivalent metric) as a training-time early-stopping signal.
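The following is a minimal sketch of random-time supervision. It assumes a linear interpolation path between a noise sample x0 and the target x1, which is a common flow-matching choice; train.py's actual path is not documented in this README. The names `sample_time` and `interpolate` are hypothetical illustrations, not the repository's code. So is the reading of `time_power` as t = u ** time_power.

```python
import random
from typing import List, Optional, Tuple


def sample_time(time_power: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Draw a fresh training time t in [0, 1).

    Hypothetical parameterization: t = u ** time_power with u ~ U[0, 1).
    time_power = 1 is uniform; time_power > 1 shifts mass toward t = 0.
    """
    rng = rng if rng is not None else random.Random()
    return rng.random() ** time_power


def interpolate(x0: List[float], x1: List[float], t: float) -> Tuple[List[float], List[float]]:
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant target velocity x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


if __name__ == "__main__":
    rng = random.Random(0)
    x0, x1 = [0.0, 0.0], [2.0, 4.0]
    for _ in range(3):
        t = sample_time(1.2, rng)  # a new random t every step, never a constant
        x_t, v = interpolate(x0, x1, t)
        print(f"t={t:.3f} x_t={x_t} v_target={v}")
```

However t is biased, each step draws a new value. The policy forbids replacing this draw with a constant.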

Evaluation target

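Metric: mean_rmsd_100, the mean RMSD over 100 rollouts. Each rollout starts from a random initialization and is integrated to time=1, and the 100 results are aggregated as one batch.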
Success criterion: mean_rmsd_100 <= 1.0.
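One way to compute the metric is sketched below under stated assumptions. Each rollout is assumed to yield an (N, 3) coordinate array in the same atom order as the reference. RMSD is assumed to be taken after Kabsch superposition: the Attempt Log mentions a Kabsch RMSD, but this README does not state whether evaluation aligns first. `kabsch_rmsd` and `mean_rmsd` are hypothetical helper names.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) conformers after optimal rigid superposition (Kabsch)."""
    p = pred - pred.mean(axis=0)               # remove translation
    q = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # flip last axis if SVD would give a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation mapping p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_rmsd(predictions: np.ndarray, ref: np.ndarray) -> float:
    """Average Kabsch RMSD over a batch of rollouts, e.g. predictions.shape == (100, N, 3)."""
    return float(np.mean([kabsch_rmsd(p, ref) for p in predictions]))
```

With 100 rollouts stacked as a (100, N, 3) array, `mean_rmsd(predictions, ref) <= 1.0` would correspond to the success criterion under these assumptions.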

Key files

train.py: training/evaluation entry point.
GUIDELINES.md: operating rules and workflow.
BEST_PRACTICE.json: current best-known metric and config.
reports/latest_eval.json: most recent measured metric.
artifacts/latest_eval_best_model.pt: checkpoint from latest run that produced latest_eval.
artifacts/best_model.pt: best checkpoint from latest improved run.
reports/trajectories/: 6 regenerated trajectories from current best model.
scripts/precommit_performance_gate.py: flow-matching token check on any branch when train.py is staged; mean-RMSD gate and best-artifact refresh only on main.
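The gate's decision logic, as described in the repository policy, can be sketched as follows. The mean-RMSD check applies only on main with train.py staged. It requires the eval report and README to be staged, and it passes only on a strict improvement. Several details are illustrative assumptions:

- the function names,
- the exact staged-file requirements,
- the assumption that BEST_PRACTICE.json stores its metric under a `mean_rmsd_100` key.

The real scripts/precommit_performance_gate.py is not reproduced here, including its flow-matching token check and artifact refresh.

```python
import json
from pathlib import Path
from typing import Iterable

# Assumption based on the policy text: these must be staged with train.py on main.
REQUIRED_ON_MAIN = {"reports/latest_eval.json", "README.md"}


def gate_applies(branch: str, staged_files: Iterable[str]) -> bool:
    """The mean-RMSD gate runs only on main, and only when train.py is staged."""
    return branch == "main" and "train.py" in set(staged_files)


def strictly_better(latest: dict, best: dict, key: str = "mean_rmsd_100") -> bool:
    """Lower RMSD is better; a tie does not pass."""
    return float(latest[key]) < float(best[key])


def check_gate(branch: str, staged_files: Iterable[str], latest_path: Path, best_path: Path) -> bool:
    """Return True if the commit may proceed under the mean-RMSD rule."""
    staged = set(staged_files)
    if not gate_applies(branch, staged):
        return True  # feature branches, or main commits without train.py
    if not REQUIRED_ON_MAIN <= staged:
        return False  # eval report and README log must be staged alongside train.py
    latest = json.loads(latest_path.read_text())
    best = json.loads(best_path.read_text())
    return strictly_better(latest, best)
```

For example, `check_gate("main", ["train.py"], Path("reports/latest_eval.json"), Path("BEST_PRACTICE.json"))` returns False before reading any file, because the report and README are not staged.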

Attempt Log

2026-04-16: Bootstrapped docs/environment policy and cu126 UV config. Added best-practice/performance gating scaffolding before the next training run.
2026-04-16: Updated train.py to use final test metric as source of truth (mean_rmsd_100 from 100 rollout predictions) and removed train-loss based best checkpoint tracking. Current measured mean_rmsd_100=2.593694.
2026-04-16: Updated evaluation to always use the best training checkpoint, then run 100 random initializations to time=1 and store the final RMSD mean in reports/latest_eval.json.
2026-04-16: Re-ran with best-checkpoint evaluation path active; current mean_rmsd_100=2.582932 (improved from 2.593694), artifacts synced to BEST_PRACTICE.json.
2026-04-16: Moved BEST_PRACTICE.json updates out of train.py; pre-commit now auto-generates/stages best report from reports/latest_eval.json when an improved train.py commit is made.
2026-04-16: Re-ran after pre-commit auto-best refactor; current mean_rmsd_100=2.570120 (improved from 2.582932).
2026-04-16: Added model-type support (gcn/mlp) and time-sampling control; best current run is gcn hidden=512 layers=8 batch=96 with mean_rmsd_100=2.523552.
2026-04-16: Added pre-commit artifact refresh: on best update it now stages BEST_PRACTICE.json, artifacts/best_model.pt, and regenerates 6 trajectory visualizations in reports/trajectories/.
2026-04-16: Enforced random-time flow-matching rule (no fixed training time), saved best checkpoint to git-tracked artifact path, and improved metric to mean_rmsd_100=2.519821 with gcn hidden=512 layers=8 batch=96.
2026-04-16: Added a general multi-layer diagnosis principle to GUIDELINES.md so experiments are judged with quantitative + qualitative + structural evidence, not metric-only optimization.
2026-04-16: Tried weighted objective to counter weak rotation/torsion motion (w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8) and improved to mean_rmsd_100=2.505556.
2026-04-16: Failed attempt B (longer, lower-lr weighted run) reached mean_rmsd_100=2.531661; reverted artifacts to current best.
2026-04-16: Failed attempt C (torsion-heavy weights, time_power=1.2) reached mean_rmsd_100=2.564594; no commit.
2026-04-16: Failed attempt D (deeper GCN config) reached mean_rmsd_100=2.739573; no commit.
2026-04-16: Failed attempt E (w_center=0.75, w_omega=2.1, w_torsion=3.2, lr=9e-4) reached mean_rmsd_100=2.535795; no commit.
2026-04-16: Failed attempt F (balanced weights w_center=0.9, w_omega=1.8, w_torsion=2.6) reached mean_rmsd_100=2.522751; no commit.
2026-04-16: Failed attempt G (accum=3 for stability) reached mean_rmsd_100=2.561071; no commit.
2026-04-16: Policy update: every attempt (success/failure) must be logged; checkpoint flow changed to artifacts/latest_eval_best_model.pt per run, while pre-commit promotes improved runs to artifacts/best_model.pt.
2026-04-16: Improved attempt H (same weighted config, seed=1) reached mean_rmsd_100=2.461592 (improved from 2.505556).
2026-04-16: Failed attempt I (same weighted config, seed=2) reached mean_rmsd_100=2.590216; no commit.
2026-04-16: Failed attempt J (same weighted config, seed=3) reached mean_rmsd_100=2.554448; no commit.
2026-04-16: Failed attempt K (research-level: added terminal-consistency auxiliary loss from x_t to x_1) reached mean_rmsd_100=2.722863; no commit.
2026-04-16: Failed attempt L (research-level: decoupled architecture with centered-coordinate trunk + separate translation head, with terminal auxiliary term) reached mean_rmsd_100=2.637292; no commit.
2026-04-16: Failed attempt M (research-level: decoupled centered-coordinate architecture only, no terminal auxiliary term) reached mean_rmsd_100=2.479326; close to best but no commit.
2026-04-16: Failed attempt N (training-strategy: added configurable early stopping with large max-epoch budget, patience/min-delta/check cadence controls) ran to max epoch with ongoing improvements (stop_reason=max_epochs) and reached mean_rmsd_100=2.764940; no commit.
2026-04-16: Rollback (per GUIDELINES.md): restored train.py, reports/latest_eval.json, and artifacts/latest_eval_best_model.pt to last committed baseline after attempts K–N; mean_rmsd_100 anchor unchanged at 2.461592 (BEST_PRACTICE.json). Objective-aligned early stopping remains disallowed for training control.
2026-04-16: Policy update: experiments run on feature branches with one commit per attempt; mean-RMSD pre-commit gate applies only on main (merge/cherry-pick integration). Re-triage failed stacks via cherry-pick / selective drops, not default full-tree rollback.
2026-04-16: Branch attempt/gat-wrapped-torsion (single commit batching three evals):
- Failed O: gat + --torsion-wrapped-loss, mean_rmsd_100=2.691410.
- Failed P: gcn + --torsion-wrapped-loss, 2.657594.
- Failed Q: gcn + --gcn-residual, 2.514058 (best on branch).
All three are above the main best of 2.461592, so there was no merge to main.
2026-04-16: Branch attempt/default-wrapped-clean-deps: Made wrapped torsion loss default (--torsion-wrapped-loss via BooleanOptionalAction, default on) and added displacement-domain objective option. Dependency cleanup removed unused packages from pyproject.toml. Validation run (python train.py --sdf reports/trajectories/trajectory_00.sdf --epochs 200 --batch-size 32 --eval-runs 100 --model-type gcn --hidden 256 --gcn-layers 6 --loss-domain displacement --seed 1) reached mean_rmsd_100=2.528226 (no improvement vs best 2.461592), so branch not ready to merge.
2026-04-16: Branch attempt/default-wrapped-clean-deps update: removed the --torsion-wrapped-loss CLI toggle and enforced the wrapped torsion loss as always-on in code. Failed R (stronger baseline: sample.sdf, gcn hidden=512 layers=8, displacement loss, epochs=800, seed=1) reached mean_rmsd_100=2.512292.
2026-04-16: Failed S (weighted config: w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8, epochs=1200, seed=1) reached mean_rmsd_100=2.507794. This beats R but is still above the best of 2.461592.
2026-04-16: Failed T (same weighted config with time bias, time_power=1.3) reached mean_rmsd_100=2.517704; no branch promotion.
2026-04-16: Attempt U (recommended #1, residual GCN): --gcn-residual with weighted displacement setup (epochs=1200, seed=1) reached mean_rmsd_100=2.463247 (close, but above best 2.461592).
2026-04-16: Attempt V (recommended #2, SO(3) geodesic rotation loss): initial full-budget run was too slow, then reduced-budget run (--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1) improved to mean_rmsd_100=2.429729 (new branch best at the time).
2026-04-16: Attempt W (recommended #3, split heads + normalization): --channel-layernorm --head-mlp-layers 2 with weighted displacement setup (epochs=1200, seed=1) reached mean_rmsd_100=2.634111 (degraded).
2026-04-16: Attempt X (geodesic refinement, longer budget): --rotation-loss geodesic --epochs=400 --batch-size=24 --seed=2 showed NaN instability and reached mean_rmsd_100=2.552385.
2026-04-16: Attempt Y (geodesic seed sweep): --rotation-loss geodesic --epochs=200 --batch-size=24 --seed=3 diverged to NaN early and reached mean_rmsd_100=2.591940.
2026-04-16: Attempt Z (geodesic stable rerun): same setup as V (--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1) improved further to mean_rmsd_100=2.426296 (current best in this branch, better than anchor 2.461592).
2026-04-16: Added train-loss-only early stopping controls (--early-stop-patience, --early-stop-min-delta, --early-stop-check-every, --early-stop-warmup) with stop_reason/stop_epoch reporting in logs and reports/latest_eval.json; objective-metric stopping remains disabled.
2026-04-16: Attempt AA (merge prep rerun on CUDA): repeated geodesic best-practice config (--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1) and measured mean_rmsd_100=2.429895 (num_runs=100), still improving over main anchor 2.461592.
2026-04-16: Branch attempt/geodesic-stability-next: stress-tested geodesic+residual variants; best observed metric reached mean_rmsd_100=2.388103 (--rotation-loss geodesic --gcn-residual --epochs=280 --batch-size=24 --lr=7e-4 --seed=1), with occasional NaN instability in nearby runs.
2026-04-16: Stabilization-only update: added non-finite guards/clamps in geodesic loss, Kabsch RMSD, and training loss fallback to reduce NaN-caused crashes during long geodesic sweeps.
2026-04-16: Policy update in GUIDELINES.md: when a branch obtains a strict best mean_rmsd_100, integration into main is mandatory before continuing new branch experiments.
2026-04-16: Hook policy update: train-performance-gate now runs both at commit time and post-merge. It enforces main-branch merge-time validation/refresh when the merged diff includes train.py.
2026-04-16: Attempt AB (trajectory-instability hypothesis): added --omega-max-norm clipping to stabilize geodesic+residual rotation outputs and reduce NaN-prone spikes; run with --omega-max-norm 3.0 reached mean_rmsd_100=2.436618 (more stable but worse than branch best 2.388103).
2026-04-16: Strategy S1 (hybrid rotation loss, capped at 5 micro-tuning runs) completed: alpha sweep (0.7/0.5/0.3) then lr/seed tuning on best alpha; best S1 result was mean_rmsd_100=2.439254 (no new best), strategy marked exhausted.
2026-04-16: Strategy S2 (rotation-weight curriculum, capped at 5 micro-tuning runs) completed: best run used --rotation-weight-start 1.0 --rotation-weight-warmup-epochs 120 and reached mean_rmsd_100=2.417450 (no new best), strategy marked exhausted.
2026-04-16: Multi-GPU parallel sweep (GPU0/1/2) around residual-geodesic schedules produced 2.394431, 2.420601, and 2.450024; no update over branch best 2.388103.
2026-04-16: Follow-up parallel sweep (GPU0/1/2) with direct best-axis reruns produced 2.430481, 2.412036, and 2.720380; observed heavy seed sensitivity and intermittent fallback-to-1000 behavior on unstable seeds.
2026-04-16: Continued parallel sweep with rotation curriculum variants (start=0.85/0.95 and lower-lr schedule) produced 2.450391, 2.457748, and 2.426384; no improvement over branch best 2.388103.
2026-04-16: Deep-schedule parallel sweep (epochs=320-380, start=1.0 with warmup variants, multi-seed) produced 2.464117, 2.410706, and 2.419527. All are still worse than the branch best of 2.388103, and some runs showed late-epoch fallback instability.
2026-04-16: Post-reset attempt on attempt/s3-tail-risk-next (trajectory-tail-risk focus) using residual-geodesic with clipped omega and scheduled rotation weight (lr=6.8e-4, grad_clip=0.7, start=1.0, warmup 120) reached mean_rmsd_100=2.464730; no improvement.
2026-04-16: Restarted branch attempt/s3-restart-after-doc-sync and ran immediate S3 continuation (lr=6.0e-4, grad_clip=0.7, geodesic+residual, omega_max_norm=5.0, warmup 120), obtaining mean_rmsd_100=2.474573; no improvement over best 2.388103.
2026-04-16: Strategy S4 start (tail-risk suppressor): added upper-quantile tail penalty in training loss and ran first trial (tail-risk-weight=0.2, tail-risk-quantile=0.85, lr=6.8e-4, geodesic+residual), yielding mean_rmsd_100=2.466082; no improvement over best 2.388103.
2026-04-16: Strategy S4 micro-tuning #2 lowered tail penalty (tail-risk-weight=0.1, quantile 0.85, lr=6.4e-4) to reduce over-regularization, but result was mean_rmsd_100=2.476267; no improvement.
2026-04-16: Strategy S4 micro-tuning #3 softened tail coverage (tail-risk-quantile=0.9, tail-risk-weight=0.2, lr=6.8e-4) and improved to mean_rmsd_100=2.440570, but this remains worse than the best of 2.388103.
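Several logged attempts do three things:

- reweight the objective by component (w_center, w_omega, w_torsion),
- keep torsion errors wrapped,
- schedule the rotation weight (--rotation-weight-start, --rotation-weight-warmup-epochs).

The sketch below illustrates these ideas. It assumes per-component velocity errors and a linear ramp. The dict layout and helper names are hypothetical, and so is the ramp's `final` target, since this README does not say what the curriculum ramps toward.

```python
import math
from typing import Dict, List


def mse(pred: List[float], target: List[float]) -> float:
    """Plain mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def wrapped_mse(pred: List[float], target: List[float]) -> float:
    """MSE on angle differences wrapped into [-pi, pi], so 179 deg vs -179 deg counts as 2 deg."""
    diffs = [math.atan2(math.sin(p - t), math.cos(p - t)) for p, t in zip(pred, target)]
    return sum(d * d for d in diffs) / len(diffs)


def rotation_weight(epoch: int, start: float = 1.0, final: float = 2.0, warmup_epochs: int = 120) -> float:
    """Hypothetical linear curriculum from `start` to `final` over `warmup_epochs` epochs."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return final
    return start + (final - start) * epoch / warmup_epochs


def weighted_loss(
    pred: Dict[str, List[float]],
    target: Dict[str, List[float]],
    w_center: float = 0.8,
    w_omega: float = 2.0,
    w_torsion: float = 3.0,
) -> float:
    """Weighted sum of translation, rotation, and (wrapped) torsion errors."""
    return (
        w_center * mse(pred["center"], target["center"])
        + w_omega * mse(pred["omega"], target["omega"])
        + w_torsion * wrapped_mse(pred["torsion"], target["torsion"])
    )
```

In a training loop one would pass `w_omega=rotation_weight(epoch)`. The default weights mirror the weighted config in the log (w_center=0.8, w_omega=2.0, w_torsion=3.0).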
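Attempts V through AB rely on three mechanisms: an SO(3) geodesic rotation loss, non-finite guards, and --omega-max-norm clipping. The sketch below shows the standard geodesic angle between two rotation matrices. The arccos argument is clamped because the derivative of arccos diverges at ±1, a common NaN source. The sketch also shows norm clipping of an angular-velocity vector. It assumes rotations are available as 3x3 matrices and omega as a 3-vector. The helper names and the non-finite fallback value are hypothetical, not train.py's implementation.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta radians about the z axis (test helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_angle(r_pred: np.ndarray, r_true: np.ndarray, eps: float = 1e-7) -> float:
    """Angle (radians) of the relative rotation r_pred^T r_true on SO(3)."""
    cos_theta = (np.trace(r_pred.T @ r_true) - 1.0) / 2.0
    if not np.isfinite(cos_theta):
        return 0.0  # hypothetical fallback for non-finite inputs
    # Clamp away from +/-1, where d/dx arccos(x) is infinite.
    return float(np.arccos(np.clip(cos_theta, -1.0 + eps, 1.0 - eps)))


def clip_omega(omega: np.ndarray, max_norm: float = 3.0) -> np.ndarray:
    """Rescale an angular-velocity vector so its norm never exceeds max_norm."""
    norm = float(np.linalg.norm(omega))
    if norm <= max_norm:
        return omega
    return omega * (max_norm / norm)
```

The clamp costs a small bias: identical rotations report about 4.5e-4 rad rather than 0. In exchange, gradients stay finite in an autodiff version of the same formula.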
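The train-loss-only early stopping controls (--early-stop-patience, --early-stop-min-delta, --early-stop-check-every, --early-stop-warmup) can be sketched as below. The sketch makes two assumptions:

- patience counts checks rather than epochs;
- a check counts as an improvement only when the loss drops by more than min_delta.

The class name and the `early_stop` reason string are hypothetical. `max_epochs` matches the stop_reason recorded for attempt N. mean_rmsd_100 never enters the decision.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrainLossEarlyStop:
    """Stop when the training loss plateaus; the evaluation metric is never consulted."""

    patience: int = 5          # number of non-improving checks tolerated (assumption)
    min_delta: float = 1e-4
    check_every: int = 10
    warmup: int = 100
    best: float = math.inf
    bad_checks: int = 0

    def should_stop(self, epoch: int, train_loss: float) -> bool:
        if epoch < self.warmup or epoch % self.check_every != 0:
            return False  # no decision during warmup or between checks
        if train_loss < self.best - self.min_delta:
            self.best = train_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def run(losses: List[float], stopper: TrainLossEarlyStop) -> Tuple[str, Optional[int]]:
    """Return (stop_reason, stop_epoch) for a sequence of per-epoch training losses."""
    for epoch, loss in enumerate(losses):
        if stopper.should_stop(epoch, loss):
            return "early_stop", epoch
    return "max_epochs", len(losses) - 1
```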
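Strategy S4 adds an upper-quantile tail penalty to the training loss. One plausible reading, in the style of conditional value-at-risk and sketched here only as an assumption, is to add tail-risk-weight times the mean of per-sample losses at or above the tail-risk-quantile. Under this reading, raising the quantile from 0.85 to 0.9 shrinks the penalized tail. That matches the "softened tail coverage" wording of micro-tuning #3. The function names are hypothetical, and train.py may instead penalize only the excess above the threshold.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (same convention as numpy's default method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tail_risk_loss(per_sample: Sequence[float], weight: float = 0.2, q: float = 0.85) -> float:
    """Mean loss plus `weight` times the mean of the samples at or above the q-quantile."""
    base = sum(per_sample) / len(per_sample)
    # Cap at the max so float rounding can never leave the tail empty.
    threshold = min(quantile(per_sample, q), max(per_sample))
    tail = [x for x in per_sample if x >= threshold]
    return base + weight * sum(tail) / len(tail)
```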
