ai_rfm
RFM overfitting sandbox for a single ligand sample, with hard quality gates.
Environment first (UV, cu126 only)
- Ensure Python 3.10 is available.
- Install the environment and dependencies: `uv sync`
- Install git hooks: `uv run pre-commit install`
This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching PyG wheels.
Repository policy
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- The attempt log is mandatory for both successful and failed trials.
- Branch-first attempts: run training experiments on a feature branch and commit each attempt on that branch (typically including `train.py`, `reports/latest_eval.json`, and the README log entry for that run). Pre-commit does not enforce the mean-RMSD improvement rule on feature branches.
- Main is the gate: merging or committing to `main` with `train.py` staged triggers the performance gate (strictly better `mean_rmsd_100`, staged `latest_eval`, README log, and auto-update of `BEST_PRACTICE.json` and best artifacts). Land work via merge or cherry-pick of the commits you still trust after re-evaluation.
- `## Attempt Log` on `main`: new log lines written on a feature branch must be replicated on `main` (as a docs-only `README.md` commit if `train.py` is not landing yet). See `GUIDELINES.md` workflow step 6.
- Flow-matching training time must stay random (middle-time supervision is mandatory).
- Independent attempts must be research-level changes (architecture/training strategy/loss design). Pure hyperparameter-only runs are not counted as standalone attempts.
- When failures accumulate, re-evaluate branch commits and integrate with cherry-pick (or selective revert / path restore), not a wholesale rollback unless explicitly justified. Do not use `mean_rmsd_100` (or an equivalent) as a training-time early-stopping signal.
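The random-time rule above can be illustrated with a generic flow-matching batch sampler. This is a minimal sketch, not the code in `train.py`. It assumes a straight-line (rectified-flow style) path in plain Cartesian coordinates with Gaussian noise at t = 0, whereas the attempt log suggests `train.py` works with center/rotation/torsion components. The name `sample_flow_matching_batch` is hypothetical.

```python
import numpy as np


def sample_flow_matching_batch(x1: np.ndarray, rng: np.random.Generator):
    """Draw one random-time flow-matching training batch.

    Illustrative assumption: a straight path x_t = (1 - t) * x0 + t * x1 in
    Cartesian coordinates with Gaussian noise x0 (not necessarily train.py's
    parameterization).
    """
    batch = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)              # noise endpoint at t = 0
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1))   # fresh random time per sample, never fixed
    x_t = (1.0 - t) * x0 + t * x1                   # point on the path at time t
    v_target = x1 - x0                              # velocity the model should predict at (x_t, t)
    return t, x_t, v_target


rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 12, 3))  # hypothetical batch: 4 copies of a 12-atom ligand pose
t, x_t, v_target = sample_flow_matching_batch(x1, rng)
```

Because `t` is redrawn for every sample in every batch, the model receives supervision across the whole interval, including the middle times the policy makes mandatory. A fixed `t` would not provide that coverage.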
Evaluation target
- Metric: mean RMSD over 100 runs (`batchsize=100`-style aggregated evaluation).
- Success criterion: `mean_rmsd_100 <= 1.0`.
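The following sketch shows how a `mean_rmsd_100`-style number can be computed. It assumes each of the 100 rollouts yields a final `(N, 3)` pose and that RMSD is taken after Kabsch superposition. The log mentions a Kabsch RMSD, but whether the headline metric aligns poses first is an assumption. `kabsch_rmsd` and `mean_rmsd` are hypothetical helper names.

```python
from __future__ import annotations

import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) conformers after optimal rigid superposition (Kabsch)."""
    p = pred - pred.mean(axis=0)                 # remove translation
    q = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)            # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # optimal rotation taking p onto q
    aligned = p @ rot.T
    return float(np.sqrt(np.mean(np.sum((aligned - q) ** 2, axis=1))))


def mean_rmsd(final_poses: list[np.ndarray], ref: np.ndarray) -> float:
    """Average RMSD over independent rollouts (e.g. 100 runs for mean_rmsd_100)."""
    return float(np.mean([kabsch_rmsd(pose, ref) for pose in final_poses]))


rng = np.random.default_rng(0)
ref = rng.standard_normal((12, 3))
# Hypothetical rollouts: the reference pose plus small Gaussian noise.
poses = [ref + 0.1 * rng.standard_normal(ref.shape) for _ in range(100)]
score = mean_rmsd(poses, ref)
print(f"mean_rmsd_100={score:.6f}, success={score <= 1.0}")
```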
Key files
- `train.py`: training/evaluation entry point.
- `GUIDELINES.md`: operating rules and workflow.
- `BEST_PRACTICE.json`: current best-known metric and config.
- `reports/latest_eval.json`: most recent measured metric.
- `artifacts/latest_eval_best_model.pt`: checkpoint from the latest run that produced `latest_eval`.
- `artifacts/best_model.pt`: best checkpoint from the latest improved run.
- `reports/trajectories/`: 6 regenerated trajectories from the current best model.
- `scripts/precommit_performance_gate.py`: flow-matching token check on any branch when `train.py` is staged; mean-RMSD gate and best-artifact refresh only on `main`.
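The strict-improvement rule enforced on `main` reduces to a small comparison. This sketch is not `scripts/precommit_performance_gate.py`. The JSON key `mean_rmsd_100`, the function names, and the behavior when no anchor exists yet are assumptions made for illustration.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def load_metric(path: str, key: str = "mean_rmsd_100") -> float | None:
    """Read one metric from a JSON report; None if the file is missing (key name assumed)."""
    file = Path(path)
    if not file.exists():
        return None
    return float(json.loads(file.read_text())[key])


def is_strict_improvement(latest: float, best: float | None) -> bool:
    """Gate rule sketch: lower RMSD wins; ties and non-finite values are rejected."""
    if not math.isfinite(latest):
        return False            # a NaN/inf run can never become the new best
    if best is None:
        return True             # assumption: with no anchor yet, any finite run passes
    return latest < best        # strictly better; equal is not enough


print(is_strict_improvement(2.426296, 2.461592))  # True
print(is_strict_improvement(2.461592, 2.461592))  # False
```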
Attempt Log
- 2026-04-16: Bootstrapped docs/environment policy and cu126 UV config. Added best-practice/performance gating scaffolding before the next training run.
- 2026-04-16: Updated `train.py` to use the final test metric as the source of truth (`mean_rmsd_100` from 100 rollout predictions) and removed train-loss-based best-checkpoint tracking. Current measured `mean_rmsd_100=2.593694`.
- 2026-04-16: Updated evaluation to always use the best training checkpoint, then run 100 random initializations to `time=1` and store the final RMSD mean in `reports/latest_eval.json`.
- 2026-04-16: Re-ran with the best-checkpoint evaluation path active; current `mean_rmsd_100=2.582932` (improved from `2.593694`), artifacts synced to `BEST_PRACTICE.json`.
- 2026-04-16: Moved `BEST_PRACTICE.json` updates out of `train.py`; pre-commit now auto-generates/stages the best report from `reports/latest_eval.json` when an improved `train.py` commit is made.
- 2026-04-16: Re-ran after the pre-commit auto-best refactor; current `mean_rmsd_100=2.570120` (improved from `2.582932`).
- 2026-04-16: Added model-type support (`gcn`/`mlp`) and time-sampling control; best current run is `gcn hidden=512 layers=8 batch=96` with `mean_rmsd_100=2.523552`.
- 2026-04-16: Added pre-commit artifact refresh: on a best update it now stages `BEST_PRACTICE.json` and `artifacts/best_model.pt`, and regenerates 6 trajectory visualizations in `reports/trajectories/`.
- 2026-04-16: Enforced the random-time flow-matching rule (no fixed training time), saved the best checkpoint to a git-tracked artifact path, and improved the metric to `mean_rmsd_100=2.519821` with `gcn hidden=512 layers=8 batch=96`.
- 2026-04-16: Added a general multi-layer diagnosis principle to `GUIDELINES.md` so experiments are judged with quantitative, qualitative, and structural evidence, not metric-only optimization.
- 2026-04-16: Tried a weighted objective to counter weak rotation/torsion motion (`w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8`) and improved to `mean_rmsd_100=2.505556`.
- 2026-04-16: Failed attempt B (longer, lower-lr weighted run) reached `mean_rmsd_100=2.531661`; reverted artifacts to the current best.
- 2026-04-16: Failed attempt C (torsion-heavy weights, `time_power=1.2`) reached `mean_rmsd_100=2.564594`; no commit.
- 2026-04-16: Failed attempt D (deeper GCN config) reached `mean_rmsd_100=2.739573`; no commit.
- 2026-04-16: Failed attempt E (`w_center=0.75, w_omega=2.1, w_torsion=3.2, lr=9e-4`) reached `mean_rmsd_100=2.535795`; no commit.
- 2026-04-16: Failed attempt F (balanced weights `w_center=0.9, w_omega=1.8, w_torsion=2.6`) reached `mean_rmsd_100=2.522751`; no commit.
- 2026-04-16: Failed attempt G (`accum=3` for stability) reached `mean_rmsd_100=2.561071`; no commit.
- 2026-04-16: Policy update: every attempt (success or failure) must be logged; the checkpoint flow changed to `artifacts/latest_eval_best_model.pt` per run, while pre-commit promotes improved runs to `artifacts/best_model.pt`.
- 2026-04-16: Improved attempt H (same weighted config, `seed=1`) reached `mean_rmsd_100=2.461592` (improved from `2.505556`).
- 2026-04-16: Failed attempt I (same weighted config, `seed=2`) reached `mean_rmsd_100=2.590216`; no commit.
- 2026-04-16: Failed attempt J (same weighted config, `seed=3`) reached `mean_rmsd_100=2.554448`; no commit.
- 2026-04-16: Failed attempt K (research-level: added a terminal-consistency auxiliary loss from `x_t` to `x_1`) reached `mean_rmsd_100=2.722863`; no commit.
- 2026-04-16: Failed attempt L (research-level: decoupled architecture with a centered-coordinate trunk plus a separate translation head, with the terminal auxiliary term) reached `mean_rmsd_100=2.637292`; no commit.
- 2026-04-16: Failed attempt M (research-level: decoupled centered-coordinate architecture only, no terminal auxiliary term) reached `mean_rmsd_100=2.479326`; close to best but no commit.
- 2026-04-16: Failed attempt N (training strategy: added configurable early stopping with a large max-epoch budget and patience/min-delta/check-cadence controls) ran to the max epoch with ongoing improvements (`stop_reason=max_epochs`) and reached `mean_rmsd_100=2.764940`; no commit.
- 2026-04-16: Rollback (per `GUIDELINES.md`): restored `train.py`, `reports/latest_eval.json`, and `artifacts/latest_eval_best_model.pt` to the last committed baseline after attempts K–N; the `mean_rmsd_100` anchor is unchanged at `2.461592` (`BEST_PRACTICE.json`). Objective-aligned early stopping remains disallowed for training control.
- 2026-04-16: Policy update: experiments run on feature branches with one commit per attempt; the mean-RMSD pre-commit gate applies only on `main` (merge/cherry-pick integration). Re-triage failed stacks via cherry-pick / selective drops, not a default full-tree rollback.
- 2026-04-16: Branch `attempt/gat-wrapped-torsion` (single commit batching three evals): Failed O: `gat` + `--torsion-wrapped-loss`, `mean_rmsd_100=2.691410`. Failed P: `gcn` + `--torsion-wrapped-loss`, `2.657594`. Failed Q: `gcn` + `--gcn-residual` (best on branch, `2.514058`). All are above the main best `2.461592`; no merge to `main`.
- 2026-04-16: Branch `attempt/default-wrapped-clean-deps`: made the wrapped torsion loss the default (`--torsion-wrapped-loss` via `BooleanOptionalAction`, default on) and added a displacement-domain objective option. Dependency cleanup removed unused packages from `pyproject.toml`. The validation run (`python train.py --sdf reports/trajectories/trajectory_00.sdf --epochs 200 --batch-size 32 --eval-runs 100 --model-type gcn --hidden 256 --gcn-layers 6 --loss-domain displacement --seed 1`) reached `mean_rmsd_100=2.528226` (no improvement vs best `2.461592`), so the branch is not ready to merge.
- 2026-04-16: Branch `attempt/default-wrapped-clean-deps` update: removed the `--torsion-wrapped-loss` CLI toggle and made the wrapped torsion loss always on in code. Failed R: stronger baseline (`sample.sdf`, `gcn hidden=512 layers=8`, displacement loss, `epochs=800`, `seed=1`) reached `mean_rmsd_100=2.512292`.
- 2026-04-16: Failed S: weighted config (`w_center=0.8, w_omega=2.0, w_torsion=3.0, grad_clip=0.8`, `epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.507794` (better than R, still above best `2.461592`).
- 2026-04-16: Failed T: same weighted config with time bias (`time_power=1.3`) reached `mean_rmsd_100=2.517704`; no branch promotion.
- 2026-04-16: Attempt U (recommended #1, residual GCN): `--gcn-residual` with the weighted displacement setup (`epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.463247` (close, but above best `2.461592`).
- 2026-04-16: Attempt V (recommended #2, SO(3) geodesic rotation loss): the initial full-budget run was too slow; a reduced-budget run (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) then improved to `mean_rmsd_100=2.429729` (new branch best at the time).
- 2026-04-16: Attempt W (recommended #3, split heads + normalization): `--channel-layernorm --head-mlp-layers 2` with the weighted displacement setup (`epochs=1200`, `seed=1`) reached `mean_rmsd_100=2.634111` (degraded).
- 2026-04-16: Attempt X (geodesic refinement, longer budget): `--rotation-loss geodesic --epochs=400 --batch-size=24 --seed=2` showed NaN instability and reached `mean_rmsd_100=2.552385`.
- 2026-04-16: Attempt Y (geodesic seed sweep): `--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=3` diverged to NaN early and reached `mean_rmsd_100=2.591940`.
- 2026-04-16: Attempt Z (geodesic stable rerun): same setup as V (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) improved further to `mean_rmsd_100=2.426296` (current best in this branch, better than the anchor `2.461592`).
- 2026-04-16: Added train-loss-only early-stopping controls (`--early-stop-patience`, `--early-stop-min-delta`, `--early-stop-check-every`, `--early-stop-warmup`) with `stop_reason`/`stop_epoch` reporting in logs and `reports/latest_eval.json`; objective-metric stopping remains disabled.
- 2026-04-16: Attempt AA (merge-prep rerun on CUDA): repeated the geodesic best-practice config (`--rotation-loss geodesic --epochs=200 --batch-size=24 --seed=1`) and measured `mean_rmsd_100=2.429895` (`num_runs=100`), still an improvement over the main anchor `2.461592`.
- 2026-04-16: Branch `attempt/geodesic-stability-next`: stress-tested geodesic+residual variants; the best observed metric reached `mean_rmsd_100=2.388103` (`--rotation-loss geodesic --gcn-residual --epochs=280 --batch-size=24 --lr=7e-4 --seed=1`), with occasional NaN instability in nearby runs.
- 2026-04-16: Stabilization-only update: added non-finite guards/clamps in the geodesic loss and Kabsch RMSD, plus a training-loss fallback, to reduce NaN-caused crashes during long geodesic sweeps.
- 2026-04-16: Policy update in `GUIDELINES.md`: when a branch obtains a strict best `mean_rmsd_100`, integration into `main` is mandatory before continuing new branch experiments.
- 2026-04-16: Hook policy update: `train-performance-gate` now runs both at commit time and at `post-merge`, and enforces main-branch merge-time validation/refresh when the merged diff includes `train.py`.
- 2026-04-16: Attempt AB (trajectory-instability hypothesis): added `--omega-max-norm` clipping to stabilize geodesic+residual rotation outputs and reduce NaN-prone spikes; the run with `--omega-max-norm 3.0` reached `mean_rmsd_100=2.436618` (more stable but worse than branch best `2.388103`).
- 2026-04-16: Strategy S1 (hybrid rotation loss, capped at 5 micro-tuning runs) completed: alpha sweep (`0.7`/`0.5`/`0.3`), then lr/seed tuning on the best alpha; the best S1 result was `mean_rmsd_100=2.439254` (no new best), and the strategy was marked exhausted.
- 2026-04-16: Strategy S2 (rotation-weight curriculum, capped at 5 micro-tuning runs) completed: the best run used `--rotation-weight-start 1.0 --rotation-weight-warmup-epochs 120` and reached `mean_rmsd_100=2.417450` (no new best), and the strategy was marked exhausted.
- 2026-04-16: Multi-GPU parallel sweep (GPU0/1/2) around residual-geodesic schedules produced `2.394431`, `2.420601`, and `2.450024`; no improvement over branch best `2.388103`.
- 2026-04-16: Follow-up parallel sweep (GPU0/1/2) with direct best-axis reruns produced `2.430481`, `2.412036`, and `2.720380`; observed heavy seed sensitivity and intermittent fallback-to-1000 behavior on unstable seeds.
- 2026-04-16: Continued parallel sweep with rotation-curriculum variants (`start=0.85`/`0.95` and a lower-lr schedule) produced `2.450391`, `2.457748`, and `2.426384`; no improvement over branch best `2.388103`.
- 2026-04-16: Deep-schedule parallel sweep (`epochs` 320 to 380, `start=1.0` with warmup variants, multi-seed) produced `2.464117`, `2.410706`, and `2.419527`; all still worse than the branch best, with late-epoch fallback instability in some runs.
- 2026-04-16: Post-reset attempt on `attempt/s3-tail-risk-next` (trajectory-tail-risk focus) using residual-geodesic with clipped omega and a scheduled rotation weight (`lr=6.8e-4`, `grad_clip=0.7`, `start=1.0`, warmup `120`) reached `mean_rmsd_100=2.464730`; no improvement.
- 2026-04-16: Restarted branch `attempt/s3-restart-after-doc-sync` and ran an immediate S3 continuation (`lr=6.0e-4`, `grad_clip=0.7`, geodesic+residual, `omega_max_norm=5.0`, warmup `120`), obtaining `mean_rmsd_100=2.474573`; no improvement over best `2.388103`.
- 2026-04-16: Strategy S4 start (tail-risk suppressor): added an upper-quantile tail penalty to the training loss and ran the first trial (`tail-risk-weight=0.2`, `tail-risk-quantile=0.85`, `lr=6.8e-4`, geodesic+residual), yielding `mean_rmsd_100=2.466082`; no improvement over best `2.388103`.
- 2026-04-16: Strategy S4 micro-tuning #2 lowered the tail penalty (`tail-risk-weight=0.1`, quantile `0.85`, `lr=6.4e-4`) to reduce over-regularization, but the result was `mean_rmsd_100=2.476267`; no improvement.
- 2026-04-16: Strategy S4 micro-tuning #3 softened tail coverage (`tail-risk-quantile=0.9`, `tail-risk-weight=0.2`, `lr=6.8e-4`) and improved to `mean_rmsd_100=2.440570`, but this is still worse than best `2.388103`.
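Several entries above rely on a wrapped torsion loss. The following is a minimal sketch of the idea, assuming torsions are predicted as raw angles in radians. The error is wrapped into [-pi, pi] before squaring, so predictions near +pi and -pi are not penalized as if they were 2*pi apart. Function names are hypothetical.

```python
import numpy as np


def wrap_angle(delta: np.ndarray) -> np.ndarray:
    """Map angle differences (radians) into [-pi, pi] via atan2(sin, cos)."""
    return np.arctan2(np.sin(delta), np.cos(delta))


def wrapped_torsion_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared torsion error that treats -pi and +pi as the same angle."""
    return float(np.mean(wrap_angle(pred - target) ** 2))


pred = np.array([3.1])
target = np.array([-3.1])
print(float(np.mean((pred - target) ** 2)))   # naive loss: about 38.44
print(wrapped_torsion_loss(pred, target))     # about 0.0069; the true gap is 2*pi - 6.2
```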
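Attempts V through AB use an SO(3) geodesic rotation loss, and the stabilization update later adds non-finite guards. The sketch below assumes the rotation is available as a 3x3 matrix. The loss is the angle of the relative rotation, with the cosine clamped away from +/-1. The `eps` value and the names are assumptions, not the repository's.

```python
import numpy as np


def so3_geodesic_loss(r_pred: np.ndarray, r_true: np.ndarray, eps: float = 1e-6) -> float:
    """Rotation angle (radians) of R_pred^T R_true, i.e. the SO(3) geodesic distance."""
    rel = r_pred.T @ r_true
    cos_theta = (np.trace(rel) - 1.0) / 2.0
    # Clamp strictly inside [-1, 1]: arccos's derivative blows up at +/-1, a common NaN source.
    cos_theta = np.clip(cos_theta, -1.0 + eps, 1.0 - eps)
    return float(np.arccos(cos_theta))


def rot_z(angle: float) -> np.ndarray:
    """Rotation about the z axis, used here only to build test inputs."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


print(so3_geodesic_loss(rot_z(0.2), rot_z(0.5)))  # about 0.3
```

The clamp imposes a small floor: with `eps=1e-6`, identical rotations give about 1.4e-3 rad. In exchange, the loss stays finite when floating-point error pushes the trace slightly outside its valid range.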
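The train-loss-only early-stopping controls can be pictured as follows. This is a hypothetical sketch; three details are assumptions: the class name, patience counting checks rather than epochs, and a non-finite loss counting as a failed check. The key property matches the policy: the stopping signal is the training loss, never `mean_rmsd_100`.

```python
import math


class TrainLossEarlyStopper:
    """Stop on a plateau of the training loss only, never on mean_rmsd_100.

    Hypothetical mirror of --early-stop-patience / --early-stop-min-delta /
    --early-stop-check-every / --early-stop-warmup; patience counts checks here.
    """

    def __init__(self, patience: int, min_delta: float = 0.0, check_every: int = 1, warmup: int = 0):
        self.patience = patience
        self.min_delta = min_delta
        self.check_every = check_every
        self.warmup = warmup
        self.best = math.inf
        self.bad_checks = 0

    def should_stop(self, epoch: int, train_loss: float) -> bool:
        if epoch < self.warmup or epoch % self.check_every != 0:
            return False                      # only evaluate on scheduled checks after warmup
        if math.isfinite(train_loss) and train_loss < self.best - self.min_delta:
            self.best = train_loss            # a meaningful improvement resets the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1              # a plateau or non-finite loss counts against patience
        return self.bad_checks >= self.patience


stopper = TrainLossEarlyStopper(patience=2, min_delta=0.01)
losses = [1.0, 0.9, 0.895, 0.893]
decisions = [stopper.should_stop(epoch, loss) for epoch, loss in enumerate(losses)]
print(decisions)  # [False, False, False, True]
```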
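`--omega-max-norm` (attempt AB, later `omega_max_norm=5.0`) clips rotation outputs. The sketch below assumes `omega` is a per-sample 3-vector and that clipping rescales by norm rather than clamping individual components. The repository may apply the clip at a different point in the pipeline, and the function name is hypothetical.

```python
import numpy as np


def clip_omega(omega: np.ndarray, max_norm: float, eps: float = 1e-12) -> np.ndarray:
    """Rescale each rotation vector so its norm is at most max_norm, keeping its direction."""
    norm = np.linalg.norm(omega, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norm, eps))  # eps avoids 0/0 for zero vectors
    return omega * scale


omega = np.array([[6.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
print(clip_omega(omega, max_norm=3.0))  # first row shrinks to norm 3; the others are unchanged
```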
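Strategy S2's rotation-weight curriculum can be sketched as a linear ramp. The linear shape and the meaning of the target value are assumptions: the log records only `--rotation-weight-start` and `--rotation-weight-warmup-epochs`. The function name is hypothetical.

```python
def rotation_weight(epoch: int, start: float, target: float, warmup_epochs: int) -> float:
    """Linearly ramp the rotation-loss weight from start to target over warmup_epochs."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return target                     # after warmup (or with no warmup) use the full weight
    return start + (target - start) * (epoch / warmup_epochs)


# Hypothetical: ramp a multiplier from 0.85 (as in one sweep variant) to 1.0 over 120 epochs.
schedule = [rotation_weight(e, start=0.85, target=1.0, warmup_epochs=120) for e in range(0, 200, 40)]
print(schedule)
```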
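Strategy S4's upper-quantile tail penalty might look like the sketch below. The CVaR-style formulation is chosen for illustration; the repository's exact formula is not recorded here. `weight` and `quantile` mirror `tail-risk-weight` and `tail-risk-quantile`, and the function name is hypothetical.

```python
import numpy as np


def tail_risk_loss(per_sample: np.ndarray, weight: float = 0.2, quantile: float = 0.9) -> float:
    """Mean loss plus a penalty on the upper tail (samples at or above the given quantile)."""
    per_sample = np.asarray(per_sample, dtype=float)
    threshold = np.quantile(per_sample, quantile)          # linear interpolation by default
    tail = per_sample[per_sample >= threshold]             # never empty: the max is always >= threshold
    return float(per_sample.mean() + weight * tail.mean())


losses = np.arange(1.0, 11.0)  # hypothetical per-sample losses 1..10
print(tail_risk_loss(losses, weight=0.2, quantile=0.85))  # 5.5 + 0.2 * mean([9, 10]), about 7.4
print(tail_risk_loss(losses, weight=0.2, quantile=0.9))   # 5.5 + 0.2 * 10, about 7.5
```

Raising `quantile` from 0.85 to 0.9 shrinks the penalized tail, which is the softening tried in micro-tuning #3.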