ai_rfm
RFM overfitting sandbox for a single ligand sample, with hard quality gates.
Environment first (UV, cu126 only)
- Ensure Python 3.10 is available.
- Install env and deps:
uv sync
- Install git hooks:
uv run pre-commit install
This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching PyG wheels.
Repository policy
- Every attempt must update this README (append a short entry in `## Attempt Log`).
- Flow-matching training time must stay random (middle-time supervision is mandatory).
- Commits touching `train.py` must include:
  - `reports/latest_eval.json`
  - strictly better `mean_rmsd_100` compared to previous best (enforced by pre-commit).
- `BEST_PRACTICE.json` is auto-updated and staged by pre-commit.
- Best checkpoint and trajectory artifacts are auto-regenerated by pre-commit.
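
The random-time rule above can be illustrated with a minimal sketch. It assumes a straight-line interpolation path between a noise sample and the target coordinates, which this README does not confirm; `sample_training_pair` and its arguments are hypothetical names, not taken from `train.py`.

```python
import random


def sample_training_pair(x0, x1, rng):
    """Build one flow-matching training example at a random time.

    x0: starting (noise) coordinates, x1: target coordinates,
    both lists of (x, y, z) tuples of equal length.
    Assumption: linear path x_t = (1 - t) * x0 + t * x1.
    """
    # Fresh t in [0, 1) for every sample; never a fixed constant,
    # so intermediate times are supervised as well as the endpoints.
    t = rng.random()
    x_t = [
        tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1))
        for p0, p1 in zip(x0, x1)
    ]
    # For a linear path the regression target is the constant velocity x1 - x0.
    velocity = [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]
    return t, x_t, velocity
```

Drawing `t` inside the function, once per sample, is what keeps the training time random; a loop that reuses a single `t` for every step would violate the rule.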
Evaluation target
- Metric: mean RMSD over 100 runs (`batchsize=100` style aggregated evaluation).
- Success criterion: `mean_rmsd_100 <= 1.0`.
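
As a rough illustration of the metric and success criterion, the sketch below averages per-run RMSD values and checks the threshold. It assumes plain coordinate RMSD with no rigid alignment step, which may differ from what `train.py` does; the function names are hypothetical.

```python
import math


def rmsd(pred, ref):
    """RMSD between two conformations given as equal-length lists of (x, y, z)."""
    if not pred or len(pred) != len(ref):
        raise ValueError("conformations must be non-empty and the same length")
    squared = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(squared / len(pred))


def mean_rmsd(predictions, ref):
    """Average RMSD of several predicted conformations against one reference."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(rmsd(p, ref) for p in predictions) / len(predictions)


def meets_target(mean_rmsd_100, threshold=1.0):
    """Success criterion from this README: mean_rmsd_100 <= 1.0."""
    return mean_rmsd_100 <= threshold
```

With 100 rollout predictions passed as `predictions`, the return value of `mean_rmsd` would correspond to `mean_rmsd_100`.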
Key files
- `train.py`: training/evaluation entry point.
- `GUIDELINES.md`: operating rules and workflow.
- `BEST_PRACTICE.json`: current best-known metric and config.
- `reports/latest_eval.json`: most recent measured metric.
- `artifacts/best_model.pt`: best checkpoint from latest improved run.
- `reports/trajectories/`: 6 regenerated trajectories from current best model.
- `scripts/precommit_performance_gate.py`: pre-commit guard for train-related commits.
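
The contents of `scripts/precommit_performance_gate.py` are not shown here, so the following is only a sketch of how a strict-improvement check between `reports/latest_eval.json` and `BEST_PRACTICE.json` might look. It assumes both files store the metric under a top-level `mean_rmsd_100` key; the helper names are hypothetical.

```python
import json
from pathlib import Path


def read_metric(path, key="mean_rmsd_100"):
    """Return the metric stored in a JSON file, or None if file or key is missing.

    Assumption: the metric sits at the top level of the JSON object.
    """
    p = Path(path)
    if not p.exists():
        return None
    value = json.loads(p.read_text()).get(key)
    return float(value) if value is not None else None


def gate_passes(latest, best):
    """Allow the commit only if the latest metric is strictly lower than the best."""
    if latest is None:
        return False  # no evaluation report: nothing to justify the commit
    if best is None:
        return True  # no previous best recorded yet
    return latest < best  # equal RMSD is not an improvement
```

A hook script could call `gate_passes(read_metric("reports/latest_eval.json"), read_metric("BEST_PRACTICE.json"))` and exit non-zero when it returns `False`.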
Attempt Log
- 2026-04-16: Bootstrapped docs/environment policy and cu126 UV config. Added best-practice/performance gating scaffolding before the next training run.
- 2026-04-16: Updated `train.py` to use final test metric as source of truth (`mean_rmsd_100` from 100 rollout predictions) and removed train-loss based best checkpoint tracking. Current measured `mean_rmsd_100=2.593694`.
- 2026-04-16: Updated evaluation to always use the best training checkpoint, then run 100 random initializations to time=1 and store the final RMSD mean in `reports/latest_eval.json`.
- 2026-04-16: Re-ran with best-checkpoint evaluation path active; current `mean_rmsd_100=2.582932` (improved from `2.593694`), artifacts synced to `BEST_PRACTICE.json`.
- 2026-04-16: Moved `BEST_PRACTICE.json` updates out of `train.py`; pre-commit now auto-generates/stages best report from `reports/latest_eval.json` when an improved `train.py` commit is made.
- 2026-04-16: Re-ran after pre-commit auto-best refactor; current `mean_rmsd_100=2.570120` (improved from `2.582932`).
- 2026-04-16: Added model-type support (`gcn`/`mlp`) and time-sampling control; best current run is `gcn hidden=512 layers=8 batch=96` with `mean_rmsd_100=2.523552`.
- 2026-04-16: Added pre-commit artifact refresh: on best update it now stages `BEST_PRACTICE.json`, `artifacts/best_model.pt`, and regenerates 6 trajectory visualizations in `reports/trajectories/`.
- 2026-04-16: Enforced random-time flow-matching rule (no fixed training time), saved best checkpoint to git-tracked artifact path, and improved metric to `mean_rmsd_100=2.519821` with `gcn hidden=512 layers=8 batch=96`.
- 2026-04-16: Added a general multi-layer diagnosis principle to `GUIDELINES.md` so experiments are judged with quantitative + qualitative + structural evidence, not metric-only optimization.
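
Several entries above refer to rolling out 100 random initializations to time=1. A minimal sketch of such a rollout is given below, assuming fixed-step Euler integration and standard Gaussian initial coordinates; neither choice is confirmed by this README, and `velocity_fn` stands in for the trained model.

```python
import random


def euler_rollout(velocity_fn, x0, n_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler.

    x0 is a flat list of coordinates; velocity_fn returns a list of the same length.
    """
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def rollout_from_random_inits(velocity_fn, dim, n_runs=100, seed=0):
    """Run n_runs rollouts, each from an independent Gaussian start (assumption)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        results.append(euler_rollout(velocity_fn, x0))
    return results
```

The final states returned by `rollout_from_random_inits` are what would be compared against the reference structure to obtain the per-run RMSD values.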