demian3b e7274cb680 Document multi-layer diagnosis workflow.

Update project guidelines and attempt log to require multi-view analysis (metrics, trajectory behavior, structural diagnostics) instead of metric-only decision making.

Made-with: Cursor

2026-04-16 17:28:46 +09:00

ai_rfm

RFM overfitting sandbox for a single ligand sample, with hard quality gates.

Environment first (UV, cu126 only)

1. Ensure Python 3.10 is available.
2. Install the environment and dependencies:
   - uv sync
3. Install the git hooks:
   - uv run pre-commit install

This repository is intentionally pinned to CUDA 12.6 PyTorch wheels and matching PyG wheels.

Repository policy

Every attempt must update this README (append a short entry in ## Attempt Log).
The flow-matching training time must be sampled randomly, never fixed, so that intermediate times are always supervised.
Commits touching train.py must include:
- an updated reports/latest_eval.json
- a mean_rmsd_100 that is strictly better (lower) than the previous best (enforced by pre-commit).
On such commits, pre-commit also:
- auto-updates and stages BEST_PRACTICE.json.
- auto-regenerates the best checkpoint and trajectory artifacts.
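
The random-time rule in the policy above can be illustrated with a minimal sketch. It assumes a linear interpolation path between a noise sample and the target coordinates, and a uniform time distribution. Neither is confirmed by this README, and `sample_training_pair` is a hypothetical helper, not a function from train.py.

```python
import random


def sample_training_pair(x0, x1, rng=random):
    """Build one flow-matching training example at a random time.

    x0: noise coordinates, x1: target coordinates (flat lists of floats).
    Assumes a linear path x_t = (1 - t) * x0 + t * x1, whose velocity
    target is x1 - x0. The path choice is an illustrative assumption.
    """
    # Draw a fresh time for every example; it is never fixed to one value.
    t = rng.random()  # uniform on [0, 1)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return t, x_t, v_target
```

Because `t` is redrawn on every call, the model sees points along the whole path rather than only the endpoints, which is what the middle-time supervision requirement asks for.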

Evaluation target

Metric: mean RMSD over 100 rollouts from random initializations, evaluated as one aggregated batch (batch size 100).
Success criterion: mean_rmsd_100 <= 1.0.
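
As a rough sketch of how this metric and the success check could be computed, assuming each prediction is a list of (x, y, z) atom coordinates and that RMSD is taken without any alignment step (the README does not say whether train.py aligns structures first). The function names are hypothetical.

```python
import math

SUCCESS_THRESHOLD = 1.0  # success criterion: mean_rmsd_100 <= 1.0


def rmsd(pred, ref):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    No superposition/alignment is applied (an assumption of this sketch).
    """
    if not pred or len(pred) != len(ref):
        raise ValueError("conformations must be non-empty and equally sized")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pred))


def mean_rmsd(predictions, ref):
    """Average RMSD of several rollout predictions against one reference."""
    values = [rmsd(pred, ref) for pred in predictions]
    return sum(values) / len(values)


def meets_target(predictions, ref):
    """True when the mean RMSD satisfies the success criterion."""
    return mean_rmsd(predictions, ref) <= SUCCESS_THRESHOLD
```

With 100 rollout predictions passed as `predictions`, `mean_rmsd` would correspond to the value reported as mean_rmsd_100.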

Key files

train.py: training/evaluation entry point.
GUIDELINES.md: operating rules and workflow.
BEST_PRACTICE.json: current best-known metric and config.
reports/latest_eval.json: most recent measured metric.
artifacts/best_model.pt: best checkpoint from latest improved run.
reports/trajectories/: 6 regenerated trajectories from current best model.
scripts/precommit_performance_gate.py: pre-commit guard for train-related commits.
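
The comparison performed by the pre-commit guard might look like the sketch below. It is not the contents of scripts/precommit_performance_gate.py: the JSON layout (a top-level `mean_rmsd_100` key in both files) and the behavior when no previous best exists are assumptions, and `load_metric` and `gate_passes` are hypothetical names.

```python
import json
from pathlib import Path

METRIC = "mean_rmsd_100"  # assumed top-level key in both JSON files


def load_metric(path):
    """Return the metric stored in a JSON file, or None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    return float(json.loads(path.read_text())[METRIC])


def gate_passes(latest_path, best_path):
    """Allow the commit only if the latest metric strictly beats the best one."""
    latest = load_metric(latest_path)
    if latest is None:
        return False  # no evaluation report: block the commit
    best = load_metric(best_path)
    if best is None:
        return True  # assumption: the first measured run always passes
    return latest < best  # RMSD: lower is better, ties are rejected
```

A hook script could call `gate_passes("reports/latest_eval.json", "BEST_PRACTICE.json")` and exit with a non-zero status when it returns False, which makes pre-commit abort the commit.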

Attempt Log

2026-04-16: Bootstrapped docs/environment policy and cu126 UV config. Added best-practice/performance gating scaffolding before the next training run.
2026-04-16: Updated train.py to use the final test metric as the source of truth (mean_rmsd_100 from 100 rollout predictions) and removed train-loss-based best-checkpoint tracking. Current measured mean_rmsd_100=2.593694.
2026-04-16: Updated evaluation to always use the best training checkpoint, then run 100 random initializations to time=1 and store the final RMSD mean in reports/latest_eval.json.
2026-04-16: Re-ran with best-checkpoint evaluation path active; current mean_rmsd_100=2.582932 (improved from 2.593694), artifacts synced to BEST_PRACTICE.json.
2026-04-16: Moved BEST_PRACTICE.json updates out of train.py; pre-commit now auto-generates/stages best report from reports/latest_eval.json when an improved train.py commit is made.
2026-04-16: Re-ran after pre-commit auto-best refactor; current mean_rmsd_100=2.570120 (improved from 2.582932).
2026-04-16: Added model-type support (gcn/mlp) and time-sampling control; best current run is gcn hidden=512 layers=8 batch=96 with mean_rmsd_100=2.523552.
2026-04-16: Added pre-commit artifact refresh: on best update it now stages BEST_PRACTICE.json, artifacts/best_model.pt, and regenerates 6 trajectory visualizations in reports/trajectories/.
2026-04-16: Enforced random-time flow-matching rule (no fixed training time), saved best checkpoint to git-tracked artifact path, and improved metric to mean_rmsd_100=2.519821 with gcn hidden=512 layers=8 batch=96.
2026-04-16: Added a general multi-layer diagnosis principle to GUIDELINES.md so experiments are judged with quantitative + qualitative + structural evidence, not metric-only optimization.
