♻️ full agent refactor

2026-06-10 21:15:34 +02:00
parent a98e86ef66
commit 1e0836e1bc
49 changed files with 1309 additions and 829 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,64 @@
+# RL-Framework
+
+A small, fast RL framework for training sim2real policies on a 3D-printed
+rotary (Furuta) cartpole. It is built to scale from a laptop CPU to a ClearML
+GPU worker without code changes, and to grow to more robots and simulators.
+
+## Architecture
+
+Three orthogonal pieces, composed by Hydra config groups:
+
+| Piece | Role | Implementations |
+|---|---|---|
+| **Env** (`src/envs/`) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | `rotary_cartpole` |
+| **Runner** (`src/runners/`) | Physics + sim2real plumbing (DR, sensor noise, action delay, history buffer). | `mujoco` (CPU), `mjx` (GPU/JAX), `serial` (real ESP32 robot) |
+| **Trainer** (`src/training/`) | skrl PPO + shared MLP with optional history encoder. | `ppo`, `ppo_mjx`, `ppo_single`, `ppo_real` |
+
+The robot itself is described once in `assets/<robot>/robot.yaml`
+(URDF + identified motor model) and shared by **training, sysid and
+deployment** — the motor model (bias → deadzone → gear compensation,
+Coulomb + Stribeck friction, viscous damping, first-order lag) is
+implemented in `src/core/robot.py` and mirrored exactly in the MJX JIT
+step (`src/runners/mjx.py`).
+
+## Train
+
+```bash
+# CPU (64 parallel MuJoCo envs)
+python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo
+
+# GPU (1024 MJX envs) — local
+python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx
+
+# GPU — remote on ClearML gpu-queue
+python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true
+```
+
+Videos and scalars stream to ClearML. Checkpoints land in `runs/`.
+
+## Sim2real recipe
+
+1. **Capture** real trajectories: `python -m src.sysid.capture` (writes `.npz` to `assets/<robot>/recordings/`).
+2. **Identify** physics: `python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording <capture>.npz`.
+   CMA-ES fits the inertial and joint-dynamics parameters against the recording (the motor model stays locked to the values from the unified sysid). Writes `sysid_result.json`, `robot_tuned.yaml` and `*_tuned.urdf`.
+3. **Validate** the fit: `python -m src.sysid.visualize`, then copy `robot_tuned.yaml` → `robot.yaml`.
+4. **Train with DR + history**: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (`configs/runner/mjx.yaml: domain_rand`). It also appends a 10-step (obs, action) history to the observation (`history_length`) so the policy can implicitly identify the current dynamics.
+5. **Deploy**: `mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs/<run>/checkpoints/agent_X.pt`
+
+## Other tools
+
+```bash
+mjpython scripts/viz.py env=rotary_cartpole              # keyboard-drive the sim
+mjpython scripts/viz.py runner=serial                    # digital twin of the real robot
+python scripts/hpo.py env=rotary_cartpole training=ppo_single   # ClearML + SMAC3 HPO
+pytest tests/                                            # unit tests
+```
+
+## Adding a robot / simulator
+
+- **Robot**: drop `assets/<name>/` (URDF + `robot.yaml`), subclass `BaseEnv`
+  (obs/reward/termination/`initial_state_ranges`), register in `src/core/registry.py`,
+  add `configs/env/<name>.yaml`.
+- **Simulator**: subclass `BaseRunner` and implement `_sim_initialize`,
+  `_sim_step`, `_sim_reset` (full-batch return) — DR, history and the
+  env-side logic come for free. Register in `scripts/train.py: RUNNER_REGISTRY`.
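
The simulator bullet above describes a template-method split: the backend supplies three hooks and the base class adds the shared plumbing. The following is a minimal sketch of that idea, not the repository's code. `SketchBaseRunner`, `PointMassRunner`, the hook signatures, the scalar action and the list-based batches are all assumptions made for illustration; only the hook names `_sim_initialize`, `_sim_step`, `_sim_reset` and the (obs, action) history come from the README. Domain randomization is omitted.

```python
from abc import ABC, abstractmethod
from collections import deque


class SketchBaseRunner(ABC):
    """Hypothetical stand-in for BaseRunner; the real class may differ."""

    def __init__(self, num_envs: int, history_length: int = 10):
        self.num_envs = num_envs
        self.history_length = history_length
        self._sim_initialize()

    # Backend hooks (names from the README; signatures are assumed).
    @abstractmethod
    def _sim_initialize(self) -> None:
        """Build the simulator state for all envs."""

    @abstractmethod
    def _sim_step(self, actions: list[float]) -> list[list[float]]:
        """Advance every env one step and return the batch of observations."""

    @abstractmethod
    def _sim_reset(self) -> list[list[float]]:
        """Reset every env and return the full batch of observations."""

    # Shared plumbing that each backend inherits unchanged.
    def reset(self) -> list[list[float]]:
        self._obs = self._sim_reset()
        # Zero-pad the history so the observation size is fixed from step 0.
        pad = [0.0] * (len(self._obs[0]) + 1)  # obs plus one scalar action
        self._history = [
            deque([pad] * self.history_length, maxlen=self.history_length)
            for _ in range(self.num_envs)
        ]
        return self._augment()

    def step(self, actions: list[float]) -> list[list[float]]:
        # Record the (obs, action) pair the policy just acted on, oldest first.
        for hist, obs, act in zip(self._history, self._obs, actions):
            hist.append(obs + [act])
        self._obs = self._sim_step(actions)
        return self._augment()

    def _augment(self) -> list[list[float]]:
        # Current observation followed by the flattened history.
        return [
            obs + [x for pair in hist for x in pair]
            for obs, hist in zip(self._obs, self._history)
        ]


class PointMassRunner(SketchBaseRunner):
    """Toy backend: a 1D point mass per env, action = acceleration."""

    DT = 0.1

    def _sim_initialize(self) -> None:
        self._state = [[0.0, 0.0] for _ in range(self.num_envs)]

    def _sim_reset(self) -> list[list[float]]:
        self._state = [[0.0, 0.0] for _ in range(self.num_envs)]
        return [list(s) for s in self._state]

    def _sim_step(self, actions: list[float]) -> list[list[float]]:
        for s, a in zip(self._state, actions):
            s[1] += a * self.DT     # velocity update
            s[0] += s[1] * self.DT  # position update (semi-implicit Euler)
        return [list(s) for s in self._state]


runner = PointMassRunner(num_envs=2, history_length=3)
first = runner.reset()
second = runner.step([1.0, -1.0])
print(len(first[0]), second[0][:2], second[0][-3:])
```

In this sketch the toy backend never touches the history buffer: it only advances its own state, and the observation it returns is widened by the base class. That is the property the bullet refers to when it says the shared logic comes for free.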