♻️ full agent refactor

2026-06-10 21:15:34 +02:00
parent a98e86ef66
commit 1e0836e1bc
49 changed files with 1309 additions and 829 deletions

64
README.md Normal file

@@ -0,0 +1,64 @@
# RL-Framework
A small, fast RL framework for training sim2real policies on a 3D-printed
rotary (Furuta) cartpole — built to scale from a laptop CPU to a GPU worker
(ClearML) without code changes, and to grow into more robots and simulators.
## Architecture
Three orthogonal pieces, composed by Hydra config groups:
| Piece | Role | Implementations |
|---|---|---|
| **Env** (`src/envs/`) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | `rotary_cartpole` |
| **Runner** (`src/runners/`) | Physics + sim2real plumbing: domain randomization (DR), sensor noise, action delay, history buffer. | `mujoco` (CPU), `mjx` (GPU/JAX), `serial` (real ESP32 robot) |
| **Trainer** (`src/training/`) | skrl PPO + shared MLP with optional history encoder. | `ppo`, `ppo_mjx`, `ppo_single`, `ppo_real` |
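
As a rough illustration of how group names such as `env=rotary_cartpole` and `runner=mjx` could resolve to classes, the sketch below uses two plain dictionaries as registries. The `register` decorator, the `build` helper and the placeholder classes are assumptions made for this example; they are not taken from `src/core/registry.py` or `scripts/train.py`.

```python
from typing import Callable, Dict

# Hypothetical registries keyed by the config-group names from the table.
ENV_REGISTRY: Dict[str, type] = {}
RUNNER_REGISTRY: Dict[str, type] = {}


def register(registry: Dict[str, type], name: str) -> Callable[[type], type]:
    """Class decorator that stores a class under a config-group name."""
    def wrap(cls: type) -> type:
        registry[name] = cls
        return cls
    return wrap


@register(ENV_REGISTRY, "rotary_cartpole")
class RotaryCartpoleEnv:
    """Placeholder for task logic; it references no physics backend."""


@register(RUNNER_REGISTRY, "mujoco")
class MujocoRunner:
    """Placeholder runner that only stores what it was built with."""
    def __init__(self, env, num_envs: int):
        self.env = env
        self.num_envs = num_envs


@register(RUNNER_REGISTRY, "mjx")
class MjxRunner(MujocoRunner):
    """Placeholder for the GPU runner; same constructor as above."""


def build(cfg: dict):
    """Resolve the env and runner named in a config-like dict."""
    env = ENV_REGISTRY[cfg["env"]]()
    return RUNNER_REGISTRY[cfg["runner"]](env, cfg["num_envs"])


runner = build({"env": "rotary_cartpole", "runner": "mjx", "num_envs": 1024})
print(type(runner).__name__, type(runner.env).__name__, runner.num_envs)
```

Because the env class never names a backend, swapping `"mjx"` for `"mujoco"` in the dict changes only which runner wraps it.
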
The robot itself is described once in `assets/<robot>/robot.yaml`
(URDF + identified motor model) and shared by **training, sysid and
deployment** — the motor model (bias → deadzone → gear compensation,
Coulomb + Stribeck friction, viscous damping, first-order lag) is
implemented in `src/core/robot.py` and mirrored exactly in the MJX JIT
step (`src/runners/mjx.py`).
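
To make the motor-model chain more concrete, here is a minimal scalar sketch of one plausible reading of it: command bias, a deadzone, a first-order lag on the drive torque, and Coulomb + Stribeck + viscous friction opposing the joint velocity. Every parameter name and value is an assumption for illustration, the deadzone convention (subtracting the width) is one of several common ones, and gear compensation is left out because the README does not define it. This is not the code in `src/core/robot.py`.

```python
import math
from dataclasses import dataclass


@dataclass
class MotorParams:
    # Illustrative values only; these are not identified parameters.
    bias: float = 0.02          # constant offset added to the command
    deadzone: float = 0.05      # half-width of the dead band (command units)
    torque_gain: float = 0.3    # N*m per unit of effective command
    coulomb: float = 0.01       # Coulomb friction level (N*m)
    stiction: float = 0.015     # static friction level, >= coulomb (N*m)
    stribeck_vel: float = 0.5   # Stribeck velocity (rad/s)
    viscous: float = 0.002      # viscous coefficient (N*m*s/rad)
    lag_tau: float = 0.03       # first-order lag time constant (s)


def apply_deadzone(u: float, width: float) -> float:
    """Zero inside the dead band, shifted toward zero by `width` outside it."""
    if abs(u) <= width:
        return 0.0
    return math.copysign(abs(u) - width, u)


def friction_torque(omega: float, p: MotorParams) -> float:
    """Coulomb + Stribeck + viscous friction, opposing the velocity."""
    if omega == 0.0:
        return 0.0
    level = p.coulomb + (p.stiction - p.coulomb) * math.exp(
        -((omega / p.stribeck_vel) ** 2)
    )
    return -math.copysign(level, omega) - p.viscous * omega


def motor_step(u: float, omega: float, tau_prev: float, dt: float, p: MotorParams):
    """Advance the lagged drive torque one step; return (drive, net) torque."""
    u_eff = apply_deadzone(u + p.bias, p.deadzone)
    tau_target = p.torque_gain * u_eff
    alpha = 1.0 - math.exp(-dt / p.lag_tau)  # exact discretization of the lag
    tau_drive = tau_prev + alpha * (tau_target - tau_prev)
    return tau_drive, tau_drive + friction_torque(omega, p)


p = MotorParams()
tau = 0.0
for _ in range(5):
    tau, net = motor_step(0.5, omega=1.0, tau_prev=tau, dt=0.01, p=p)
print(tau, net)
```

Keeping each stage a small pure function is what makes it practical to mirror the same arithmetic in a second backend and compare the two in a unit test.
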
## Train
```bash
# CPU (64 parallel MuJoCo envs)
python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo
# GPU (1024 MJX envs) — local
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx
# GPU — remote on ClearML gpu-queue
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true
```
Videos and scalars stream to ClearML. Checkpoints land in `runs/`.
## Sim2real recipe
1. **Capture** real trajectories: `python -m src.sysid.capture` (writes `.npz` to `assets/<robot>/recordings/`).
2. **Identify** physics: `python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording <capture>.npz`.
   CMA-ES fits the inertial and joint-dynamics parameters against the recording; the motor model stays locked to the values from the unified sysid. Writes `sysid_result.json` + `robot_tuned.yaml` + `*_tuned.urdf`.
3. **Validate** the fit: `python -m src.sysid.visualize`, then copy `robot_tuned.yaml` to `robot.yaml`.
4. **Train with DR + history**: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (`configs/runner/mjx.yaml: domain_rand`), and appends a 10-step (obs, action) history to the observation so the policy can implicitly identify the current dynamics (`history_length`).
5. **Deploy**: `mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs/<run>/checkpoints/agent_X.pt`
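
Step 4 of the recipe can be sketched for a single environment as follows: one set of randomized scales is drawn at each episode reset, and a fixed-length (obs, action) history is flattened and appended to the current observation. The key names, sampling ranges, zero-padding at reset and the `HistoryBuffer` class are assumptions for illustration; the actual keys live under `domain_rand` in `configs/runner/mjx.yaml` and are not reproduced here.

```python
import random
from collections import deque
from typing import Dict, List, Sequence


def sample_domain(rng: random.Random) -> Dict[str, float]:
    """Draw one set of per-episode parameters (ranges are illustrative)."""
    return {
        "friction_scale": rng.uniform(0.8, 1.2),
        "damping_scale": rng.uniform(0.8, 1.2),
        "torque_scale": rng.uniform(0.9, 1.1),
        "obs_noise_std": rng.uniform(0.0, 0.01),
        "action_delay_steps": rng.randint(0, 2),  # inclusive on both ends
    }


class HistoryBuffer:
    """Keeps the last `length` (obs, action) pairs, oldest first."""

    def __init__(self, obs_dim: int, act_dim: int, length: int = 10):
        self.step_dim = obs_dim + act_dim
        self.length = length
        self.buf: deque = deque(maxlen=length)
        self.reset()

    def reset(self) -> None:
        """Fill the history with zeros at the start of an episode."""
        self.buf.clear()
        for _ in range(self.length):
            self.buf.append([0.0] * self.step_dim)

    def push(self, obs: Sequence[float], action: Sequence[float]) -> None:
        """Record one step; the oldest entry is dropped automatically."""
        self.buf.append(list(obs) + list(action))

    def augment(self, obs: Sequence[float]) -> List[float]:
        """Return the current obs followed by the flattened history."""
        flat = [x for step in self.buf for x in step]
        return list(obs) + flat


rng = random.Random(0)
domain = sample_domain(rng)          # would be applied at episode reset
history = HistoryBuffer(obs_dim=4, act_dim=1, length=10)
history.push([1.0, 2.0, 3.0, 4.0], [0.5])
policy_input = history.augment([5.0, 6.0, 7.0, 8.0])
print(len(policy_input))
```

With a 4-dimensional observation and a scalar action, the policy input grows from 4 to 4 + 10 × 5 = 54 values, which is the cost of letting the policy infer the sampled dynamics from recent transitions.
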
## Other tools
```bash
mjpython scripts/viz.py env=rotary_cartpole # keyboard-drive the sim
mjpython scripts/viz.py runner=serial # digital twin of the real robot
python scripts/hpo.py env=rotary_cartpole training=ppo_single # ClearML + SMAC3 HPO
pytest tests/ # unit tests
```
## Adding a robot / simulator
- **Robot**: drop `assets/<name>/` (URDF + `robot.yaml`), subclass `BaseEnv`
(obs/reward/termination/`initial_state_ranges`), register in `src/core/registry.py`,
add `configs/env/<name>.yaml`.
- **Simulator**: subclass `BaseRunner` and implement `_sim_initialize`,
`_sim_step`, `_sim_reset` (full-batch return) — DR, history and the
env-side logic come for free. Register in `scripts/train.py: RUNNER_REGISTRY`.
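
The simulator bullet describes a template-method layout, which the sketch below illustrates with a stand-in `BaseRunner` and a toy point-mass backend. Only the three hook names come from the text above; the constructor, the hook signatures, the list-of-lists state format and `PointMassRunner` are assumptions, and the repository's real base class may differ.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseRunner(ABC):
    """Stand-in base class; shared plumbing would live in step/reset."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self._sim_initialize()

    @abstractmethod
    def _sim_initialize(self) -> None:
        """Allocate backend state."""

    @abstractmethod
    def _sim_step(self, actions: List[float]) -> List[List[float]]:
        """Advance every env one step and return all states."""

    @abstractmethod
    def _sim_reset(self) -> List[List[float]]:
        """Reset every env and return the full batch of states."""

    def reset(self) -> List[List[float]]:
        return self._sim_reset()

    def step(self, actions: List[float]) -> List[List[float]]:
        # DR, action delay and history would wrap this call in a real runner.
        return self._sim_step(actions)


class PointMassRunner(BaseRunner):
    """Toy backend: unit point masses driven by a force, explicit Euler."""

    DT = 0.01

    def _sim_initialize(self) -> None:
        self.pos = [0.0] * self.num_envs
        self.vel = [0.0] * self.num_envs

    def _states(self) -> List[List[float]]:
        return [[p, v] for p, v in zip(self.pos, self.vel)]

    def _sim_step(self, actions: List[float]) -> List[List[float]]:
        for i, a in enumerate(actions):
            self.vel[i] += a * self.DT
            self.pos[i] += self.vel[i] * self.DT
        return self._states()

    def _sim_reset(self) -> List[List[float]]:
        self._sim_initialize()
        return self._states()


runner = PointMassRunner(num_envs=3)
states = runner.reset()
states = runner.step([1.0, 0.0, -1.0])
print(states)
```

The subclass touches only backend state, so anything implemented once in `step` and `reset` of the base class applies to every simulator.
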