# RL-Framework

A small, fast RL framework for training sim2real policies on a 3D-printed
rotary (Furuta) cartpole. It is built to scale from a laptop CPU to a GPU worker
(ClearML) without code changes, and to grow into more robots and simulators.
## Architecture

Three orthogonal pieces, composed by Hydra config groups:

| Piece | Role | Implementations |
|---|---|---|
| **Env** (`src/envs/`) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | `rotary_cartpole` |
| **Runner** (`src/runners/`) | Physics + sim2real plumbing (DR, sensor noise, action delay, history buffer). | `mujoco` (CPU), `mjx` (GPU/JAX), `serial` (real ESP32 robot) |
| **Trainer** (`src/training/`) | skrl PPO + shared MLP with optional history encoder. | `ppo`, `ppo_mjx`, `ppo_single`, `ppo_real` |

The robot itself is described once in `assets/<robot>/robot.yaml`
(URDF + identified motor model) and shared by **training, sysid and
deployment**. The motor model (bias → deadzone → gear compensation,
Coulomb + Stribeck friction, viscous damping, first-order lag) is
implemented in `src/core/robot.py` and mirrored exactly in the MJX JIT
step (`src/runners/mjx.py`).
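To make the motor-model stages concrete, here is a minimal scalar sketch of one plausible reading of that chain. It is not the code in `src/core/robot.py`: every parameter name and value, the exact equations, and the treatment of "gear compensation" as a plain gain are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MotorParams:
    # Hypothetical names and values, not the identified ones.
    bias: float = 0.0           # constant offset added to the command
    deadzone: float = 0.05      # commands inside this band produce no torque
    gear_gain: float = 1.0      # assumed stand-in for "gear compensation"
    coulomb: float = 0.01       # Coulomb friction level
    stiction: float = 0.02      # breakaway friction level (>= coulomb)
    stribeck_vel: float = 0.1   # velocity scale of the Stribeck decay
    viscous: float = 0.001      # viscous damping coefficient
    lag_tau: float = 0.02       # first-order lag time constant [s]


def shape_command(u, p):
    """bias -> deadzone -> gain, applied to a normalized command."""
    u = u + p.bias
    if abs(u) <= p.deadzone:
        return 0.0
    u = u - math.copysign(p.deadzone, u)  # remove the dead band
    return p.gear_gain * u


def friction_torque(omega, p):
    """Coulomb + Stribeck + viscous friction, always opposing the velocity."""
    decay = math.exp(-((omega / p.stribeck_vel) ** 2))
    magnitude = p.coulomb + (p.stiction - p.coulomb) * decay
    # tanh is a smooth replacement for sign() near zero velocity
    return -magnitude * math.tanh(omega / 1e-3) - p.viscous * omega


def motor_step(u, omega, drive_prev, dt, p):
    """Advance the lagged drive torque by one step and add friction."""
    target = shape_command(u, p)
    alpha = dt / (p.lag_tau + dt)  # discrete first-order low-pass
    drive = drive_prev + alpha * (target - drive_prev)
    return drive, drive + friction_torque(omega, p)
```

Calling `motor_step` once per physics step, and feeding the returned `drive` back in as `drive_prev`, gives the lagged response; the second return value is the net torque that would be applied to the joint.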
## Train

```bash
# CPU (64 parallel MuJoCo envs)
python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo

# GPU (1024 MJX envs), local
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx

# GPU, remote on ClearML gpu-queue
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true
```

Videos and scalars stream to ClearML. Checkpoints land in `runs/`.
## Sim2real recipe

1. **Capture** real trajectories: `python -m src.sysid.capture` (writes `.npz` to `assets/<robot>/recordings/`).
2. **Identify** physics: `python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording <capture>.npz`.
   CMA-ES fits inertials and joint dynamics against the recording (the motor model is locked from the unified sysid). Writes `sysid_result.json`, `robot_tuned.yaml` and `*_tuned.urdf`.
3. **Validate** the fit: `python -m src.sysid.visualize`, then copy `robot_tuned.yaml` → `robot.yaml`.
4. **Train with DR + history**: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (`configs/runner/mjx.yaml: domain_rand`), and appends a 10-step (obs, action) history to the observation so the policy can implicitly identify the current dynamics (`history_length`).
5. **Deploy**: `mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs/<run>/checkpoints/agent_X.pt`
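The idea behind step 4 can be sketched for a single environment in plain Python. The framework's runners are batched, so this is only an illustration; the function and class names, the parameter keys and all sampling ranges below are hypothetical and do not come from `configs/runner/mjx.yaml`.

```python
import random
from collections import deque


def sample_episode_params(rng):
    """Draw one set of randomized dynamics at episode start (made-up ranges)."""
    return {
        "friction_scale": rng.uniform(0.8, 1.2),
        "damping_scale": rng.uniform(0.8, 1.2),
        "torque_scale": rng.uniform(0.9, 1.1),
        "obs_noise_std": rng.uniform(0.0, 0.01),
        "action_delay_steps": rng.randint(0, 2),
    }


class HistoryBuffer:
    """Keeps the last `history_length` (obs, action) pairs for one env."""

    def __init__(self, obs_dim, act_dim, history_length=10):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.history_length = history_length
        self.buf = deque(maxlen=history_length)
        self.reset()

    def reset(self):
        # Zero-fill so the augmented observation has a fixed size from step 0.
        self.buf.clear()
        for _ in range(self.history_length):
            self.buf.append([0.0] * (self.obs_dim + self.act_dim))

    def push(self, obs, action):
        # The deque drops the oldest pair automatically once it is full.
        self.buf.append(list(obs) + list(action))

    def augment(self, obs):
        """Return the current obs followed by the flattened history."""
        flat = [x for pair in self.buf for x in pair]
        return list(obs) + flat


rng = random.Random(0)
params = sample_episode_params(rng)   # resampled at every episode reset
history = HistoryBuffer(obs_dim=4, act_dim=1)
policy_input = history.augment([0.0, 0.0, 0.0, 0.0])
```

Because the physical parameters change from episode to episode but stay fixed within one, the recent (obs, action) pairs carry enough information for the policy to adapt to the dynamics it is currently facing.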
## Other tools

```bash
mjpython scripts/viz.py env=rotary_cartpole # keyboard-drive the sim
mjpython scripts/viz.py runner=serial # digital twin of the real robot
python scripts/hpo.py env=rotary_cartpole training=ppo_single # ClearML + SMAC3 HPO
pytest tests/ # unit tests
```
## Adding a robot / simulator

- **Robot**: drop `assets/<name>/` (URDF + `robot.yaml`), subclass `BaseEnv`
  (obs/reward/termination/`initial_state_ranges`), register in `src/core/registry.py`,
  add `configs/env/<name>.yaml`.
- **Simulator**: subclass `BaseRunner` and implement `_sim_initialize`,
  `_sim_step`, `_sim_reset` (full-batch return). DR, history and the
  env-side logic come for free. Register in `scripts/train.py: RUNNER_REGISTRY`.
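The shape of a new simulator backend might look roughly like the sketch below. The `BaseRunner` defined here is a stand-in written for this example, not the framework's class: the real constructor, method signatures and return types may differ, and `ToyRunner` is a hypothetical backend. "Full-batch return" is interpreted here as `_sim_reset` returning the state of every env, not only the ones that were reset.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Stand-in base class; the framework's real signatures may differ."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self._sim_initialize()

    @abstractmethod
    def _sim_initialize(self):
        """Create the simulator state for all envs."""

    @abstractmethod
    def _sim_step(self, actions):
        """Advance every env by one step and return the full-batch state."""

    @abstractmethod
    def _sim_reset(self, env_ids):
        """Reset the given envs and return the full-batch state."""


class ToyRunner(BaseRunner):
    """Hypothetical backend: a batch of 1-D double integrators."""

    dt = 0.01

    def _sim_initialize(self):
        self.pos = [0.0] * self.num_envs
        self.vel = [0.0] * self.num_envs

    def _state(self):
        return [[p, v] for p, v in zip(self.pos, self.vel)]

    def _sim_step(self, actions):
        # Semi-implicit Euler: update velocity first, then position.
        for i, a in enumerate(actions):
            self.vel[i] += a * self.dt
            self.pos[i] += self.vel[i] * self.dt
        return self._state()

    def _sim_reset(self, env_ids):
        for i in env_ids:
            self.pos[i] = 0.0
            self.vel[i] = 0.0
        return self._state()  # every env, not only the reset ones


runner = ToyRunner(num_envs=3)
runner._sim_step([1.0, 0.0, -1.0])
state = runner._sim_reset([0])  # env 0 is zeroed, envs 1 and 2 keep their state
```

Keeping the backend limited to these three hooks is what lets the shared runner logic (DR, sensor noise, action delay, history) stay identical across simulators.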