RL-Framework
A small, fast RL framework for training sim2real policies on a 3D-printed rotary (Furuta) cartpole — built to scale from a laptop CPU to a GPU worker (ClearML) without code changes, and to grow into more robots and simulators.
Architecture
Three orthogonal pieces, composed by Hydra config groups:
| Piece | Role | Implementations |
|---|---|---|
| Env (src/envs/) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | rotary_cartpole |
| Runner (src/runners/) | Physics + sim2real plumbing (DR, sensor noise, action delay, history buffer). | mujoco (CPU), mjx (GPU/JAX), serial (real ESP32 robot) |
| Trainer (src/training/) | skrl PPO + shared MLP with optional history encoder. | ppo, ppo_mjx, ppo_single, ppo_real |
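As a rough illustration of how one implementation per piece gets picked, the sketch below resolves `group=name` overrides against three lookup tables. It does not use Hydra and is not the repository's code: the `select` helper and the class-name strings are hypothetical placeholders, and only the group and implementation names come from the table.

```python
# Hypothetical registries: keys match the table, values are placeholder names.
ENVS = {"rotary_cartpole": "RotaryCartpoleEnv"}
RUNNERS = {"mujoco": "MujocoRunner", "mjx": "MjxRunner", "serial": "SerialRunner"}
TRAINERS = {
    "ppo": "PpoTrainer",
    "ppo_mjx": "PpoMjxTrainer",
    "ppo_single": "PpoSingleTrainer",
    "ppo_real": "PpoRealTrainer",
}


def select(overrides: list) -> dict:
    """Resolve 'group=name' overrides to one implementation per group."""
    groups = {"env": ENVS, "runner": RUNNERS, "training": TRAINERS}
    chosen = {}
    for item in overrides:
        group, _, name = item.partition("=")
        if group not in groups:
            raise KeyError(f"unknown config group: {group}")
        if name not in groups[group]:
            raise KeyError(f"unknown {group} implementation: {name}")
        chosen[group] = groups[group][name]
    return chosen
```

Because the three tables are independent, any env can in principle be paired with any runner and trainer; the sketch handles group choices only, not nested overrides such as `training.remote=true`.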
The robot itself is described once in assets/<robot>/robot.yaml
(URDF + identified motor model) and shared by training, sysid and
deployment — the motor model (bias → deadzone → gear compensation,
Coulomb + Stribeck friction, viscous damping, first-order lag) is
implemented in src/core/robot.py and mirrored exactly in the MJX JIT
step (src/runners/mjx.py).
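
The motor-model chain described above can be sketched in plain Python. This is a minimal illustration, not the code in src/core/robot.py: the parameter names, their default values and the exact functional forms (a simple gain for the gear stage, a Gaussian-shaped Stribeck term, a discrete low-pass for the lag) are all assumptions. Only the order of the stages follows the description.

```python
import math
from dataclasses import dataclass


@dataclass
class MotorParams:
    # All names and default values are illustrative assumptions.
    bias: float = 0.0        # constant command offset
    deadzone: float = 0.05   # commands with |u| at or below this give no torque
    gear: float = 1.0        # gear stage modeled as a plain gain (assumption)
    coulomb: float = 0.02    # Coulomb friction level
    stribeck: float = 0.01   # extra friction near zero velocity
    v_stribeck: float = 0.1  # velocity scale of the Stribeck term
    viscous: float = 0.001   # viscous damping coefficient
    tau_lag: float = 0.02    # first-order lag time constant [s]


def commanded_torque(u: float, p: MotorParams) -> float:
    """bias -> deadzone -> gain, applied to a normalized command u."""
    u = u + p.bias
    if abs(u) <= p.deadzone:
        return 0.0
    # Shift by the deadzone so the output is continuous at the threshold.
    return p.gear * (u - math.copysign(p.deadzone, u))


def friction_torque(vel: float, p: MotorParams) -> float:
    """Coulomb + Stribeck + viscous friction, opposing the velocity."""
    if vel == 0.0:
        return 0.0
    level = p.coulomb + p.stribeck * math.exp(-((vel / p.v_stribeck) ** 2))
    return -math.copysign(level, vel) - p.viscous * vel


def motor_step(u: float, vel: float, tau_prev: float, dt: float, p: MotorParams):
    """One step: first-order lag on the drive torque, then add friction."""
    target = commanded_torque(u, p)
    alpha = dt / (p.tau_lag + dt)  # discrete low-pass coefficient
    tau_drive = tau_prev + alpha * (target - tau_prev)
    return tau_drive, tau_drive + friction_torque(vel, p)
```

`motor_step` returns both the lagged drive torque (to carry into the next step) and the net joint torque, so the same function could be reused by a CPU loop and a batched step as long as both apply the stages in the same order.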
Train
# CPU (64 parallel MuJoCo envs)
python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo
# GPU (1024 MJX envs) — local
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx
# GPU — remote on ClearML gpu-queue
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true
Videos and scalars stream to ClearML. Checkpoints land in runs/.
Sim2real recipe
- Capture real trajectories: python -m src.sysid.capture (writes .npz to assets/<robot>/recordings/).
- Identify physics: python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording <capture>.npz. CMA-ES fits inertials/joint dynamics against the recording (motor model is locked from the unified sysid). Writes sysid_result.json + robot_tuned.yaml + *_tuned.urdf.
- Validate the fit: python -m src.sysid.visualize, then copy robot_tuned.yaml → robot.yaml.
- Train with DR + history: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (configs/runner/mjx.yaml: domain_rand), and appends a 10-step (obs, action) history to the observation so the policy can implicitly identify the current dynamics (history_length).
- Deploy: mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs/<run>/checkpoints/agent_X.pt
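
The "Train with DR + history" step can be illustrated with a small, framework-free sketch. The ranges, the names `sample_domain_rand` and `HistoryBuffer`, and the zero-padding at reset are assumptions made for this example; the real settings live under domain_rand and history_length in configs/runner/mjx.yaml, and the real runner is batched.

```python
import random
from collections import deque

HISTORY_LENGTH = 10  # matches the 10-step history mentioned above

# Hypothetical ranges; the real ones are set in the runner config.
DR_RANGES = {
    "friction_scale": (0.8, 1.2),
    "damping_scale": (0.8, 1.2),
    "torque_scale": (0.9, 1.1),
    "sensor_noise_std": (0.0, 0.01),
    "action_delay_steps": (0, 2),
}


def sample_domain_rand(rng: random.Random) -> dict:
    """Draw one set of dynamics parameters, held fixed for an episode."""
    params = {}
    for name, (lo, hi) in DR_RANGES.items():
        if isinstance(lo, int) and isinstance(hi, int):
            params[name] = rng.randint(lo, hi)  # integer latency in steps
        else:
            params[name] = rng.uniform(lo, hi)
    return params


class HistoryBuffer:
    """Keeps the last N (obs, action) pairs and flattens them for the policy."""

    def __init__(self, obs_dim: int, act_dim: int, length: int = HISTORY_LENGTH):
        self.pair_dim = obs_dim + act_dim
        self.length = length
        self.buf = deque(maxlen=length)
        self.reset()

    def reset(self) -> None:
        # Zero-pad so the observation size is constant from the first step.
        self.buf.clear()
        for _ in range(self.length):
            self.buf.append([0.0] * self.pair_dim)

    def push(self, obs: list, action: list) -> None:
        self.buf.append(list(obs) + list(action))

    def augment(self, obs: list) -> list:
        """Current observation followed by the flattened history."""
        flat = [x for pair in self.buf for x in pair]
        return list(obs) + flat
```

With a 4-dimensional observation and a 1-dimensional action, the augmented observation has 4 + 10 × 5 = 54 entries. The policy never sees the sampled parameters directly; it can only infer them from how past actions changed past observations.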
Other tools
mjpython scripts/viz.py env=rotary_cartpole # keyboard-drive the sim
mjpython scripts/viz.py runner=serial # digital twin of the real robot
python scripts/hpo.py env=rotary_cartpole training=ppo_single # ClearML + SMAC3 HPO
pytest tests/ # unit tests
Adding a robot / simulator
- Robot: drop assets/<name>/ (URDF + robot.yaml), subclass BaseEnv (obs/reward/termination/initial_state_ranges), register in src/core/registry.py, add configs/env/<name>.yaml.
- Simulator: subclass BaseRunner and implement _sim_initialize, _sim_step, _sim_reset (full-batch return). DR, history and the env-side logic come for free. Register in scripts/train.py: RUNNER_REGISTRY.
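
To make the simulator hook-up concrete, here is a hedged sketch of a toy backend. The `BaseRunner` below is a stand-in written for this example, not the class in the repository: only the three hook names (_sim_initialize, _sim_step, _sim_reset) and the full-batch return of the reset come from the text, while the signatures, the `PointMassRunner` class and the registry entry are assumptions.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Stand-in base class (assumed interface, not the repository's)."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self._sim_initialize()

    @abstractmethod
    def _sim_initialize(self) -> None: ...

    @abstractmethod
    def _sim_step(self, actions: list) -> list: ...

    @abstractmethod
    def _sim_reset(self, env_ids: list) -> list: ...

    def step(self, actions: list) -> list:
        # Shared plumbing (DR, noise, history) would wrap this call.
        return self._sim_step(actions)

    def reset(self, env_ids=None) -> list:
        ids = list(range(self.num_envs)) if env_ids is None else env_ids
        return self._sim_reset(ids)


class PointMassRunner(BaseRunner):
    """Toy backend: each env is a 1-D point mass with state [pos, vel]."""

    dt = 0.01

    def _sim_initialize(self) -> None:
        self.state = [[0.0, 0.0] for _ in range(self.num_envs)]

    def _sim_step(self, actions: list) -> list:
        for s, a in zip(self.state, actions):
            s[1] += a * self.dt     # semi-implicit Euler: velocity first
            s[0] += s[1] * self.dt  # then position with the new velocity
        return [list(s) for s in self.state]

    def _sim_reset(self, env_ids: list) -> list:
        for i in env_ids:
            self.state[i] = [0.0, 0.0]
        # Full-batch return: states of all envs, not just the reset ones.
        return [list(s) for s in self.state]


RUNNER_REGISTRY = {"point_mass": PointMassRunner}  # hypothetical entry
```

Keeping the backend down to three hooks is what lets the shared wrapper logic stay in the base class: a new simulator only has to say how to build, advance and reset its state.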