RL-Sim-Framework/README.md
2026-06-10 21:15:34 +02:00

RL-Framework

A small, fast RL framework for training sim2real policies on a 3D-printed rotary (Furuta) cartpole — built to scale from a laptop CPU to a GPU worker (ClearML) without code changes, and to grow into more robots and simulators.

Architecture

Three orthogonal pieces, composed by Hydra config groups:

| Piece | Role | Implementations |
|---|---|---|
| Env (src/envs/) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | rotary_cartpole |
| Runner (src/runners/) | Physics + sim2real plumbing (DR, sensor noise, action delay, history buffer). | mujoco (CPU), mjx (GPU/JAX), serial (real ESP32 robot) |
| Trainer (src/training/) | skrl PPO + shared MLP with optional history encoder. | ppo, ppo_mjx, ppo_single, ppo_real |
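
To illustrate how name-based composition of this kind can work, here is a minimal sketch in plain Python. It is not the framework's code: the `register` decorator, the `build` helper, the `ENV_REGISTRY` dict and the stub classes are hypothetical, and in the real project Hydra resolves the config groups before anything like this runs.

```python
# Hypothetical registries keyed by the names used on the command line
# (env=rotary_cartpole runner=mjx). Not the framework's actual code.
ENV_REGISTRY = {}
RUNNER_REGISTRY = {}


def register(registry, name):
    """Class decorator that stores the class under `name` in `registry`."""
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator


@register(ENV_REGISTRY, "rotary_cartpole")
class RotaryCartpoleEnv:
    """Stub for task logic; it knows nothing about the physics backend."""


@register(RUNNER_REGISTRY, "mujoco")
class MujocoRunner:
    def __init__(self, env):
        self.env = env


@register(RUNNER_REGISTRY, "mjx")
class MjxRunner:
    def __init__(self, env):
        self.env = env


def build(cfg):
    """Look up the env and runner by name and wire them together."""
    env = ENV_REGISTRY[cfg["env"]]()
    return RUNNER_REGISTRY[cfg["runner"]](env)


runner = build({"env": "rotary_cartpole", "runner": "mjx"})
print(type(runner).__name__, type(runner.env).__name__)
```

Because the env never imports a runner, swapping `runner=mujoco` for `runner=mjx` changes only the lookup key, which is the property the table above describes as backend-agnostic.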

The robot itself is described once in assets/<robot>/robot.yaml (URDF + identified motor model) and shared by training, sysid and deployment — the motor model (bias → deadzone → gear compensation, Coulomb + Stribeck friction, viscous damping, first-order lag) is implemented in src/core/robot.py and mirrored exactly in the MJX JIT step (src/runners/mjx.py).
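
As a rough illustration of the motor-model stages listed above, the following scalar sketch chains them in one step function. It is an assumption-laden stand-in, not the contents of src/core/robot.py: the `MotorParams` fields, their default values, reading "gear compensation" as a plain gain, the smooth-sign approximation, and applying the lag to the drive torque before friction are all choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class MotorParams:
    # Placeholder values for illustration only; not identified parameters.
    bias: float = 0.0           # constant offset added to the command
    deadzone: float = 0.05      # |command| below this yields no drive
    gear_gain: float = 1.0      # command-to-torque gain (assumed meaning)
    coulomb: float = 0.01       # Coulomb friction torque
    stribeck: float = 0.005     # extra friction torque near zero speed
    stribeck_vel: float = 0.1   # velocity scale of the Stribeck decay
    viscous: float = 0.001      # viscous damping coefficient
    lag_tau: float = 0.02       # first-order lag time constant (s)


def motor_step(command, omega, lagged_drive, p, dt):
    """Advance the model one step; returns (net_torque, new_lagged_drive)."""
    # 1. Bias: shift the raw command.
    u = command + p.bias
    # 2. Deadzone: zero inside the band, shifted outside so output is continuous.
    if abs(u) <= p.deadzone:
        u = 0.0
    else:
        u -= math.copysign(p.deadzone, u)
    # 3. Gain from command to drive torque.
    drive = p.gear_gain * u
    # 4. First-order lag, exact discretisation of tau * dy/dt = drive - y.
    alpha = 1.0 - math.exp(-dt / p.lag_tau)
    lagged_drive += alpha * (drive - lagged_drive)
    # 5. Friction: Coulomb + Stribeck (smooth sign) plus viscous damping.
    direction = math.tanh(omega / 1e-3)
    dry = p.coulomb + p.stribeck * math.exp(-((omega / p.stribeck_vel) ** 2))
    friction = dry * direction + p.viscous * omega
    return lagged_drive - friction, lagged_drive


p = MotorParams()
torque, lagged = motor_step(command=0.5, omega=0.0, lagged_drive=0.0, p=p, dt=0.01)
print(torque, lagged)
```

Keeping the model a pure function of (command, velocity, lag state) is what makes it straightforward to reimplement identically in a batched or JIT-compiled backend.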

Train

# CPU (64 parallel MuJoCo envs)
python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo

# GPU (1024 MJX envs) — local
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx

# GPU — remote on ClearML gpu-queue
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true

Videos and scalars stream to ClearML. Checkpoints land in runs/.

Sim2real recipe

  1. Capture real trajectories: python -m src.sysid.capture (writes .npz to assets/<robot>/recordings/).
  2. Identify physics: python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording <capture>.npz — CMA-ES fits inertials/joint dynamics against the recording (motor model is locked from the unified sysid). Writes sysid_result.json + robot_tuned.yaml + *_tuned.urdf.
  3. Validate the fit: python -m src.sysid.visualize, then copy robot_tuned.yaml → robot.yaml.
  4. Train with DR + history: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (configs/runner/mjx.yaml: domain_rand), and appends a 10-step (obs, action) history to the observation so the policy can implicitly identify the current dynamics (history_length).
  5. Deploy: mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs/<run>/checkpoints/agent_X.pt
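
Step 4 can be sketched in a few lines of plain Python. This is a single-environment illustration, not the runner's batched implementation: `sample_episode_params`, `HistoryBuffer`, the randomization ranges and the zero-filled initial history are all assumptions made for the example; only the 10-step (obs, action) history length comes from the recipe above.

```python
import random
from collections import deque

HISTORY_LENGTH = 10  # 10-step (obs, action) history, as in step 4


def sample_episode_params(rng):
    """Draw one set of perturbations per episode (ranges are made up)."""
    return {
        "friction_scale": rng.uniform(0.8, 1.2),
        "damping_scale": rng.uniform(0.8, 1.2),
        "torque_scale": rng.uniform(0.9, 1.1),
        "obs_noise_std": rng.uniform(0.0, 0.01),
        "action_delay_steps": rng.randint(0, 2),
    }


class HistoryBuffer:
    """Fixed-length buffer of past (obs, action) pairs, zero-filled on reset."""

    def __init__(self, obs_dim, act_dim, length=HISTORY_LENGTH):
        self.obs_dim, self.act_dim, self.length = obs_dim, act_dim, length
        self.reset()

    def reset(self):
        zero = [0.0] * (self.obs_dim + self.act_dim)
        self.buf = deque([list(zero) for _ in range(self.length)],
                         maxlen=self.length)

    def push(self, obs, action):
        # The oldest pair is dropped automatically once the deque is full.
        self.buf.append(list(obs) + list(action))

    def augment(self, obs):
        """Return the current obs followed by the flattened history."""
        return list(obs) + [x for pair in self.buf for x in pair]


rng = random.Random(0)
params = sample_episode_params(rng)   # resampled at every episode reset
history = HistoryBuffer(obs_dim=4, act_dim=1)
obs, action = [0.1, 0.0, 0.2, 0.0], [0.5]
history.push(obs, action)
policy_input = history.augment(obs)
print(len(policy_input))  # 4 + 10 * (4 + 1) = 54
```

Since the sampled parameters are never shown to the policy, the history is the only channel through which it can infer the current friction, torque scale or latency.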

Other tools

mjpython scripts/viz.py env=rotary_cartpole              # keyboard-drive the sim
mjpython scripts/viz.py runner=serial                    # digital twin of the real robot
python scripts/hpo.py env=rotary_cartpole training=ppo_single   # ClearML + SMAC3 HPO
pytest tests/                                            # unit tests

Adding a robot / simulator

  • Robot: drop assets/<name>/ (URDF + robot.yaml), subclass BaseEnv (obs/reward/termination/initial_state_ranges), register in src/core/registry.py, add configs/env/<name>.yaml.
  • Simulator: subclass BaseRunner and implement _sim_initialize, _sim_step, _sim_reset (full-batch return) — DR, history and the env-side logic come for free. Register in scripts/train.py: RUNNER_REGISTRY.
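
The simulator bullet can be pictured with a toy backend. The `BaseRunner` below is a stand-in written for this example, so its constructor and method signatures are assumptions and may not match the framework's; `PointMassRunner` is likewise hypothetical. The one detail taken from the text above is that `_sim_reset` returns the state of the full batch, not only the reset environments.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Stand-in for the framework's BaseRunner; real signatures may differ."""

    def __init__(self, num_envs, dt=0.01):
        self.num_envs = num_envs
        self.dt = dt
        self._sim_initialize()

    @abstractmethod
    def _sim_initialize(self):
        """Allocate simulator state for all environments."""

    @abstractmethod
    def _sim_step(self, actions):
        """Advance every environment by one step and return the batch state."""

    @abstractmethod
    def _sim_reset(self, env_ids):
        """Reset the given environments and return the full batch state."""


class PointMassRunner(BaseRunner):
    """Toy 1-D point mass: the action is an acceleration."""

    def _sim_initialize(self):
        self.pos = [0.0] * self.num_envs
        self.vel = [0.0] * self.num_envs

    def _sim_step(self, actions):
        for i, a in enumerate(actions):
            self.vel[i] += a * self.dt            # semi-implicit Euler
            self.pos[i] += self.vel[i] * self.dt
        return self._state()

    def _sim_reset(self, env_ids):
        for i in env_ids:
            self.pos[i] = 0.0
            self.vel[i] = 0.0
        return self._state()  # full batch, not just env_ids

    def _state(self):
        return list(zip(self.pos, self.vel))


runner = PointMassRunner(num_envs=3)
runner._sim_step([1.0, 0.0, -1.0])
print(runner._sim_reset([0]))  # env 0 is zeroed, envs 1 and 2 keep their state
```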