# RL-Framework

A small, fast RL framework for training sim2real policies on a 3D-printed rotary (Furuta) cartpole, built to scale from a laptop CPU to a GPU worker (ClearML) without code changes, and to grow into more robots and simulators.

## Architecture

Three orthogonal pieces, composed by Hydra config groups:

| Piece | Role | Implementations |
|---|---|---|
| **Env** (`src/envs/`) | Task logic: obs / reward / termination / init distribution. Pure torch, batched, backend-agnostic. | `rotary_cartpole` |
| **Runner** (`src/runners/`) | Physics + sim2real plumbing (DR, sensor noise, action delay, history buffer). | `mujoco` (CPU), `mjx` (GPU/JAX), `serial` (real ESP32 robot) |
| **Trainer** (`src/training/`) | skrl PPO + shared MLP with optional history encoder. | `ppo`, `ppo_mjx`, `ppo_single`, `ppo_real` |

The robot itself is described once in `assets//robot.yaml` (URDF + identified motor model) and shared by **training, sysid and deployment**. The motor model (bias → deadzone → gear compensation, Coulomb + Stribeck friction, viscous damping, first-order lag) is implemented in `src/core/robot.py` and mirrored exactly in the MJX JIT step (`src/runners/mjx.py`).

## Train

```bash
# CPU (64 parallel MuJoCo envs)
python scripts/train.py env=rotary_cartpole runner=mujoco training=ppo

# GPU (1024 MJX envs), local
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx

# GPU, remote on ClearML gpu-queue
python scripts/train.py env=rotary_cartpole runner=mjx training=ppo_mjx training.remote=true
```

Videos and scalars stream to ClearML. Checkpoints land in `runs/`.

## Sim2real recipe

1. **Capture** real trajectories: `python -m src.sysid.capture` (writes `.npz` to `assets//recordings/`).
2. **Identify** physics: `python -m src.sysid.optimize --robot-path assets/rotary_cartpole --recording .npz`. CMA-ES fits inertials/joint dynamics against the recording (motor model is locked from the unified sysid). Writes `sysid_result.json` + `robot_tuned.yaml` + `*_tuned.urdf`.
3. **Validate** the fit: `python -m src.sysid.visualize`, then copy `robot_tuned.yaml` → `robot.yaml`.
4. **Train with DR + history**: the runner randomizes friction/damping/torque scales, sensor noise and action latency per episode (`configs/runner/mjx.yaml: domain_rand`), and appends a 10-step (obs, action) history to the observation so the policy can implicitly identify the current dynamics (`history_length`).
5. **Deploy**: `mjpython scripts/eval.py env=rotary_cartpole runner=serial checkpoint=runs//checkpoints/agent_X.pt`

## Other tools

```bash
mjpython scripts/viz.py env=rotary_cartpole                     # keyboard-drive the sim
mjpython scripts/viz.py runner=serial                           # digital twin of the real robot
python scripts/hpo.py env=rotary_cartpole training=ppo_single   # ClearML + SMAC3 HPO
pytest tests/                                                   # unit tests
```

## Adding a robot / simulator

- **Robot**: drop `assets//` (URDF + `robot.yaml`), subclass `BaseEnv` (obs/reward/termination/`initial_state_ranges`), register in `src/core/registry.py`, add `configs/env/.yaml`.
- **Simulator**: subclass `BaseRunner` and implement `_sim_initialize`, `_sim_step`, `_sim_reset` (full-batch return); DR, history and the env-side logic come for free. Register in `scripts/train.py: RUNNER_REGISTRY`.
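
To make the simulator hook pattern concrete, here is a minimal sketch of how a backend might plug in. It is not the framework's actual code: the real `BaseRunner` signatures are not shown in this README, so `BaseRunnerSketch` is a hypothetical stand-in, and `PointMassRunner`, its `dt` parameter and the NumPy arrays (the framework itself is torch/JAX based) are assumptions made for illustration. Only the three hook names and the `history_length` idea come from the text above, and DR and sensor noise are left out.

```python
import numpy as np


class BaseRunnerSketch:
    """Hypothetical stand-in for BaseRunner: owns the (obs, action) history."""

    def __init__(self, num_envs, obs_dim, act_dim, history_length=10):
        self.num_envs = num_envs
        # One slot per past step, each holding a concatenated (obs, action) pair.
        self.history = np.zeros(
            (num_envs, history_length, obs_dim + act_dim), dtype=np.float32
        )
        self._last_obs = np.zeros((num_envs, obs_dim), dtype=np.float32)
        self._sim_initialize()

    # --- hooks a simulator backend has to provide -------------------------
    def _sim_initialize(self):
        raise NotImplementedError

    def _sim_step(self, actions):
        raise NotImplementedError

    def _sim_reset(self):
        raise NotImplementedError

    # --- shared plumbing the backend gets without extra work --------------
    def _augment(self, obs):
        # Policy input = current obs followed by the flattened history.
        flat = self.history.reshape(self.num_envs, -1)
        return np.concatenate([obs, flat], axis=1)

    def reset(self):
        self._last_obs = self._sim_reset()  # full-batch: (num_envs, obs_dim)
        self.history[:] = 0.0
        return self._augment(self._last_obs)

    def step(self, actions):
        actions = np.asarray(actions, dtype=np.float32)
        # Drop the oldest slot and record the pair that led to this step.
        self.history = np.roll(self.history, -1, axis=1)
        self.history[:, -1] = np.concatenate([self._last_obs, actions], axis=1)
        self._last_obs = self._sim_step(actions)
        return self._augment(self._last_obs)


class PointMassRunner(BaseRunnerSketch):
    """Toy backend (assumed): 1-D unit mass, obs = (pos, vel), action = force."""

    def __init__(self, num_envs, dt=0.02, **kwargs):
        self.dt = dt
        super().__init__(num_envs, obs_dim=2, act_dim=1, **kwargs)

    def _sim_initialize(self):
        self.state = np.zeros((self.num_envs, 2), dtype=np.float32)

    def _sim_step(self, actions):
        # Semi-implicit Euler: update velocity first, then position.
        self.state[:, 1] += self.dt * actions[:, 0]
        self.state[:, 0] += self.dt * self.state[:, 1]
        return self.state.copy()

    def _sim_reset(self):
        self.state[:] = 0.0
        return self.state.copy()


runner = PointMassRunner(num_envs=4, history_length=10)
obs0 = runner.reset()
obs1 = runner.step(np.ones((4, 1)))
```

With these assumed dimensions the policy input has 2 + 10 × 3 = 32 features per env. The subclass contains only physics, which is the point of the split: anything layered on in the base class applies to every backend unchanged.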