RL-Sim-Framework

Author	SHA1	Message	Date
Victor Mylle	b37cd26690	feat: sim2real domain randomization + reward fixes for rotary cartpole Close the sim2real gap for the Furuta pendulum (swings up but can't balance on hardware). Root causes were (a) no domain randomization, so the policy overfit one deterministic sim instance, and (b) reward design flaws that produced degenerate policies. Domain randomization (runner-level, backend-agnostic): - BaseRunner: domain_rand config; per-env action-delay buffer (latency), Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling (friction/damping/torque), resampled per episode. Sensor noise per step. - privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA. - step() now uses clean state for reward/termination, noisy state for the observation the policy sees. - MuJoCoRunner: applies per-env friction/damping/torque scales. - robot.py: compute_motor_force gains friction/damping scale args. - Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean defaults for mujoco_single/serial; noise/delay anchored to recordings. Reward fixes (rotary_cartpole): - Shift upright reward to [0,1] (was [-1,1]) + alive_bonus, so surviving always beats ending early (kills the "suicide into the limit" policy). - Add balance_bonus * upright * stillness so reward requires upright AND near-zero pendulum velocity (kills the "spin in full loops" policy). Deploy: - eval.py load_policy reconstructs the history/adaptation encoder (auto-detects its dim from the checkpoint) so DR+embedding policies load. Fixes: - MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which was breaking every rotary-cartpole reset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:48:25 +02:00
Victor Mylle	8ed9afe583	chore: update robot.yaml with unified sysid cost 0.925 All 28 params tuned jointly. Now includes stribeck_friction_boost, stribeck_vel, action_bias. Points to rotary_cartpole_tuned.urdf.	2026-03-28 18:46:45 +01:00
Victor Mylle	5880997786	refactor: merge motor sysid into unified sysid module Unified the two separate sysid codepaths (motor-only and full-system) into a single module that optimizes all 28 parameters jointly: - 13 motor params (asymmetric gear, damping, friction, deadzone, Stribeck boost, action bias, filter tau, armature, ctrl_limit) - 15 pendulum/arm params (mass, CoM, inertia, joint dynamics) Key changes: - Added stribeck_friction_boost, stribeck_vel, action_bias to ActuatorConfig (robot.py) and MJX runner - Created shared src/sysid/preprocess.py (SG velocity recomputation) - Rewrote src/sysid/rollout.py with unified MOTOR_PARAMS + PENDULUM_PARAMS spec and PARAM_SETS dict for flexible subset optimization - Updated optimize.py, export.py, visualize.py to use unified params (removed all LOCKED_MOTOR_PARAMS references) - Removed src/sysid/motor/ module and scripts/motor_sysid.py Net: -1383 lines, zero code duplication between motor and full-system sysid.	2026-03-28 16:48:22 +01:00
Victor Mylle	ca0e7b8b03	✨ clean up lot of stuff	2026-03-22 15:49:13 +01:00
Victor Mylle	4115447022	♻️ crazy refactor	2026-03-11 22:52:01 +01:00
Victor Mylle	35223b3560	✨ update motor friction	2026-03-09 23:37:10 +01:00
Victor Mylle	0f13086fee	✨ remove custom ema and use mujoco motor control	2026-03-09 22:47:57 +01:00
Victor Mylle	70cd2cdd7d	✨ better robot joint loading	2026-03-09 22:17:28 +01:00

8 Commits
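Commit b37cd26690 describes runner-level domain randomization with four parts:

- a per-env action-delay buffer
- Gaussian qpos/qvel sensor noise applied every step
- friction/damping/torque scales resampled each episode
- normalized privileged factors (mu) for RMA

For a single environment this can be sketched as follows. It is a minimal pure-Python illustration, not the BaseRunner code. DomainRandConfig, EnvRandomizer, every field name and the default ranges are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class DomainRandConfig:
    # Hypothetical field names and ranges; the repo's domain_rand block may differ.
    delay_range: tuple = (0, 2)          # action latency in control steps (inclusive)
    qpos_noise_std: float = 0.01         # Gaussian noise on joint positions
    qvel_noise_std: float = 0.1          # Gaussian noise on joint velocities
    friction_range: tuple = (0.8, 1.2)   # multiplicative dynamics scales
    damping_range: tuple = (0.8, 1.2)
    torque_range: tuple = (0.9, 1.1)


class EnvRandomizer:
    """Hypothetical domain-randomization state for ONE environment instance."""

    SCALES = ("friction", "damping", "torque")

    def __init__(self, cfg, n_actions, seed=None):
        self.cfg = cfg
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Resample latency and dynamics scales; call at every episode start."""
        lo, hi = self.cfg.delay_range
        self.delay = self.rng.randint(lo, hi)
        # maxlen = delay + 1: appending a new action evicts the oldest one,
        # so buffer[0] is the action issued `delay` steps ago (zeros at first).
        self.buffer = deque(
            ([0.0] * self.n_actions for _ in range(self.delay + 1)),
            maxlen=self.delay + 1,
        )
        self.scales = {
            name: self.rng.uniform(*getattr(self.cfg, f"{name}_range"))
            for name in self.SCALES
        }

    def delay_action(self, action):
        """Push the policy's action; return the one the simulator should apply."""
        self.buffer.append(list(action))
        return self.buffer[0]

    def noisy_obs(self, qpos, qvel):
        """Per-step sensor noise; reward/termination should still use the clean state."""
        g = self.rng.gauss
        return (
            [q + g(0.0, self.cfg.qpos_noise_std) for q in qpos],
            [v + g(0.0, self.cfg.qvel_noise_std) for v in qvel],
        )

    def privileged_obs(self):
        """Dynamics scales mapped linearly from their ranges to [-1, 1]."""
        mu = []
        for name in self.SCALES:
            lo, hi = getattr(self.cfg, f"{name}_range")
            s = self.scales[name]
            mu.append(0.0 if hi == lo else 2.0 * (s - lo) / (hi - lo) - 1.0)
        return mu
```

In the spirit of the commit, step() would do three things:

- send the output of delay_action() to the simulator
- compute reward and termination from the clean qpos/qvel
- give the policy only the output of noisy_obs()

A batched runner would keep one such state per env, or vectorize the same logic over arrays.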
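The rotary_cartpole reward fixes in b37cd26690 change two things. The upright term is shifted into [0, 1] and an alive bonus is added. A balance term is added that pays only when the pendulum is upright and nearly still.

A minimal sketch follows, under stated assumptions:

- A pendulum angle of 0 means upright.
- The stillness term is a Gaussian in pendulum velocity.
- The function name, that stillness form and the coefficient defaults are hypothetical, not taken from the repo's config.

```python
import math


def rotary_cartpole_reward(pend_angle, pend_vel, alive_bonus=0.1,
                           balance_bonus=1.0, vel_scale=2.0):
    """Hypothetical reward sketch; assumes pend_angle == 0 means upright."""
    # Upright term in [0, 1]: 1 when upright, 0 when hanging straight down.
    # A cos(angle) term in [-1, 1] makes hanging-down steps negative, which
    # can make ending the episode early look better than surviving.
    upright = 0.5 * (1.0 + math.cos(pend_angle))
    # Stillness in (0, 1]: 1 at zero pendulum velocity, decaying with speed.
    stillness = math.exp(-((pend_vel / vel_scale) ** 2))
    # The product only pays out when the pole is upright AND still, so
    # swinging through upright at high speed (full loops) earns little of it.
    return upright + alive_bonus + balance_bonus * upright * stillness
```

Every term here is non-negative and alive_bonus is paid each step. So the per-step reward is always at least alive_bonus, and a longer episode never scores less than a shorter one. That is the property the commit relies on to remove the "suicide into the limit" behaviour.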
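Commits 5880997786 and 8ed9afe583 add stribeck_friction_boost and stribeck_vel to the actuator model. Commit b37cd26690 gives compute_motor_force friction/damping scale arguments. One common way to combine such terms is Coulomb friction raised near zero velocity (a Stribeck boost), plus viscous damping.

The sketch below assumes that form and those parameter meanings. friction_torque is a hypothetical helper, not the robot.py implementation.

```python
import math


def friction_torque(vel, coulomb, viscous, stribeck_boost, stribeck_vel,
                    friction_scale=1.0, damping_scale=1.0):
    """Hypothetical Stribeck + Coulomb + viscous joint friction (opposes motion).

    Assumed semantics: near zero velocity the Coulomb level is multiplied by
    (1 + stribeck_boost), decaying back to plain Coulomb friction over the
    characteristic speed stribeck_vel. friction_scale / damping_scale play the
    role of per-episode domain-randomization multipliers.
    """
    sign = (vel > 0.0) - (vel < 0.0)  # -1, 0 or +1
    stribeck = 1.0 + stribeck_boost * math.exp(-((vel / stribeck_vel) ** 2))
    dry = friction_scale * coulomb * stribeck * sign
    viscous_term = damping_scale * viscous * vel
    return -(dry + viscous_term)
```

The Stribeck term only reshapes friction at low speed. At higher speed the model reduces to plain Coulomb plus viscous friction. This is one reason a policy that swings up fine can still struggle with the slow, fine corrections needed to balance.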