RL-Sim-Framework

Author	SHA1	Message	Date
Victor Mylle	b37cd26690	feat: sim2real domain randomization + reward fixes for rotary cartpole Close the sim2real gap for the Furuta pendulum (swings up but can't balance on hardware). Root causes were (a) no domain randomization, so the policy overfit one deterministic sim instance, and (b) reward design flaws that produced degenerate policies. Domain randomization (runner-level, backend-agnostic): - BaseRunner: domain_rand config; per-env action-delay buffer (latency), Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling (friction/damping/torque), resampled per episode. Sensor noise per step. - privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA. - step() now uses clean state for reward/termination, noisy state for the observation the policy sees. - MuJoCoRunner: applies per-env friction/damping/torque scales. - robot.py: compute_motor_force gains friction/damping scale args. - Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean defaults for mujoco_single/serial; noise/delay anchored to recordings. Reward fixes (rotary_cartpole): - Shift upright reward to [0,1] (was [-1,1]) + alive_bonus, so surviving always beats ending early (kills the "suicide into the limit" policy). - Add balance_bonus * upright * stillness so reward requires upright AND near-zero pendulum velocity (kills the "spin in full loops" policy). Deploy: - eval.py load_policy reconstructs the history/adaptation encoder (auto-detects its dim from the checkpoint) so DR+embedding policies load. Fixes: - MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which was breaking every rotary-cartpole reset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 20:48:25 +02:00
Victor Mylle	ca0e7b8b03	✨ clean up lot of stuff	2026-03-22 15:49:13 +01:00
Victor Mylle	4115447022	♻️ crazy refactor	2026-03-11 22:52:01 +01:00
Victor Mylle	70cd2cdd7d	✨ better robot joint loading	2026-03-09 22:17:28 +01:00
Victor Mylle	15da0ef2fd	✨ update urdf and dependencies	2026-03-09 20:39:02 +01:00
Victor Mylle	c753c369b4	✨ add rotary cartpole env	2026-03-08 22:58:32 +01:00
Victor Mylle	c8f28ffbcc	✨ initial commit	2026-03-06 22:19:44 +01:00

7 Commits
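
The reward fixes described in commit b37cd26690 can be illustrated with a rough sketch. The commit message gives only the structure: an upright term shifted to [0,1], an alive_bonus, and a balance_bonus * upright * stillness product. Everything else here is an assumption made for illustration and is not taken from the repository: the function name `shaped_reward`, the cosine mapping for the upright term, the Gaussian form of `stillness`, the `vel_scale` parameter, the default values, and the convention that theta = 0 means upright.

```python
import math


def shaped_reward(theta, theta_dot, alive_bonus=0.1, balance_bonus=1.0, vel_scale=1.0):
    """Hypothetical per-step reward for a rotary cartpole.

    theta      pendulum angle in radians, 0 = upright (assumed convention)
    theta_dot  pendulum angular velocity in rad/s
    """
    # Upright term shifted from [-1, 1] to [0, 1], so no step is ever penalized.
    upright = 0.5 * (1.0 + math.cos(theta))

    # Assumed stillness measure: 1 at rest, decaying toward 0 as speed grows.
    stillness = math.exp(-((theta_dot / vel_scale) ** 2))

    # The product pays out only when the pendulum is upright AND nearly still.
    return upright + alive_bonus + balance_bonus * upright * stillness
```

Under these assumptions every step earns at least alive_bonus, so ending an episode early never beats surviving, and a pendulum passing through the top at high speed earns almost none of the balance bonus, which is what discourages spinning in full loops.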