Commit Graph

8 Commits

Author SHA1 Message Date
b37cd26690 feat: sim2real domain randomization + reward fixes for rotary cartpole
Close the sim2real gap for the Furuta pendulum (swings up but can't
balance on hardware). Root causes were (a) no domain randomization, so
the policy overfit one deterministic sim instance, and (b) reward design
flaws that produced degenerate policies.

Domain randomization (runner-level, backend-agnostic):
- BaseRunner: domain_rand config; per-env action-delay buffer (latency),
  Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling
  (friction/damping/torque), resampled per episode. Sensor noise is drawn
  per step.
- privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA.
- step() now uses clean state for reward/termination, noisy state for the
  observation the policy sees.
- MuJoCoRunner: applies per-env friction/damping/torque scales.
- robot.py: compute_motor_force now accepts friction/damping scale args.
- Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean
  defaults for mujoco_single/serial; noise/delay anchored to recordings.
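
A minimal sketch of the runner-level randomization described above, using
only the standard library. The class name, field names, default ranges, and
the [-1, 1] normalization of the privileged factors are assumptions made for
illustration; they are not taken from BaseRunner. The sketch also assumes the
action delay is resampled per episode along with the dynamics scales.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DomainRandEnv:
    """Hypothetical per-env DR state (illustrative names, not the repo's API)."""
    rng: random.Random
    delay_range: tuple = (0, 2)      # assumed latency bounds, in control steps
    noise_std: float = 0.01          # assumed Gaussian sensor-noise std
    scale_range: tuple = (0.8, 1.2)  # assumed dynamics-scale bounds
    scales: dict = field(default_factory=dict)
    _buf: deque = field(default_factory=deque)

    def reset(self):
        """Resample latency and dynamics scales once per episode."""
        delay = self.rng.randint(*self.delay_range)
        # Pre-fill with zero actions so the first `delay` steps apply nothing.
        self._buf = deque([0.0] * delay)
        lo, hi = self.scale_range
        self.scales = {
            name: self.rng.uniform(lo, hi)
            for name in ("friction", "damping", "torque")
        }

    def delayed_action(self, action):
        """Queue the new action and return the one issued `delay` steps ago."""
        self._buf.append(action)
        return self._buf.popleft()

    def observe(self, clean_state):
        """Add fresh Gaussian noise every step; the clean state is untouched."""
        return [x + self.rng.gauss(0.0, self.noise_std) for x in clean_state]

    def privileged_obs(self):
        """Map each scale to [-1, 1] relative to scale_range (assumed scheme)."""
        lo, hi = self.scale_range
        return [
            2.0 * (self.scales[name] - lo) / (hi - lo) - 1.0
            for name in sorted(self.scales)
        ]
```

In a step function built on this sketch, reward and termination would be
computed from the clean state, and only the value returned by observe() would
be passed to the policy.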

Reward fixes (rotary_cartpole):
- Shift the upright reward to [0,1] (was [-1,1]) and add alive_bonus, so
  surviving always beats ending early (kills the "suicide into the limit"
  policy).
- Add balance_bonus * upright * stillness so reward requires upright AND
  near-zero pendulum velocity (kills the "spin in full loops" policy).
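
The two reward fixes can be sketched as a single function. This is an
illustration under stated assumptions, not the repo's reward code: theta is
taken to be the pendulum angle measured from upright, the stillness term is
assumed to be a Gaussian-shaped function of pendulum velocity, and the
default values of alive_bonus, balance_bonus, and vel_scale are placeholders.

```python
import math


def shaped_reward(theta, theta_dot,
                  alive_bonus=0.1, balance_bonus=1.0, vel_scale=1.0):
    """Hypothetical per-step reward for the rotary cartpole.

    theta: pendulum angle in radians, 0 = upright (assumed convention).
    theta_dot: pendulum angular velocity in rad/s.
    """
    # cos(theta) lies in [-1, 1]; shifting maps it to [0, 1], so no state
    # ever yields a negative per-step reward.
    upright = 0.5 * (1.0 + math.cos(theta))
    # Close to 1 when the pendulum is nearly still, close to 0 when spinning.
    stillness = math.exp(-(theta_dot / vel_scale) ** 2)
    # The product pays out only when the pendulum is upright AND slow.
    return upright + alive_bonus + balance_bonus * upright * stillness
```

Because every step earns at least alive_bonus, ending an episode early can
never beat surviving, and a pendulum passing through upright at high speed
collects almost none of the balance term.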

Deploy:
- eval.py load_policy reconstructs the history/adaptation encoder
  (auto-detects its dim from the checkpoint) so DR+embedding policies load.

Fixes:
- MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which
  was breaking every rotary-cartpole reset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 20:48:25 +02:00
8ed9afe583 chore: update robot.yaml with unified sysid cost 0.925
All 28 params tuned jointly. Now includes stribeck_friction_boost,
stribeck_vel, action_bias. Points to rotary_cartpole_tuned.urdf.
2026-03-28 18:46:45 +01:00
5880997786 refactor: merge motor sysid into unified sysid module
Unified the two separate sysid codepaths (motor-only and full-system)
into a single module that optimizes all 28 parameters jointly:

- 13 motor params (asymmetric gear, damping, friction, deadzone,
  Stribeck boost, action bias, filter tau, armature, ctrl_limit)
- 15 pendulum/arm params (mass, CoM, inertia, joint dynamics)

Key changes:
- Added stribeck_friction_boost, stribeck_vel, action_bias to
  ActuatorConfig (robot.py) and MJX runner
- Created shared src/sysid/preprocess.py (SG velocity recomputation)
- Rewrote src/sysid/rollout.py with unified MOTOR_PARAMS + PENDULUM_PARAMS
  spec and PARAM_SETS dict for flexible subset optimization
- Updated optimize.py, export.py, visualize.py to use unified params
  (removed all LOCKED_MOTOR_PARAMS references)
- Removed src/sysid/motor/ module and scripts/motor_sysid.py
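
One way a unified spec with named subsets could look is sketched below. Only
the names MOTOR_PARAMS, PENDULUM_PARAMS, PARAM_SETS, stribeck_friction_boost,
stribeck_vel, and action_bias come from this commit; ParamSpec, unpack, the
pendulum entries, and every default and bound are hypothetical placeholders,
and the lists are shortened well below the real 13 + 15 parameters.

```python
from typing import NamedTuple


class ParamSpec(NamedTuple):
    """Hypothetical spec entry: name, default value, and search bounds."""
    name: str
    default: float
    lo: float
    hi: float


# Placeholder values for illustration only.
MOTOR_PARAMS = [
    ParamSpec("stribeck_friction_boost", 0.0, 0.0, 2.0),
    ParamSpec("stribeck_vel", 0.1, 0.01, 1.0),
    ParamSpec("action_bias", 0.0, -0.1, 0.1),
]
PENDULUM_PARAMS = [
    ParamSpec("pendulum_mass", 0.05, 0.01, 0.2),
    ParamSpec("pendulum_com", 0.1, 0.02, 0.3),
]

# Named subsets let the optimizer tune motor-only, pendulum-only, or all.
PARAM_SETS = {
    "motor": MOTOR_PARAMS,
    "pendulum": PENDULUM_PARAMS,
    "all": MOTOR_PARAMS + PENDULUM_PARAMS,
}


def unpack(vector, set_name):
    """Turn an optimizer vector for one subset into a full parameter dict.

    Parameters outside the chosen subset keep their defaults; values in the
    subset are clamped to their bounds.
    """
    specs = PARAM_SETS[set_name]
    if len(vector) != len(specs):
        raise ValueError(f"expected {len(specs)} values, got {len(vector)}")
    params = {spec.name: spec.default for spec in PARAM_SETS["all"]}
    for spec, value in zip(specs, vector):
        params[spec.name] = min(max(value, spec.lo), spec.hi)
    return params
```

With a layout like this, a single rollout function can consume the full
parameter dict regardless of which subset is being optimized, which is what
removes the need for a separate motor-only codepath.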

Net: -1383 lines, zero code duplication between motor and full-system sysid.
2026-03-28 16:48:22 +01:00
ca0e7b8b03 clean up a lot of stuff 2026-03-22 15:49:13 +01:00
4115447022 ♻️ crazy refactor 2026-03-11 22:52:01 +01:00
35223b3560 update motor friction 2026-03-09 23:37:10 +01:00
0f13086fee remove custom ema and use mujoco motor control 2026-03-09 22:47:57 +01:00
70cd2cdd7d better robot joint loading 2026-03-09 22:17:28 +01:00