Commit Graph

25 Commits

Author SHA1 Message Date
1e0836e1bc ♻️ full agent refactor 2026-06-10 21:15:34 +02:00
a98e86ef66 disable JAX GPU preallocation so MJX shares VRAM with torch 2026-06-10 19:48:48 +02:00
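The subject line does not show how preallocation is disabled. A minimal sketch, assuming the usual route of setting the `XLA_PYTHON_CLIENT_PREALLOCATE` environment variable before JAX initializes its GPU backend (the helper name `disable_jax_preallocation` is hypothetical and not taken from the repository):

```python
import os


def disable_jax_preallocation() -> None:
    """Ask XLA to allocate GPU memory on demand instead of reserving it up front.

    Call this before JAX initializes its GPU backend; the simplest way to
    guarantee that is to call it before the first `import jax`.
    """
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"


disable_jax_preallocation()
# import jax  # import only after the variable is set
```

With preallocation off, JAX grows its GPU pool as needed, which leaves room for a torch process or module sharing the same device, at the cost of possible fragmentation.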
4210b6cb53 jax[cuda12] on linux for GPU; EGL headless render; non-fatal video 2026-06-10 19:25:33 +02:00
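A rough sketch of what "EGL headless render; non-fatal video" could look like, assuming the `MUJOCO_GL` environment variable is used to select the EGL backend and that video recording is wrapped so a rendering failure only logs a warning. The helper `record_video_safely` and its callback argument are hypothetical:

```python
import logging
import os

# Select the EGL backend for headless rendering.
# Must be set before `import mujoco`.
os.environ["MUJOCO_GL"] = "egl"

log = logging.getLogger(__name__)


def record_video_safely(record_fn) -> bool:
    """Run a video-recording callback; log and continue if it fails."""
    try:
        record_fn()
        return True
    except Exception:
        # Rendering problems (missing EGL libs, no GPU) should not abort training.
        log.warning("video recording failed; continuing without video", exc_info=True)
        return False
```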
a6fbde798a pin skrl/jax/mujoco/gymnasium versions; custom CUDA base image 2026-06-10 09:05:48 +02:00
56499ebe97 feat: full DR (friction/damping/torque) in MJX JIT step 2026-06-09 21:25:05 +02:00
b37cd26690 feat: sim2real domain randomization + reward fixes for rotary cartpole
Close the sim2real gap for the Furuta pendulum (swings up but can't
balance on hardware). Root causes were (a) no domain randomization, so
the policy overfit one deterministic sim instance, and (b) reward design
flaws that produced degenerate policies.

Domain randomization (runner-level, backend-agnostic):
- BaseRunner: domain_rand config; per-env action-delay buffer (latency),
  Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling
  (friction/damping/torque), resampled per episode. Sensor noise per step.
- privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA.
- step() now uses clean state for reward/termination, noisy state for the
  observation the policy sees.
- MuJoCoRunner: applies per-env friction/damping/torque scales.
- robot.py: compute_motor_force gains friction/damping scale args.
- Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean
  defaults for mujoco_single/serial; noise/delay anchored to recordings.
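
The runner-level randomization listed above can be sketched roughly as follows. This is an illustration in plain Python, not the repository's BaseRunner: the class names, default ranges, and the zero-filled delay buffer are all assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class DomainRandConfig:
    # Illustrative values only; the real config keys and ranges are not shown in the commit.
    max_delay_steps: int = 2
    obs_noise_std: float = 0.01
    scale_range: tuple = (0.8, 1.2)


class RandomizedEnvState:
    """Per-env DR state: latency and dynamics scales resampled per episode."""

    def __init__(self, cfg: DomainRandConfig, rng: random.Random):
        self.cfg = cfg
        self.rng = rng
        self.reset()

    def reset(self) -> None:
        # Per-episode sampling of action latency and dynamics scales.
        self.delay = self.rng.randint(0, self.cfg.max_delay_steps)
        lo, hi = self.cfg.scale_range
        self.scales = {
            name: self.rng.uniform(lo, hi)
            for name in ("friction", "damping", "torque")
        }
        # Pre-fill with zero actions so the first `delay` steps apply no command.
        self.actions = deque([0.0] * (self.delay + 1), maxlen=self.delay + 1)

    def delayed_action(self, action: float) -> float:
        """Push the newest action and return the one issued `delay` steps ago."""
        self.actions.append(action)
        return self.actions[0]

    def noisy_obs(self, clean_state):
        """Per-step Gaussian sensor noise; the clean state stays untouched."""
        return [x + self.rng.gauss(0.0, self.cfg.obs_noise_std) for x in clean_state]
```

In a step function built on this sketch, reward and termination would be computed from the clean state, while only the output of `noisy_obs` is handed to the policy.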

Reward fixes (rotary_cartpole):
- Shift upright reward to [0,1] (was [-1,1]) + alive_bonus, so surviving
  always beats ending early (kills the "suicide into the limit" policy).
- Add balance_bonus * upright * stillness so reward requires upright AND
  near-zero pendulum velocity (kills the "spin in full loops" policy).
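
A small sketch of the reward shape these two fixes describe. The angle convention (theta = 0 is upright), the Gaussian form of `stillness`, and all default coefficients are assumptions made for illustration; only the names `alive_bonus` and `balance_bonus` come from the commit.

```python
import math


def rotary_cartpole_reward(
    theta: float,
    theta_dot: float,
    alive_bonus: float = 0.1,
    balance_bonus: float = 1.0,
    vel_scale: float = 1.0,
) -> float:
    """Per-step reward: shifted upright term + alive bonus + balance term."""
    # cos(theta) lies in [-1, 1]; shifting maps it to [0, 1], so a step never costs reward.
    upright = 0.5 * (1.0 + math.cos(theta))
    # Close to 1 only when the pendulum is nearly motionless.
    stillness = math.exp(-((theta_dot / vel_scale) ** 2))
    # The product rewards being upright AND still, not either one alone.
    return upright + alive_bonus + balance_bonus * upright * stillness
```

Because every term is non-negative and `alive_bonus` is positive, each extra step adds reward, so ending the episode early is never preferable. Passing through the top at high speed earns the upright term but almost none of the balance term.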

Deploy:
- eval.py load_policy reconstructs the history/adaptation encoder
  (auto-detects its dim from the checkpoint) so DR+embedding policies load.

Fixes:
- MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which
  was breaking every rotary-cartpole reset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 20:48:25 +02:00
8cc84d6a21 feat: RMA-style history-conditioned policy for sim2real adaptation
Added a temporal observation history buffer and 1D-CNN encoder so the
policy can implicitly infer environment parameters (mass, friction,
gear ratios, etc.) from recent (obs, action) dynamics.

Architecture:
  history window [(obs₀,a₀), ..., (obs_{H-1},a_{H-1})]
      → 1D-CNN HistoryEncoder → embedding (32-dim)
      → concat [current_obs, embedding] → MLP → action

Components:
- BaseRunner: history ring buffer, _push_history/_reset_history,
  augmented obs space (6 + H×7 = 76 with H=10)
- HistoryEncoder (src/models/mlp.py): 2-layer temporal Conv1d + GAP
- SharedMLP: optional history_length/raw_obs_dim/embedding_dim params;
  splits augmented obs, encodes history, feeds [obs, emb] to MLP
- TrainerConfig: history_length, embedding_dim fields
- All runner configs: history_length=10 by default
- Tests: encoder shape, model with/without history, config defaults
2026-03-28 18:58:24 +01:00
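The augmented observation size quoted in this commit (6 + H×7 = 76 with H=10) follows from storing H pairs of a 6-dim observation and a 1-dim action. A minimal sketch of such a ring buffer in plain Python; the class name `HistoryBuffer`, the zero-filled reset, and the ordering (current observation first, then flattened history) are assumptions, not the repository's code:

```python
from collections import deque

OBS_DIM = 6       # raw observation size, from the commit message
ACT_DIM = 1       # implied by the 7-wide (obs, action) pairs
HISTORY_LEN = 10  # H


class HistoryBuffer:
    """Fixed-length ring buffer of (obs, action) pairs."""

    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, length=HISTORY_LEN):
        self.pair_dim = obs_dim + act_dim
        self.length = length
        self.reset()

    def reset(self) -> None:
        # Zero-fill so the augmented observation has a fixed size from step 0.
        self.buf = deque(
            ([0.0] * self.pair_dim for _ in range(self.length)),
            maxlen=self.length,
        )

    def push(self, obs, action) -> None:
        # Appending to a full deque drops the oldest pair automatically.
        self.buf.append(list(obs) + list(action))

    def augmented_obs(self, obs):
        """Concatenate the current obs with the flattened history window."""
        flat_history = [x for pair in self.buf for x in pair]
        return list(obs) + flat_history


buffer = HistoryBuffer()
buffer.push([0.1] * OBS_DIM, [0.5])
aug = buffer.augmented_obs([0.2] * OBS_DIM)
print(len(aug))  # 6 + 10 * 7
```

A model consuming this vector would split off the first 6 entries as the current observation and reshape the remaining 70 into an H×7 window for the encoder.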
8ed9afe583 chore: update robot.yaml with unified sysid cost 0.925
All 28 params tuned jointly. Now includes stribeck_friction_boost,
stribeck_vel, action_bias. Points to rotary_cartpole_tuned.urdf.
2026-03-28 18:46:45 +01:00
5880997786 refactor: merge motor sysid into unified sysid module
Unified the two separate sysid codepaths (motor-only and full-system)
into a single module that optimizes all 28 parameters jointly:

- 13 motor params (asymmetric gear, damping, friction, deadzone,
  Stribeck boost, action bias, filter tau, armature, ctrl_limit)
- 15 pendulum/arm params (mass, CoM, inertia, joint dynamics)

Key changes:
- Added stribeck_friction_boost, stribeck_vel, action_bias to
  ActuatorConfig (robot.py) and MJX runner
- Created shared src/sysid/preprocess.py (SG velocity recomputation)
- Rewrote src/sysid/rollout.py with unified MOTOR_PARAMS + PENDULUM_PARAMS
  spec and PARAM_SETS dict for flexible subset optimization
- Updated optimize.py, export.py, visualize.py to use unified params
  (removed all LOCKED_MOTOR_PARAMS references)
- Removed src/sysid/motor/ module and scripts/motor_sysid.py

Net: -1383 lines, zero code duplication between motor and full-system sysid.
2026-03-28 16:48:22 +01:00
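The commit names `stribeck_friction_boost` and `stribeck_vel` but does not give the friction law. As a hedged illustration, one common textbook form boosts Coulomb friction near zero velocity with a Gaussian decay. The function below assumes that form; the `coulomb` argument is a hypothetical name and the repository's actual model may differ:

```python
import math


def stribeck_friction(
    vel: float,
    coulomb: float,
    stribeck_friction_boost: float,
    stribeck_vel: float,
) -> float:
    """Friction torque opposing motion, larger near zero velocity."""
    if vel == 0.0:
        return 0.0
    # Boost decays from (1 + boost) at rest toward 1 once |vel| >> stribeck_vel.
    boost = 1.0 + stribeck_friction_boost * math.exp(-((vel / stribeck_vel) ** 2))
    # The sign is opposite to the velocity, so the torque always resists motion.
    return -math.copysign(coulomb * boost, vel)
```

Under this form, `stribeck_vel` sets how quickly the extra low-speed friction fades, and `stribeck_friction_boost` sets how much larger friction is near standstill than at speed.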
ca0e7b8b03 clean up a lot of stuff 2026-03-22 15:49:13 +01:00
d3ed1c25ad ⚗️ experimenting training runs 2026-03-12 00:38:09 +01:00
3b2d6d08f9 update hpo 2026-03-11 23:28:39 +01:00
23801857f4 ♻️ cleanup 2026-03-11 23:16:42 +01:00
3db68255f0 update registry 2026-03-11 23:11:21 +01:00
1a822bd82e 🐛 bug fixes 2026-03-11 23:07:37 +01:00
4115447022 ♻️ crazy refactor 2026-03-11 22:52:01 +01:00
35223b3560 update motor friction 2026-03-09 23:37:10 +01:00
0f13086fee remove custom ema and use mujoco motor control 2026-03-09 22:47:57 +01:00
9813319275 add joint limit enforcement to mujoco 2026-03-09 22:30:48 +01:00
70cd2cdd7d better robot joint loading 2026-03-09 22:17:28 +01:00
9be07d9186 add new ppo mjx config 2026-03-09 21:33:42 +01:00
26ccb1e902 add mjx runner 2026-03-09 21:18:19 +01:00
15da0ef2fd update urdf and dependencies 2026-03-09 20:39:02 +01:00
c753c369b4 add rotary cartpole env 2026-03-08 22:58:32 +01:00
c8f28ffbcc initial commit 2026-03-06 22:19:44 +01:00