RL-Sim-Framework

Author	SHA1	Message	Date
Victor Mylle	1e0836e1bc	♻️ full agent refactor	2026-06-10 21:15:34 +02:00
Victor Mylle	a98e86ef66	disable JAX GPU preallocation so MJX shares VRAM with torch	2026-06-10 19:48:48 +02:00
Victor Mylle	4210b6cb53	jax[cuda12] on linux for GPU; EGL headless render; non-fatal video	2026-06-10 19:25:33 +02:00
Victor Mylle	a6fbde798a	pin skrl/jax/mujoco/gymnasium versions; custom CUDA base image	2026-06-10 09:05:48 +02:00
Victor Mylle	56499ebe97	feat: full DR (friction/damping/torque) in MJX JIT step	2026-06-09 21:25:05 +02:00
Victor Mylle	b37cd26690	feat: sim2real domain randomization + reward fixes for rotary cartpole	2026-06-09 20:48:25 +02:00
    Close the sim2real gap for the Furuta pendulum (swings up but can't balance on hardware). Root causes were (a) no domain randomization, so the policy overfit one deterministic sim instance, and (b) reward design flaws that produced degenerate policies.
    Domain randomization (runner-level, backend-agnostic):
    - BaseRunner: domain_rand config; per-env action-delay buffer (latency), Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling (friction/damping/torque), resampled per episode. Sensor noise per step.
    - privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA.
    - step() now uses clean state for reward/termination, noisy state for the observation the policy sees.
    - MuJoCoRunner: applies per-env friction/damping/torque scales.
    - robot.py: compute_motor_force gains friction/damping scale args.
    - Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean defaults for mujoco_single/serial; noise/delay anchored to recordings.
    Reward fixes (rotary_cartpole):
    - Shift upright reward to [0,1] (was [-1,1]) + alive_bonus, so surviving always beats ending early (kills the "suicide into the limit" policy).
    - Add balance_bonus * upright * stillness so reward requires upright AND near-zero pendulum velocity (kills the "spin in full loops" policy).
    Deploy:
    - eval.py load_policy reconstructs the history/adaptation encoder (auto-detects its dim from the checkpoint) so DR+embedding policies load.
    Fixes:
    - MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which was breaking every rotary-cartpole reset.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Victor Mylle	8cc84d6a21	feat: RMA-style history-conditioned policy for sim2real adaptation	2026-03-28 18:58:24 +01:00
    Added a temporal observation history buffer and 1D-CNN encoder so the policy can implicitly infer environment parameters (mass, friction, gear ratios, etc.) from recent (obs, action) dynamics.
    Architecture:
    history window [(obs₀,a₀), ..., (obs_{H-1},a_{H-1})] → 1D-CNN HistoryEncoder → embedding (32-dim) → concat [current_obs, embedding] → MLP → action
    Components:
    - BaseRunner: history ring buffer, _push_history/_reset_history, augmented obs space (6 + H×7 = 76 with H=10)
    - HistoryEncoder (src/models/mlp.py): 2-layer temporal Conv1d + GAP
    - SharedMLP: optional history_length/raw_obs_dim/embedding_dim params; splits augmented obs, encodes history, feeds [obs, emb] to MLP
    - TrainerConfig: history_length, embedding_dim fields
    - All runner configs: history_length=10 by default
    - Tests: encoder shape, model with/without history, config defaults
Victor Mylle	8ed9afe583	chore: update robot.yaml with unified sysid cost 0.925	2026-03-28 18:46:45 +01:00
    All 28 params tuned jointly. Now includes stribeck_friction_boost, stribeck_vel, action_bias. Points to rotary_cartpole_tuned.urdf.
Victor Mylle	5880997786	refactor: merge motor sysid into unified sysid module	2026-03-28 16:48:22 +01:00
    Unified the two separate sysid codepaths (motor-only and full-system) into a single module that optimizes all 28 parameters jointly:
    - 13 motor params (asymmetric gear, damping, friction, deadzone, Stribeck boost, action bias, filter tau, armature, ctrl_limit)
    - 15 pendulum/arm params (mass, CoM, inertia, joint dynamics)
    Key changes:
    - Added stribeck_friction_boost, stribeck_vel, action_bias to ActuatorConfig (robot.py) and MJX runner
    - Created shared src/sysid/preprocess.py (SG velocity recomputation)
    - Rewrote src/sysid/rollout.py with unified MOTOR_PARAMS + PENDULUM_PARAMS spec and PARAM_SETS dict for flexible subset optimization
    - Updated optimize.py, export.py, visualize.py to use unified params (removed all LOCKED_MOTOR_PARAMS references)
    - Removed src/sysid/motor/ module and scripts/motor_sysid.py
    Net: -1383 lines, zero code duplication between motor and full-system sysid.
Victor Mylle	ca0e7b8b03	✨ clean up lot of stuff	2026-03-22 15:49:13 +01:00
Victor Mylle	d3ed1c25ad	⚗️ experimenting training runs	2026-03-12 00:38:09 +01:00
Victor Mylle	3b2d6d08f9	✨ update hpo	2026-03-11 23:28:39 +01:00
Victor Mylle	23801857f4	♻️ cleanup	2026-03-11 23:16:42 +01:00
Victor Mylle	3db68255f0	✨ update registry	2026-03-11 23:11:21 +01:00
Victor Mylle	1a822bd82e	🐛 bug fixes	2026-03-11 23:07:37 +01:00
Victor Mylle	4115447022	♻️ crazy refactor	2026-03-11 22:52:01 +01:00
Victor Mylle	35223b3560	✨ update motor friction	2026-03-09 23:37:10 +01:00
Victor Mylle	0f13086fee	✨ remove custom ema and use mujoco motor control	2026-03-09 22:47:57 +01:00
Victor Mylle	9813319275	✨ add limit enforce to mujoco for joints	2026-03-09 22:30:48 +01:00
Victor Mylle	70cd2cdd7d	✨ better robot joint loading	2026-03-09 22:17:28 +01:00
Victor Mylle	9be07d9186	✨ add new ppo mjx config	2026-03-09 21:33:42 +01:00
Victor Mylle	26ccb1e902	✨ add mjx runner	2026-03-09 21:18:19 +01:00
Victor Mylle	15da0ef2fd	✨ update urdf and dependencies	2026-03-09 20:39:02 +01:00
Victor Mylle	c753c369b4	✨ add rotary cartpole env	2026-03-08 22:58:32 +01:00
Victor Mylle	c8f28ffbcc	✨ initial commit	2026-03-06 22:19:44 +01:00
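
The reward fixes listed under commit b37cd26690 (upright term shifted to [0,1], plus alive_bonus, plus balance_bonus * upright * stillness) can be sketched roughly as below. This is an illustration under stated assumptions, not the repository's code: the function name shaped_reward, the default bonus values, the convention that theta = 0 means upright, the use of cos(theta) as the raw upright signal, and the Gaussian form of the stillness term are all hypothetical.

```python
import math


def shaped_reward(theta, theta_dot, alive_bonus=0.1, balance_bonus=1.0, vel_scale=1.0):
    """Hypothetical sketch of the reward shape described in the commit message.

    theta     : pendulum angle in radians, 0 = upright (assumed convention)
    theta_dot : pendulum angular velocity in rad/s
    """
    # Shift the raw upright signal from [-1, 1] to [0, 1] so it is never negative.
    upright = 0.5 * (math.cos(theta) + 1.0)
    # Stillness is 1 at rest and decays toward 0 as the pendulum spins
    # (the Gaussian form and vel_scale are assumptions).
    stillness = math.exp(-((theta_dot / vel_scale) ** 2))
    # Every step earns at least alive_bonus, so surviving beats terminating early;
    # the balance term only pays out when the pendulum is upright AND slow.
    return upright + alive_bonus + balance_bonus * upright * stillness
```

With these assumed defaults, a pendulum hanging down still earns the alive bonus each step, and one passing through upright at high speed earns far less than one resting there, which is the property the commit message attributes to the fix.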

25 Commits
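
Commit 8cc84d6a21 describes a history ring buffer that augments the observation to 6 + H×7 = 76 values with H=10. A minimal pure-Python sketch of that bookkeeping follows. The class name HistoryBuffer, its method names, the zero-filled reset, and the action dimension of 1 (inferred from 7 = 6 + 1) are assumptions made for illustration; the actual BaseRunner implementation may differ.

```python
from collections import deque


class HistoryBuffer:
    """Hypothetical fixed-length buffer of (obs, action) steps."""

    def __init__(self, obs_dim=6, act_dim=1, history_length=10):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.history_length = history_length
        self.reset()

    def reset(self):
        # Start each episode with an all-zero history (assumed padding scheme).
        zero_step = [0.0] * (self.obs_dim + self.act_dim)
        self._steps = deque(
            (list(zero_step) for _ in range(self.history_length)),
            maxlen=self.history_length,
        )

    def push(self, obs, action):
        # Appending to a full deque with maxlen drops the oldest step.
        if len(obs) != self.obs_dim or len(action) != self.act_dim:
            raise ValueError("obs/action size does not match the buffer dims")
        self._steps.append(list(obs) + list(action))

    def augmented_obs(self, current_obs):
        # Layout: [current_obs, step_0, ..., step_{H-1}], oldest step first.
        flat_history = [x for step in self._steps for x in step]
        return list(current_obs) + flat_history


buf = HistoryBuffer()
for t in range(12):
    buf.push([float(t)] * 6, [0.5])
aug = buf.augmented_obs([99.0] * 6)
print(len(aug))  # 6 + 10 * 7 = 76
```

A model such as the SharedMLP mentioned in the commit would then split this vector back into the first 6 values and the 10×7 history window before encoding the latter.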