RL-Sim-Framework/configs/runner/mujoco.yaml
Victor Mylle 8cc84d6a21 feat: RMA-style history-conditioned policy for sim2real adaptation
Added a temporal observation history buffer and 1D-CNN encoder so the
policy can implicitly infer environment parameters (mass, friction,
gear ratios, etc.) from recent (obs, action) dynamics.

Architecture:
  history window [(obs₀,a₀), ..., (obs_{H-1},a_{H-1})]
      → 1D-CNN HistoryEncoder → embedding (32-dim)
      → concat [current_obs, embedding] → MLP → action

Components:
- BaseRunner: history ring buffer, _push_history/_reset_history,
  augmented obs space (6 + H×7 = 76 with H=10)
- HistoryEncoder (src/models/mlp.py): 2-layer temporal Conv1d + GAP
- SharedMLP: optional history_length/raw_obs_dim/embedding_dim params;
  splits augmented obs, encodes history, feeds [obs, emb] to MLP
- TrainerConfig: history_length, embedding_dim fields
- All runner configs: history_length=10 by default
- Tests: encoder shape, model with/without history, config defaults
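
The history buffer and the augmented observation size from the Components list can be sketched as below. This is a minimal stdlib-only illustration, not the repository's BaseRunner code. The class name `HistoryBuffer`, zero-filling on reset, oldest-first ordering, and the `[current_obs, flattened history]` layout are all assumptions. The action dimension of 1 is inferred from the stated `6 + H×7 = 76`.

```python
from collections import deque

OBS_DIM = 6        # raw observation size, from "6 + H×7 = 76"
ACT_DIM = 1        # inferred: 7 values per step = 6 obs + 1 action
HISTORY_LEN = 10   # history_length in the runner config


class HistoryBuffer:
    """Hypothetical fixed-length ring buffer of (obs, action) steps."""

    def __init__(self, obs_dim, act_dim, history_length):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.history_length = history_length
        # deque with maxlen drops the oldest step automatically on append.
        self._steps = deque(maxlen=history_length)
        self.reset()

    def reset(self):
        """Fill the window with zeros (assumed behaviour at episode start)."""
        self._steps.clear()
        for _ in range(self.history_length):
            self._steps.append([0.0] * (self.obs_dim + self.act_dim))

    def push(self, obs, action):
        """Append one (obs, action) step, evicting the oldest one."""
        if len(obs) != self.obs_dim or len(action) != self.act_dim:
            raise ValueError("obs/action size does not match the buffer")
        self._steps.append(list(obs) + list(action))

    @property
    def augmented_dim(self):
        """Size of the flat vector the policy receives."""
        return self.obs_dim + self.history_length * (self.obs_dim + self.act_dim)

    def augmented_obs(self, current_obs):
        """Return current_obs followed by the flattened history, oldest first."""
        flat_history = [x for step in self._steps for x in step]
        return list(current_obs) + flat_history


buf = HistoryBuffer(OBS_DIM, ACT_DIM, HISTORY_LEN)
print(buf.augmented_dim)  # 6 + 10 * 7
```

Under these assumptions, a model that consumes the flat vector would treat the first `OBS_DIM` entries as the current observation and reshape the remaining `H × 7` entries into the history window before encoding it.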
2026-03-28 18:58:24 +01:00


num_envs: 64
device: auto # auto = cuda if available, else cpu
dt: 0.002
substeps: 10
history_length: 10 # RMA-style: 10-step window of (obs, action) pairs
# ── Sim2real: domain randomization ───────────────────────────────
domain_rand:
  mass_frac: 0.15 # ±15% body mass randomization
  friction_frac: 0.3 # ±30% joint friction
  damping_frac: 0.3 # ±30% joint damping
  armature_frac: 0.2 # ±20% reflected rotor inertia
  gear_frac: 0.15 # ±15% actuator gear ratio
  com_offset: 0.005 # ±5 mm center-of-mass shift
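
One way to read the `domain_rand` block is as a set of symmetric ranges sampled once per environment reset. The sketch below illustrates that reading and is not the framework's sampler. The function name `sample_domain_rand`, the uniform distribution, the multiplicative scaling of nominal values, and the per-axis center-of-mass shift in meters are all assumptions.

```python
import random

# Values copied from the domain_rand block of the config.
DOMAIN_RAND = {
    "mass_frac": 0.15,
    "friction_frac": 0.3,
    "damping_frac": 0.3,
    "armature_frac": 0.2,
    "gear_frac": 0.15,
    "com_offset": 0.005,
}


def sample_domain_rand(cfg, rng):
    """Draw one set of perturbations (hypothetical helper).

    Returns (scales, com_shift):
      scales    maps e.g. "mass" to a multiplier in [1 - frac, 1 + frac]
      com_shift is an [x, y, z] offset, each within ±com_offset
    """
    scales = {}
    for key, frac in cfg.items():
        if key.endswith("_frac"):
            name = key[: -len("_frac")]
            # Assumed: uniform multiplicative noise around the nominal value.
            scales[name] = 1.0 + rng.uniform(-frac, frac)
    offset = cfg.get("com_offset", 0.0)
    # Assumed: independent shift along each axis, in meters.
    com_shift = [rng.uniform(-offset, offset) for _ in range(3)]
    return scales, com_shift


rng = random.Random(0)  # seeded so runs are reproducible
scales, com_shift = sample_domain_rand(DOMAIN_RAND, rng)
nominal_mass = 1.0  # placeholder value in kg, not from the source
print(nominal_mass * scales["mass"], com_shift)
```

With `num_envs: 64`, a sampler like this would be called once per environment so that each parallel simulation runs with different dynamics, which is what gives the history encoder something to infer.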