RL-Sim-Framework/configs/runner/mujoco_single.yaml at 8cc84d6a213c344ecdb797e320756dbf0c248419

VictorMylle/RL-Sim-Framework


Victor Mylle 8cc84d6a21 feat: RMA-style history-conditioned policy for sim2real adaptation

Added a temporal observation history buffer and 1D-CNN encoder so the
policy can implicitly infer environment parameters (mass, friction,
gear ratios, etc.) from recent (obs, action) dynamics.

Architecture:
  history window [(obs₀,a₀), ..., (obs_{H-1},a_{H-1})]
      → 1D-CNN HistoryEncoder → embedding (32-dim)
      → concat [current_obs, embedding] → MLP → action
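The data flow above can be illustrated with a dependency-free sketch. This is not the repository's HistoryEncoder; it only mirrors the described shapes. The kernel size (3), the hidden channel count (16), the absence of padding, and the random weights are all assumptions made for illustration, and the per-step width of 7 is inferred from the "6 + H×7" figure in the commit message (6 observation values plus 1 action value).

```python
import random

def conv1d_relu(x, weight, bias):
    """Unpadded 1D convolution over the time axis, followed by ReLU.

    x:      [T][C_in]        one feature vector per time step
    weight: [C_out][K][C_in] one K-step filter per output channel
    bias:   [C_out]
    returns [T - K + 1][C_out]
    """
    k = len(weight[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for w_o, b_o in zip(weight, bias):
            s = b_o
            for j in range(k):
                s += sum(w * v for w, v in zip(w_o[j], x[t + j]))
            row.append(max(0.0, s))
        out.append(row)
    return out

def global_avg_pool(x):
    """Average over the time axis: [T][C] -> [C]."""
    return [sum(col) / len(x) for col in zip(*x)]

def random_weights(rng, c_out, k, c_in):
    """Hypothetical initializer; real weights would be learned."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(k)] for _ in range(c_out)]

def encode_history(history, layers):
    """Apply the conv layers in order, then pool to a fixed-size embedding."""
    h = history
    for weight, bias in layers:
        h = conv1d_relu(h, weight, bias)
    return global_avg_pool(h)

rng = random.Random(0)
H, STEP_DIM, OBS_DIM = 10, 7, 6   # from the commit message (6 + H×7 with H=10)
HIDDEN, EMB, K = 16, 32, 3        # HIDDEN and K are assumptions; EMB=32 is from the source

layers = [
    (random_weights(rng, HIDDEN, K, STEP_DIM), [0.0] * HIDDEN),
    (random_weights(rng, EMB, K, HIDDEN), [0.0] * EMB),
]
history = [[rng.uniform(-1.0, 1.0) for _ in range(STEP_DIM)] for _ in range(H)]
current_obs = [0.0] * OBS_DIM

embedding = encode_history(history, layers)   # time axis: 10 -> 8 -> 6 -> pooled away
mlp_input = current_obs + embedding           # concat [current_obs, embedding]
print(len(embedding), len(mlp_input))
```

Because global average pooling collapses the time axis, the embedding width depends only on the last layer's channel count, not on H, which is presumably why a pooling step sits between the convolutions and the MLP.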

Components:
- BaseRunner: history ring buffer, _push_history/_reset_history,
  augmented obs space (6 + H×7 = 76 with H=10)
- HistoryEncoder (src/models/mlp.py): 2-layer temporal Conv1d + GAP
- SharedMLP: optional history_length/raw_obs_dim/embedding_dim params;
  splits augmented obs, encodes history, feeds [obs, emb] to MLP
- TrainerConfig: history_length, embedding_dim fields
- All runner configs: history_length=10 by default
- Tests: encoder shape, model with/without history, config defaults
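The ring buffer and the augmented observation size (6 + H×7 = 76) can be sketched as follows. The class name HistoryBuffer, the zero-fill on reset, and the flattening order (current observation first, then history from oldest to newest) are assumptions; the actual BaseRunner methods _push_history/_reset_history may behave differently.

```python
from collections import deque

class HistoryBuffer:
    """Fixed-length buffer of (obs, action) steps; hypothetical stand-in for BaseRunner's history."""

    def __init__(self, history_length, obs_dim, act_dim):
        self.history_length = history_length
        self.obs_dim = obs_dim
        self.step_dim = obs_dim + act_dim
        # deque with maxlen drops the oldest entry automatically on append
        self._buf = deque(maxlen=history_length)
        self.reset()

    def reset(self):
        """Fill the window with zeros (assumed behavior at episode start)."""
        self._buf.clear()
        for _ in range(self.history_length):
            self._buf.append([0.0] * self.step_dim)

    def push(self, obs, action):
        """Store one (obs, action) pair as a single flat step vector."""
        self._buf.append(list(obs) + list(action))

    def augmented_obs(self, current_obs):
        """Return current_obs followed by the flattened history, oldest step first."""
        flat = [v for step in self._buf for v in step]
        return list(current_obs) + flat

    @property
    def augmented_dim(self):
        return self.obs_dim + self.history_length * self.step_dim

buf = HistoryBuffer(history_length=10, obs_dim=6, act_dim=1)
for step in range(12):                 # 12 pushes into a 10-slot window
    buf.push([float(step)] * 6, [0.5])
aug = buf.augmented_obs([1.0] * 6)
print(len(aug))                        # 6 + 10 * 7 = 76
```

After 12 pushes the two oldest steps have been overwritten, so the first history entry in aug comes from step 2.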

2026-03-28 18:58:24 +01:00

9 lines

209 B

YAML


 # Single-env MuJoCo runner — mimics real hardware timing.
 # dt × substeps = 0.002 × 10 = 0.02 s → 50 Hz control, same as serial runner.
 num_envs: 1
 device: cpu
 dt: 0.002
 substeps: 10
 history_length: 10
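
The timing arithmetic in the comment can be checked with a short sketch. The dictionary below simply mirrors the keys of this file. The helper name control_timing is hypothetical, and the history span figure assumes one history entry is pushed per control step, which the config itself does not state.

```python
import math

def control_timing(cfg):
    """Derive control-loop timing from the physics step and substep count."""
    control_dt = cfg["dt"] * cfg["substeps"]          # seconds per control step
    return {
        "control_dt": control_dt,
        "control_hz": 1.0 / control_dt,
        # Assumption: the history buffer advances once per control step.
        "history_span_s": cfg["history_length"] * control_dt,
    }

cfg = {"num_envs": 1, "device": "cpu", "dt": 0.002, "substeps": 10, "history_length": 10}
timing = control_timing(cfg)
print(f"{timing['control_dt']:.3f} s per step, {timing['control_hz']:.0f} Hz, "
      f"history spans {timing['history_span_s']:.1f} s")
```

Under that assumption, a 10-step history at 50 Hz covers roughly the last 0.2 s of interaction.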