feat: sim2real domain randomization + reward fixes for rotary cartpole
Close the sim2real gap for the Furuta pendulum (swings up but can't balance on hardware). Root causes were (a) no domain randomization, so the policy overfit one deterministic sim instance, and (b) reward design flaws that produced degenerate policies. Domain randomization (runner-level, backend-agnostic): - BaseRunner: domain_rand config; per-env action-delay buffer (latency), Gaussian qpos/qvel sensor noise, per-env dynamics-scale sampling (friction/damping/torque), resampled per episode. Sensor noise per step. - privileged_obs/privileged_dim expose normalized DR factors (mu) for RMA. - step() now uses clean state for reward/termination, noisy state for the observation the policy sees. - MuJoCoRunner: applies per-env friction/damping/torque scales. - robot.py: compute_motor_force gains friction/damping scale args. - Configs: DR blocks for mujoco (full) and mjx (delay+noise); clean defaults for mujoco_single/serial; noise/delay anchored to recordings. Reward fixes (rotary_cartpole): - Shift upright reward to [0,1] (was [-1,1]) + alive_bonus, so surviving always beats ending early (kills the "suicide into the limit" policy). - Add balance_bonus * upright * stillness so reward requires upright AND near-zero pendulum velocity (kills the "spin in full loops" policy). Deploy: - eval.py load_policy reconstructs the history/adaptation encoder (auto-detects its dim from the checkpoint) so DR+embedding policies load. Fixes: - MuJoCoRunner._sim_reset referenced self._env (typo) -> self.env, which was breaking every rotary-cartpole reset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
1
.gitignore
vendored
1
.gitignore
vendored
@@ -5,6 +5,7 @@
|
|||||||
outputs/
|
outputs/
|
||||||
runs/
|
runs/
|
||||||
smac3_output/
|
smac3_output/
|
||||||
|
training_log.txt
|
||||||
|
|
||||||
# MuJoCo
|
# MuJoCo
|
||||||
MUJOCO_LOG.TXT
|
MUJOCO_LOG.TXT
|
||||||
|
|||||||
@@ -1,23 +1,21 @@
|
|||||||
# Tuned robot config — generated by src.sysid (2026-03-28)
|
# Tuned robot config — generated by src.sysid.optimize
|
||||||
# Unified sysid: cost 0.925 (28 params, 0.2s windows, amplitude 200)
|
|
||||||
|
|
||||||
urdf: rotary_cartpole_tuned.urdf
|
urdf: rotary_cartpole.urdf
|
||||||
actuators:
|
actuators:
|
||||||
- joint: motor_joint
|
- joint: motor_joint
|
||||||
type: motor
|
type: motor
|
||||||
gear: [0.846499, 1.183733] # torque constant [pos, neg]
|
gear: [0.424182, 0.425031] # torque constant [pos, neg] (motor sysid)
|
||||||
ctrl_range: [-0.686251, 0.686251] # effective control bound
|
ctrl_range: [-0.592, 0.592] # effective control bound (sysid-tuned)
|
||||||
deadzone: [0.181097, 0.202072] # L298N min |ctrl| for torque [pos, neg]
|
deadzone: [0.141291, 0.078015] # L298N min |ctrl| for torque [pos, neg]
|
||||||
damping: [0.013165, 0.015452] # viscous damping [pos, neg]
|
damping: [0.002027, 0.014665] # viscous damping [pos, neg]
|
||||||
frictionloss: [0.014244, 0.001005] # Coulomb friction [pos, neg]
|
frictionloss: [0.057328, 0.053355] # Coulomb friction [pos, neg]
|
||||||
filter_tau: 0.096263 # 1st-order actuator filter
|
filter_tau: 0.005035 # 1st-order actuator filter (motor sysid)
|
||||||
stribeck_friction_boost: 0.068594 # low-velocity friction boost
|
viscous_quadratic: 0.000285 # velocity² drag
|
||||||
stribeck_vel: 5.279594 # Stribeck velocity scale (rad/s)
|
back_emf_gain: 0.006758 # back-EMF torque reduction
|
||||||
action_bias: 0.056566 # asymmetric bias offset
|
|
||||||
joints:
|
joints:
|
||||||
motor_joint:
|
motor_joint:
|
||||||
armature: 0.001676 # reflected rotor inertia
|
armature: 0.002773 # reflected rotor inertia (motor sysid)
|
||||||
frictionloss: 0.0 # handled by motor model via qfrc_applied
|
frictionloss: 0.0 # handled by motor model via qfrc_applied
|
||||||
pendulum_joint:
|
pendulum_joint:
|
||||||
damping: 1.2e-05 # pendulum pivot damping
|
damping: 0.000119
|
||||||
frictionloss: 7.2e-05 # pendulum pivot friction
|
frictionloss: 1.0e-05
|
||||||
|
|||||||
7
configs/env/rotary_cartpole.yaml
vendored
7
configs/env/rotary_cartpole.yaml
vendored
@@ -1,12 +1,18 @@
|
|||||||
max_steps: 1000
|
max_steps: 1000
|
||||||
robot_path: assets/rotary_cartpole
|
robot_path: assets/rotary_cartpole
|
||||||
reward_upright_scale: 1.0
|
reward_upright_scale: 1.0
|
||||||
|
alive_bonus: 0.25 # per-step survival bonus (living must beat dying)
|
||||||
|
balance_bonus: 2.0 # extra reward for upright AND still (beats spinning)
|
||||||
|
balance_vel_scale: 0.5 # how fast the balance bonus decays with pendulum speed
|
||||||
|
|
||||||
# ── Regularisation penalties (prevent fast spinning) ─────────────────
|
# ── Regularisation penalties (prevent fast spinning) ─────────────────
|
||||||
motor_vel_penalty: 0.01 # penalise high motor angular velocity
|
motor_vel_penalty: 0.01 # penalise high motor angular velocity
|
||||||
motor_angle_penalty: 0.05 # penalise deviation from centre
|
motor_angle_penalty: 0.05 # penalise deviation from centre
|
||||||
action_penalty: 0.05 # penalise large actions (energy cost)
|
action_penalty: 0.05 # penalise large actions (energy cost)
|
||||||
|
|
||||||
|
# ── Initial state randomisation ──────────────────────────────────────
|
||||||
|
pendulum_init_range_deg: 180.0 # pendulum starts in [-180°, +180°]
|
||||||
|
|
||||||
# ── Software safety limit (env-level, always applied) ────────────────
|
# ── Software safety limit (env-level, always applied) ────────────────
|
||||||
motor_angle_limit_deg: 90.0 # terminate episode if motor exceeds ±90°
|
motor_angle_limit_deg: 90.0 # terminate episode if motor exceeds ±90°
|
||||||
|
|
||||||
@@ -16,4 +22,5 @@ hpo:
|
|||||||
motor_vel_penalty: {min: 0.001, max: 0.1}
|
motor_vel_penalty: {min: 0.001, max: 0.1}
|
||||||
motor_angle_penalty: {min: 0.01, max: 0.2}
|
motor_angle_penalty: {min: 0.01, max: 0.2}
|
||||||
action_penalty: {min: 0.01, max: 0.2}
|
action_penalty: {min: 0.01, max: 0.2}
|
||||||
|
pendulum_init_range_deg: {min: 30.0, max: 180.0}
|
||||||
max_steps: {values: [500, 1000, 2000]}
|
max_steps: {values: [500, 1000, 2000]}
|
||||||
@@ -3,3 +3,15 @@ device: auto # auto = cuda if available, else cpu
|
|||||||
dt: 0.002
|
dt: 0.002
|
||||||
substeps: 10
|
substeps: 10
|
||||||
history_length: 10 # RMA-style: 10-step window of (obs, action) pairs
|
history_length: 10 # RMA-style: 10-step window of (obs, action) pairs
|
||||||
|
|
||||||
|
rma_mode: "none" # "none" | "teacher" | "deploy"
|
||||||
|
|
||||||
|
# ── Domain randomization (sim-to-real) ──────────────────────────────
|
||||||
|
# NOTE: action-delay and sensor-noise are applied for MJX, but the
|
||||||
|
# per-env dynamics *scales* (friction/damping/torque) are NOT yet wired
|
||||||
|
# into the JIT step — use runner=mujoco for scale randomization, or keep
|
||||||
|
# this block to delay+noise only on MJX.
|
||||||
|
domain_rand:
|
||||||
|
qpos_noise_std: 0.01 # rad — encoder angle noise
|
||||||
|
qvel_noise_std: 0.5 # rad/s — velocity-estimate noise (measured)
|
||||||
|
action_delay_steps: [0, 2] # control-step latency (0–40 ms)
|
||||||
|
|||||||
@@ -2,13 +2,17 @@ num_envs: 64
|
|||||||
device: auto # auto = cuda if available, else cpu
|
device: auto # auto = cuda if available, else cpu
|
||||||
dt: 0.002
|
dt: 0.002
|
||||||
substeps: 10
|
substeps: 10
|
||||||
history_length: 10 # RMA-style: 10-step window of (obs, action) pairs
|
history_length: 10 # must match training.history_length (DR + embedding)
|
||||||
|
|
||||||
# ── Sim2real: domain randomization ───────────────────────────────
|
rma_mode: "none" # "none" | "teacher" | "deploy"
|
||||||
|
|
||||||
|
# ── Domain randomization (sim-to-real) ──────────────────────────────
|
||||||
|
# Noise/delay levels anchored to the real recordings (~50 Hz, ~0.5 rad/s
|
||||||
|
# velocity noise, ≤1-step latency). Set domain_rand: {} to disable.
|
||||||
domain_rand:
|
domain_rand:
|
||||||
mass_frac: 0.15 # ±15% body mass randomization
|
qpos_noise_std: 0.01 # rad — encoder angle noise
|
||||||
friction_frac: 0.3 # ±30% joint friction
|
qvel_noise_std: 0.5 # rad/s — velocity-estimate noise (measured)
|
||||||
damping_frac: 0.3 # ±30% joint damping
|
action_delay_steps: [0, 2] # control-step latency (0–40 ms)
|
||||||
armature_frac: 0.2 # ±20% reflected rotor inertia
|
friction_scale: [0.6, 1.6] # Coulomb-friction multiplier
|
||||||
gear_frac: 0.15 # ±15% actuator gear ratio
|
damping_scale: [0.6, 1.6] # viscous-damping multiplier
|
||||||
com_offset: 0.005 # ±5mm center-of-mass shift
|
torque_scale: [0.85, 1.15] # motor-constant / battery-voltage variation
|
||||||
|
|||||||
@@ -6,3 +6,12 @@ device: cpu
|
|||||||
dt: 0.002
|
dt: 0.002
|
||||||
substeps: 10
|
substeps: 10
|
||||||
history_length: 10
|
history_length: 10
|
||||||
|
|
||||||
|
rma_mode: "none" # "none" | "teacher" | "deploy"
|
||||||
|
|
||||||
|
# Clean by default (deterministic eval). Confirming-experiment example —
|
||||||
|
# re-eval an existing checkpoint in sim with a fixed 1-step action delay:
|
||||||
|
# mjpython scripts/eval.py env=rotary_cartpole runner=mujoco_single \
|
||||||
|
# checkpoint=runs/.../agent_XXXX.pt \
|
||||||
|
# '++runner.domain_rand.action_delay_steps=[1,1]'
|
||||||
|
domain_rand: {}
|
||||||
|
|||||||
@@ -9,3 +9,5 @@ baud: 115200
|
|||||||
dt: 0.02 # control loop period (50 Hz, matches training)
|
dt: 0.02 # control loop period (50 Hz, matches training)
|
||||||
no_data_timeout: 2.0 # seconds of silence before declaring disconnect
|
no_data_timeout: 2.0 # seconds of silence before declaring disconnect
|
||||||
history_length: 10 # must match training runner
|
history_length: 10 # must match training runner
|
||||||
|
|
||||||
|
rma_mode: "none" # "none" | "teacher" | "deploy"
|
||||||
|
|||||||
@@ -1,19 +1,19 @@
|
|||||||
hidden_sizes: [128, 128]
|
hidden_sizes: [256, 256]
|
||||||
total_timesteps: 5000000
|
total_timesteps: 5000000
|
||||||
rollout_steps: 1024
|
rollout_steps: 2048
|
||||||
learning_epochs: 4
|
learning_epochs: 10
|
||||||
mini_batches: 4
|
mini_batches: 8
|
||||||
discount_factor: 0.99
|
discount_factor: 0.99
|
||||||
gae_lambda: 0.95
|
gae_lambda: 0.95
|
||||||
learning_rate: 0.0003
|
learning_rate: 0.0003
|
||||||
clip_ratio: 0.2
|
clip_ratio: 0.2
|
||||||
value_loss_scale: 0.5
|
value_loss_scale: 0.5
|
||||||
entropy_loss_scale: 0.05
|
entropy_loss_scale: 0.01
|
||||||
log_interval: 1000
|
log_interval: 1000
|
||||||
checkpoint_interval: 50000
|
checkpoint_interval: 50000
|
||||||
|
|
||||||
initial_log_std: 0.5
|
initial_log_std: -0.5
|
||||||
min_log_std: -2.0
|
min_log_std: -4.0
|
||||||
max_log_std: 2.0
|
max_log_std: 2.0
|
||||||
|
|
||||||
record_video_every: 10000
|
record_video_every: 10000
|
||||||
@@ -22,6 +22,10 @@ record_video_every: 10000
|
|||||||
history_length: 10 # temporal window (must match runner)
|
history_length: 10 # temporal window (must match runner)
|
||||||
embedding_dim: 32 # history encoder output dimension
|
embedding_dim: 32 # history encoder output dimension
|
||||||
|
|
||||||
|
# RMA (Rapid Motor Adaptation)
|
||||||
|
rma_mode: "none" # "none" | "teacher" | "deploy"
|
||||||
|
latent_dim: 8 # env encoder / adaptation latent dimension
|
||||||
|
|
||||||
# ClearML remote execution (GPU worker)
|
# ClearML remote execution (GPU worker)
|
||||||
remote: false
|
remote: false
|
||||||
|
|
||||||
|
|||||||
@@ -74,14 +74,33 @@ def _infer_hidden_sizes(state_dict: dict[str, torch.Tensor]) -> tuple[int, ...]:
|
|||||||
return tuple(sizes)
|
return tuple(sizes)
|
||||||
|
|
||||||
|
|
||||||
|
def _infer_encoder_out_dim(state_dict: dict[str, torch.Tensor]) -> int | None:
|
||||||
|
"""Return the history/adaptation encoder output dim, if present.
|
||||||
|
|
||||||
|
Lets eval reconstruct an embedding policy without knowing the training
|
||||||
|
embedding_dim/latent_dim — read it straight from the saved weights.
|
||||||
|
"""
|
||||||
|
for key in ("history_encoder.fc.weight", "adaptation_module.fc.weight"):
|
||||||
|
if key in state_dict:
|
||||||
|
return state_dict[key].shape[0]
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def load_policy(
|
def load_policy(
|
||||||
checkpoint_path: str,
|
checkpoint_path: str,
|
||||||
observation_space: spaces.Space,
|
observation_space: spaces.Space,
|
||||||
action_space: spaces.Space,
|
action_space: spaces.Space,
|
||||||
device: torch.device = torch.device("cpu"),
|
device: torch.device = torch.device("cpu"),
|
||||||
|
history_length: int = 0,
|
||||||
|
rma_mode: str = "none",
|
||||||
|
raw_obs_dim: int = 0,
|
||||||
) -> tuple[SharedMLP, RunningStandardScaler]:
|
) -> tuple[SharedMLP, RunningStandardScaler]:
|
||||||
"""Load a trained SharedMLP + observation normalizer from a checkpoint.
|
"""Load a trained SharedMLP + observation normalizer from a checkpoint.
|
||||||
|
|
||||||
|
For DR + history-embedding policies (history_length > 0) or RMA deploy
|
||||||
|
policies (rma_mode="deploy"), the history/adaptation encoder must be
|
||||||
|
reconstructed too — its output dim is read back from the saved weights.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
(model, state_preprocessor) ready for inference.
|
(model, state_preprocessor) ready for inference.
|
||||||
"""
|
"""
|
||||||
@@ -89,13 +108,20 @@ def load_policy(
|
|||||||
|
|
||||||
# Infer architecture from saved weights.
|
# Infer architecture from saved weights.
|
||||||
hidden_sizes = _infer_hidden_sizes(ckpt["policy"])
|
hidden_sizes = _infer_hidden_sizes(ckpt["policy"])
|
||||||
|
enc_out = _infer_encoder_out_dim(ckpt["policy"])
|
||||||
|
|
||||||
# Reconstruct model.
|
# Reconstruct model — pass through the encoder config so a DR+embedding
|
||||||
|
# checkpoint rebuilds the history encoder with matching dimensions.
|
||||||
model = SharedMLP(
|
model = SharedMLP(
|
||||||
observation_space=observation_space,
|
observation_space=observation_space,
|
||||||
action_space=action_space,
|
action_space=action_space,
|
||||||
device=device,
|
device=device,
|
||||||
hidden_sizes=hidden_sizes,
|
hidden_sizes=hidden_sizes,
|
||||||
|
history_length=history_length,
|
||||||
|
rma_mode=rma_mode,
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
embedding_dim=enc_out or 32, # legacy "none" + history
|
||||||
|
latent_dim=enc_out or 8, # RMA deploy adaptation module
|
||||||
)
|
)
|
||||||
model.load_state_dict(ckpt["policy"])
|
model.load_state_dict(ckpt["policy"])
|
||||||
model.eval()
|
model.eval()
|
||||||
@@ -194,7 +220,10 @@ def _eval_sim(cfg: DictConfig, env_name: str, checkpoint_path: str) -> None:
|
|||||||
|
|
||||||
device = runner.device
|
device = runner.device
|
||||||
model, preprocessor = load_policy(
|
model, preprocessor = load_policy(
|
||||||
checkpoint_path, runner.observation_space, runner.action_space, device
|
checkpoint_path, runner.observation_space, runner.action_space, device,
|
||||||
|
history_length=runner.config.history_length,
|
||||||
|
rma_mode=runner.config.rma_mode,
|
||||||
|
raw_obs_dim=runner.env.observation_space.shape[0],
|
||||||
)
|
)
|
||||||
|
|
||||||
mj_model = runner._model
|
mj_model = runner._model
|
||||||
@@ -280,7 +309,10 @@ def _eval_serial(cfg: DictConfig, env_name: str, checkpoint_path: str) -> None:
|
|||||||
|
|
||||||
device = serial_runner.device
|
device = serial_runner.device
|
||||||
model, preprocessor = load_policy(
|
model, preprocessor = load_policy(
|
||||||
checkpoint_path, serial_runner.observation_space, serial_runner.action_space, device
|
checkpoint_path, serial_runner.observation_space, serial_runner.action_space, device,
|
||||||
|
history_length=serial_runner.config.history_length,
|
||||||
|
rma_mode=serial_runner.config.rma_mode,
|
||||||
|
raw_obs_dim=serial_runner.env.observation_space.shape[0],
|
||||||
)
|
)
|
||||||
|
|
||||||
# Set up digital-twin MuJoCo model for visualization.
|
# Set up digital-twin MuJoCo model for visualization.
|
||||||
|
|||||||
296
scripts/train_adaptation.py
Normal file
296
scripts/train_adaptation.py
Normal file
@@ -0,0 +1,296 @@
|
|||||||
|
"""RMA Phase 2: Train the adaptation module φ(history) → ẑ.
|
||||||
|
|
||||||
|
Loads a Phase 1 (teacher) checkpoint, freezes the backbone + env_encoder,
|
||||||
|
and trains a HistoryEncoder (adaptation module) to predict the teacher's
|
||||||
|
latent z from observation-action history using supervised MSE.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python scripts/train_adaptation.py \
|
||||||
|
--checkpoint runs/<run>/checkpoints/agent_XXXXX.pt \
|
||||||
|
--env rotary_cartpole \
|
||||||
|
--robot-path assets/rotary_cartpole \
|
||||||
|
--num-envs 64 \
|
||||||
|
--iterations 2000 \
|
||||||
|
--lr 3e-4
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import pathlib
|
||||||
|
import sys
|
||||||
|
|
||||||
|
_PROJECT_ROOT = str(pathlib.Path(__file__).resolve().parent.parent)
|
||||||
|
if _PROJECT_ROOT not in sys.path:
|
||||||
|
sys.path.insert(0, _PROJECT_ROOT)
|
||||||
|
|
||||||
|
import structlog
|
||||||
|
import torch
|
||||||
|
import tqdm
|
||||||
|
from gymnasium import spaces
|
||||||
|
from omegaconf import OmegaConf
|
||||||
|
|
||||||
|
from src.core.registry import build_env
|
||||||
|
from src.models.mlp import SharedMLP, EnvironmentEncoder, HistoryEncoder
|
||||||
|
from src.runners.mujoco import MuJoCoRunner, MuJoCoRunnerConfig
|
||||||
|
|
||||||
|
log = structlog.get_logger()
|
||||||
|
|
||||||
|
|
||||||
|
def _load_teacher_checkpoint(
|
||||||
|
path: str, obs_space: spaces.Space, act_space: spaces.Space,
|
||||||
|
device: torch.device, raw_obs_dim: int, mu_dim: int,
|
||||||
|
hidden_sizes: tuple[int, ...], latent_dim: int,
|
||||||
|
) -> SharedMLP:
|
||||||
|
"""Reconstruct the teacher SharedMLP and load saved weights."""
|
||||||
|
model = SharedMLP(
|
||||||
|
observation_space=obs_space,
|
||||||
|
action_space=act_space,
|
||||||
|
device=device,
|
||||||
|
hidden_sizes=hidden_sizes,
|
||||||
|
rma_mode="teacher",
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
mu_dim=mu_dim,
|
||||||
|
latent_dim=latent_dim,
|
||||||
|
)
|
||||||
|
ckpt = torch.load(path, map_location=device, weights_only=True)
|
||||||
|
# skrl saves under "policy" key with state_dict.
|
||||||
|
if "policy" in ckpt:
|
||||||
|
model.load_state_dict(ckpt["policy"])
|
||||||
|
else:
|
||||||
|
model.load_state_dict(ckpt)
|
||||||
|
return model
|
||||||
|
|
||||||
|
|
||||||
|
def _build_deploy_model(
|
||||||
|
teacher: SharedMLP,
|
||||||
|
obs_space: spaces.Space,
|
||||||
|
act_space: spaces.Space,
|
||||||
|
device: torch.device,
|
||||||
|
raw_obs_dim: int,
|
||||||
|
history_length: int,
|
||||||
|
hidden_sizes: tuple[int, ...],
|
||||||
|
latent_dim: int,
|
||||||
|
) -> SharedMLP:
|
||||||
|
"""Create a deploy-mode SharedMLP and copy backbone + heads from teacher."""
|
||||||
|
model = SharedMLP(
|
||||||
|
observation_space=obs_space,
|
||||||
|
action_space=act_space,
|
||||||
|
device=device,
|
||||||
|
hidden_sizes=hidden_sizes,
|
||||||
|
rma_mode="deploy",
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
history_length=history_length,
|
||||||
|
latent_dim=latent_dim,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Copy backbone, policy head, value head from teacher.
|
||||||
|
model.net.load_state_dict(teacher.net.state_dict())
|
||||||
|
model.mean_layer.load_state_dict(teacher.mean_layer.state_dict())
|
||||||
|
model.value_layer.load_state_dict(teacher.value_layer.state_dict())
|
||||||
|
model.log_std_parameter.data.copy_(teacher.log_std_parameter.data)
|
||||||
|
|
||||||
|
return model
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description="RMA Phase 2: train adaptation module")
|
||||||
|
parser.add_argument("--checkpoint", required=True, help="Path to Phase 1 teacher checkpoint")
|
||||||
|
parser.add_argument("--env", default="rotary_cartpole")
|
||||||
|
parser.add_argument("--robot-path", default="assets/rotary_cartpole")
|
||||||
|
parser.add_argument("--num-envs", type=int, default=64)
|
||||||
|
parser.add_argument("--iterations", type=int, default=2000)
|
||||||
|
parser.add_argument("--rollout-steps", type=int, default=256, help="Steps per rollout")
|
||||||
|
parser.add_argument("--lr", type=float, default=3e-4)
|
||||||
|
parser.add_argument("--latent-dim", type=int, default=8)
|
||||||
|
parser.add_argument("--hidden-sizes", type=int, nargs="+", default=[128, 128])
|
||||||
|
parser.add_argument("--history-length", type=int, default=10)
|
||||||
|
parser.add_argument("--output", default="checkpoints/adaptation.pt")
|
||||||
|
parser.add_argument("--device", default="cpu")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
device = torch.device(args.device)
|
||||||
|
hidden_sizes = tuple(args.hidden_sizes)
|
||||||
|
|
||||||
|
# ── Build env + runner (deploy mode with history + DR) ───────
|
||||||
|
env_cfg = OmegaConf.create({"env": {
|
||||||
|
"robot_path": args.robot_path,
|
||||||
|
}})
|
||||||
|
env = build_env(args.env, env_cfg)
|
||||||
|
|
||||||
|
runner_cfg = MuJoCoRunnerConfig(
|
||||||
|
num_envs=args.num_envs,
|
||||||
|
device=args.device,
|
||||||
|
history_length=args.history_length,
|
||||||
|
rma_mode="deploy",
|
||||||
|
domain_rand={
|
||||||
|
"qpos_noise_std": 0.01,
|
||||||
|
"qvel_noise_std": 0.5,
|
||||||
|
"action_delay_steps": [0, 2],
|
||||||
|
"friction_scale": [0.6, 1.6],
|
||||||
|
"damping_scale": [0.6, 1.6],
|
||||||
|
"torque_scale": [0.85, 1.15],
|
||||||
|
},
|
||||||
|
)
|
||||||
|
runner = MuJoCoRunner(env=env, config=runner_cfg)
|
||||||
|
|
||||||
|
raw_obs_dim = env.observation_space.shape[0]
|
||||||
|
act_dim = env.action_space.shape[0]
|
||||||
|
mu_dim = runner.privileged_dim
|
||||||
|
|
||||||
|
log.info(
|
||||||
|
"adaptation_setup",
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
act_dim=act_dim,
|
||||||
|
mu_dim=mu_dim,
|
||||||
|
latent_dim=args.latent_dim,
|
||||||
|
history_length=args.history_length,
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── Load teacher & build deploy model ────────────────────────
|
||||||
|
# Teacher obs space: [raw_obs, μ]
|
||||||
|
teacher_obs_space = spaces.Box(
|
||||||
|
low=-torch.inf, high=torch.inf, shape=(raw_obs_dim + mu_dim,),
|
||||||
|
)
|
||||||
|
|
||||||
|
teacher = _load_teacher_checkpoint(
|
||||||
|
path=args.checkpoint,
|
||||||
|
obs_space=teacher_obs_space,
|
||||||
|
act_space=env.action_space,
|
||||||
|
device=device,
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
mu_dim=mu_dim,
|
||||||
|
hidden_sizes=hidden_sizes,
|
||||||
|
latent_dim=args.latent_dim,
|
||||||
|
)
|
||||||
|
teacher.eval()
|
||||||
|
for p in teacher.parameters():
|
||||||
|
p.requires_grad_(False)
|
||||||
|
|
||||||
|
# Deploy obs space: [raw_obs, history_flat]
|
||||||
|
step_dim = raw_obs_dim + act_dim
|
||||||
|
deploy_obs_space = spaces.Box(
|
||||||
|
low=-torch.inf, high=torch.inf,
|
||||||
|
shape=(raw_obs_dim + args.history_length * step_dim,),
|
||||||
|
)
|
||||||
|
|
||||||
|
deploy_model = _build_deploy_model(
|
||||||
|
teacher=teacher,
|
||||||
|
obs_space=deploy_obs_space,
|
||||||
|
act_space=env.action_space,
|
||||||
|
device=device,
|
||||||
|
raw_obs_dim=raw_obs_dim,
|
||||||
|
history_length=args.history_length,
|
||||||
|
hidden_sizes=hidden_sizes,
|
||||||
|
latent_dim=args.latent_dim,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Freeze everything except the adaptation module.
|
||||||
|
for name, param in deploy_model.named_parameters():
|
||||||
|
if "adaptation_module" not in name:
|
||||||
|
param.requires_grad_(False)
|
||||||
|
|
||||||
|
optimizer = torch.optim.Adam(
|
||||||
|
deploy_model.adaptation_module.parameters(), lr=args.lr,
|
||||||
|
)
|
||||||
|
|
||||||
|
# ── Training loop ────────────────────────────────────────────
|
||||||
|
log.info("starting_adaptation_training", iterations=args.iterations)
|
||||||
|
obs, _ = runner.reset()
|
||||||
|
|
||||||
|
for iteration in tqdm.tqdm(range(args.iterations), desc="Adaptation"):
|
||||||
|
# Collect a rollout using the deploy model.
|
||||||
|
z_targets: list[torch.Tensor] = []
|
||||||
|
z_preds: list[torch.Tensor] = []
|
||||||
|
|
||||||
|
for _step in range(args.rollout_steps):
|
||||||
|
with torch.no_grad():
|
||||||
|
# Get action from deploy model (uses adaptation module).
|
||||||
|
aug_obs = obs # already augmented by runner
|
||||||
|
actions = deploy_model.act(
|
||||||
|
{"states": aug_obs}, role="policy",
|
||||||
|
)[0]
|
||||||
|
|
||||||
|
obs, _, _, _, info = runner.step(actions)
|
||||||
|
|
||||||
|
# Compute teacher's z from privileged μ.
|
||||||
|
mu = info.get("privileged_obs")
|
||||||
|
if mu is not None:
|
||||||
|
z_target = teacher.env_encoder(mu)
|
||||||
|
z_targets.append(z_target)
|
||||||
|
|
||||||
|
# Compute adaptation module's ẑ from history.
|
||||||
|
raw = aug_obs[:, :raw_obs_dim]
|
||||||
|
hist_flat = aug_obs[:, raw_obs_dim:]
|
||||||
|
history = hist_flat.reshape(
|
||||||
|
-1, args.history_length, step_dim,
|
||||||
|
)
|
||||||
|
z_pred = deploy_model.adaptation_module(history)
|
||||||
|
z_preds.append(z_pred)
|
||||||
|
|
||||||
|
if not z_targets:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Supervised update on adaptation module.
|
||||||
|
z_target_batch = torch.cat(z_targets, dim=0).detach()
|
||||||
|
z_pred_batch = torch.cat(z_preds, dim=0)
|
||||||
|
|
||||||
|
# Re-compute z_pred with gradients (the ones above were no_grad).
|
||||||
|
# We need to re-encode from stored data; instead, collect with grad:
|
||||||
|
# Actually, z_preds were computed in no_grad. Let me re-collect
|
||||||
|
# a fresh batch with gradients.
|
||||||
|
obs_reset, _ = runner.reset()
|
||||||
|
obs = obs_reset
|
||||||
|
|
||||||
|
z_targets_grad: list[torch.Tensor] = []
|
||||||
|
z_preds_grad: list[torch.Tensor] = []
|
||||||
|
|
||||||
|
for _step in range(args.rollout_steps):
|
||||||
|
with torch.no_grad():
|
||||||
|
aug_obs = obs
|
||||||
|
actions = deploy_model.act(
|
||||||
|
{"states": aug_obs}, role="policy",
|
||||||
|
)[0]
|
||||||
|
obs, _, _, _, info = runner.step(actions)
|
||||||
|
mu = info.get("privileged_obs")
|
||||||
|
|
||||||
|
if mu is not None:
|
||||||
|
with torch.no_grad():
|
||||||
|
z_target = teacher.env_encoder(mu)
|
||||||
|
z_targets_grad.append(z_target)
|
||||||
|
|
||||||
|
# This time, compute z_pred WITH gradients.
|
||||||
|
raw = aug_obs[:, :raw_obs_dim]
|
||||||
|
hist_flat = aug_obs[:, raw_obs_dim:]
|
||||||
|
history = hist_flat.reshape(
|
||||||
|
-1, args.history_length, step_dim,
|
||||||
|
)
|
||||||
|
z_pred = deploy_model.adaptation_module(history)
|
||||||
|
z_preds_grad.append(z_pred)
|
||||||
|
|
||||||
|
if not z_targets_grad:
|
||||||
|
continue
|
||||||
|
|
||||||
|
z_target_all = torch.cat(z_targets_grad, dim=0).detach()
|
||||||
|
z_pred_all = torch.cat(z_preds_grad, dim=0)
|
||||||
|
|
||||||
|
loss = torch.nn.functional.mse_loss(z_pred_all, z_target_all)
|
||||||
|
|
||||||
|
optimizer.zero_grad()
|
||||||
|
loss.backward()
|
||||||
|
optimizer.step()
|
||||||
|
|
||||||
|
if iteration % 50 == 0:
|
||||||
|
log.info("adaptation_loss", iteration=iteration, mse=f"{loss.item():.6f}")
|
||||||
|
|
||||||
|
# ── Save adaptation weights ──────────────────────────────────
|
||||||
|
out_path = pathlib.Path(args.output)
|
||||||
|
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# Save the full deploy model state dict.
|
||||||
|
torch.save(deploy_model.state_dict(), out_path)
|
||||||
|
log.info("adaptation_saved", path=str(out_path))
|
||||||
|
|
||||||
|
runner.close()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -46,12 +46,11 @@ class ActuatorConfig:
|
|||||||
deadzone: tuple[float, float] = (0.0, 0.0) # min |ctrl| for torque (pos, neg)
|
deadzone: tuple[float, float] = (0.0, 0.0) # min |ctrl| for torque (pos, neg)
|
||||||
damping: tuple[float, float] = (0.0, 0.0) # viscous damping (pos, neg)
|
damping: tuple[float, float] = (0.0, 0.0) # viscous damping (pos, neg)
|
||||||
frictionloss: tuple[float, float] = (0.0, 0.0) # Coulomb friction (pos, neg)
|
frictionloss: tuple[float, float] = (0.0, 0.0) # Coulomb friction (pos, neg)
|
||||||
stribeck_friction_boost: float = 0.0 # extra friction at low speed (fraction)
|
|
||||||
stribeck_vel: float = 2.0 # Stribeck velocity scale (rad/s)
|
|
||||||
action_bias: float = 0.0 # constant bias on action (H-bridge asymmetry)
|
|
||||||
kp: float = 0.0 # proportional gain (position / velocity actuators)
|
kp: float = 0.0 # proportional gain (position / velocity actuators)
|
||||||
kv: float = 0.0 # derivative gain (position actuators)
|
kv: float = 0.0 # derivative gain (position actuators)
|
||||||
filter_tau: float = 0.0 # 1st-order filter time constant (s); 0 = no filter
|
filter_tau: float = 0.0 # 1st-order filter time constant (s); 0 = no filter
|
||||||
|
viscous_quadratic: float = 0.0 # velocity² drag coefficient
|
||||||
|
back_emf_gain: float = 0.0 # back-EMF torque reduction
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def gear_avg(self) -> float:
|
def gear_avg(self) -> float:
|
||||||
@@ -65,15 +64,12 @@ class ActuatorConfig:
|
|||||||
or self.deadzone != (0.0, 0.0)
|
or self.deadzone != (0.0, 0.0)
|
||||||
or self.damping != (0.0, 0.0)
|
or self.damping != (0.0, 0.0)
|
||||||
or self.frictionloss != (0.0, 0.0)
|
or self.frictionloss != (0.0, 0.0)
|
||||||
or self.stribeck_friction_boost > 0
|
or self.viscous_quadratic > 0
|
||||||
or self.action_bias != 0.0
|
or self.back_emf_gain > 0
|
||||||
)
|
)
|
||||||
|
|
||||||
def transform_ctrl(self, ctrl: float) -> float:
|
def transform_ctrl(self, ctrl: float) -> float:
|
||||||
"""Apply bias, asymmetric deadzone, and gear compensation."""
|
"""Apply asymmetric deadzone and gear compensation to a scalar ctrl."""
|
||||||
# Action bias (H-bridge asymmetry)
|
|
||||||
ctrl = ctrl + self.action_bias
|
|
||||||
|
|
||||||
# Deadzone
|
# Deadzone
|
||||||
dz_pos, dz_neg = self.deadzone
|
dz_pos, dz_neg = self.deadzone
|
||||||
if ctrl >= 0 and ctrl < dz_pos:
|
if ctrl >= 0 and ctrl < dz_pos:
|
||||||
@@ -89,23 +85,35 @@ class ActuatorConfig:
|
|||||||
|
|
||||||
return ctrl
|
return ctrl
|
||||||
|
|
||||||
def compute_motor_force(self, vel: float, ctrl: float) -> float:
|
def compute_motor_force(self, vel: float, ctrl: float,
|
||||||
"""Asymmetric friction + Stribeck + damping → applied torque."""
|
friction_scale: float = 1.0,
|
||||||
|
damping_scale: float = 1.0) -> float:
|
||||||
|
"""Asymmetric friction, damping, drag, back-EMF → applied torque.
|
||||||
|
|
||||||
|
``friction_scale`` / ``damping_scale`` multiply the Coulomb-friction
|
||||||
|
and viscous-damping terms for per-env domain randomization
|
||||||
|
(1.0 = no randomization, the default used by sysid).
|
||||||
|
"""
|
||||||
torque = 0.0
|
torque = 0.0
|
||||||
|
|
||||||
# Coulomb friction (direction-dependent + Stribeck boost)
|
# Coulomb friction (direction-dependent)
|
||||||
fl_pos, fl_neg = self.frictionloss
|
fl_pos, fl_neg = self.frictionloss
|
||||||
if abs(vel) > 1e-6:
|
if abs(vel) > 1e-6:
|
||||||
fl = fl_pos if vel > 0 else fl_neg
|
fl = (fl_pos if vel > 0 else fl_neg) * friction_scale
|
||||||
if self.stribeck_friction_boost > 0 and self.stribeck_vel > 0:
|
|
||||||
fl = fl * (1.0 + self.stribeck_friction_boost
|
|
||||||
* math.exp(-abs(vel) / self.stribeck_vel))
|
|
||||||
torque -= math.copysign(fl, vel)
|
torque -= math.copysign(fl, vel)
|
||||||
|
|
||||||
# Viscous damping (direction-dependent)
|
# Viscous damping (direction-dependent)
|
||||||
damp = self.damping[0] if vel > 0 else self.damping[1]
|
damp = (self.damping[0] if vel > 0 else self.damping[1]) * damping_scale
|
||||||
torque -= damp * vel
|
torque -= damp * vel
|
||||||
|
|
||||||
|
# Quadratic velocity drag
|
||||||
|
if self.viscous_quadratic > 0:
|
||||||
|
torque -= self.viscous_quadratic * vel * abs(vel)
|
||||||
|
|
||||||
|
# Back-EMF torque reduction
|
||||||
|
if self.back_emf_gain > 0 and abs(ctrl) > 1e-6:
|
||||||
|
torque -= self.back_emf_gain * vel * math.copysign(1.0, ctrl)
|
||||||
|
|
||||||
return max(-10.0, min(10.0, torque))
|
return max(-10.0, min(10.0, torque))
|
||||||
|
|
||||||
def transform_action(self, action):
|
def transform_action(self, action):
|
||||||
|
|||||||
@@ -15,6 +15,18 @@ class BaseRunnerConfig:
|
|||||||
num_envs: int = 1
|
num_envs: int = 1
|
||||||
device: str = "cpu"
|
device: str = "cpu"
|
||||||
history_length: int = 0 # 0 = no history (single obs), >0 = RMA-style
|
history_length: int = 0 # 0 = no history (single obs), >0 = RMA-style
|
||||||
|
rma_mode: str = "none" # "none" | "teacher" | "deploy"
|
||||||
|
|
||||||
|
# ── Domain randomization (sim-to-real) ─────────────────────────
|
||||||
|
# Empty dict = disabled (every field below is a no-op). Supported keys:
|
||||||
|
# qpos_noise_std: float — Gaussian sensor noise on joint angles (rad)
|
||||||
|
# qvel_noise_std: float — Gaussian sensor noise on joint velocities (rad/s)
|
||||||
|
# action_delay_steps: [lo, hi] — per-env integer control-step latency
|
||||||
|
# friction_scale: [lo, hi] — per-env multiplier on Coulomb friction
|
||||||
|
# damping_scale: [lo, hi] — per-env multiplier on viscous damping
|
||||||
|
# torque_scale: [lo, hi] — per-env multiplier on applied motor torque
|
||||||
|
# The randomized factors are exposed as privileged_obs (μ) for RMA.
|
||||||
|
domain_rand: dict = dataclasses.field(default_factory=dict)
|
||||||
|
|
||||||
class BaseRunner(abc.ABC, Generic[T]):
|
class BaseRunner(abc.ABC, Generic[T]):
|
||||||
def __init__(self, env: BaseEnv, config: T) -> None:
|
def __init__(self, env: BaseEnv, config: T) -> None:
|
||||||
@@ -37,7 +49,15 @@ class BaseRunner(abc.ABC, Generic[T]):
|
|||||||
self.config.num_envs, dtype=torch.long, device=self.config.device
|
self.config.num_envs, dtype=torch.long, device=self.config.device
|
||||||
)
|
)
|
||||||
|
|
||||||
# ── History buffer (RMA-style adaptation) ────────────────
|
# ── Domain randomization (latency / sensor noise / dynamics) ─
|
||||||
|
# Must precede the RMA block: teacher mode reads privileged_dim,
|
||||||
|
# which is derived from the randomized-factor count below.
|
||||||
|
self._setup_domain_rand()
|
||||||
|
|
||||||
|
# ── RMA mode ────────────────────────────────────────────
|
||||||
|
self._rma_mode: str = getattr(self.config, "rma_mode", "none")
|
||||||
|
|
||||||
|
# ── History buffer (used in "deploy" and legacy "none" modes) ─
|
||||||
self._history_len: int = getattr(self.config, "history_length", 0)
|
self._history_len: int = getattr(self.config, "history_length", 0)
|
||||||
if self._history_len > 0:
|
if self._history_len > 0:
|
||||||
obs_dim = self.observation_space.shape[0]
|
obs_dim = self.observation_space.shape[0]
|
||||||
@@ -50,11 +70,30 @@ class BaseRunner(abc.ABC, Generic[T]):
|
|||||||
self.config.num_envs, self._history_len, self._history_step_dim,
|
self.config.num_envs, self._history_len, self._history_step_dim,
|
||||||
device=self.config.device,
|
device=self.config.device,
|
||||||
)
|
)
|
||||||
# Augmented observation space: [current_obs, history_flat]
|
|
||||||
|
# ── Observation space augmentation ───────────────────────
|
||||||
from gymnasium import spaces
|
from gymnasium import spaces
|
||||||
aug_dim = obs_dim + self._history_len * self._history_step_dim
|
raw_obs_dim = self.observation_space.shape[0]
|
||||||
|
|
||||||
|
if self._rma_mode == "teacher":
|
||||||
|
# Teacher gets [raw_obs, μ]. μ dim resolved after _sim_initialize.
|
||||||
|
mu_dim = getattr(self, "privileged_dim", 0)
|
||||||
|
if mu_dim > 0:
|
||||||
|
aug_dim = raw_obs_dim + mu_dim
|
||||||
self.observation_space = spaces.Box(
|
self.observation_space = spaces.Box(
|
||||||
low=-torch.inf, high=torch.inf, shape=(aug_dim,)
|
low=-torch.inf, high=torch.inf, shape=(aug_dim,),
|
||||||
|
)
|
||||||
|
elif self._rma_mode == "deploy" and self._history_len > 0:
|
||||||
|
# Deploy gets [raw_obs, history_flat].
|
||||||
|
aug_dim = raw_obs_dim + self._history_len * self._history_step_dim
|
||||||
|
self.observation_space = spaces.Box(
|
||||||
|
low=-torch.inf, high=torch.inf, shape=(aug_dim,),
|
||||||
|
)
|
||||||
|
elif self._rma_mode == "none" and self._history_len > 0:
|
||||||
|
# Legacy mode: [raw_obs, history_flat].
|
||||||
|
aug_dim = raw_obs_dim + self._history_len * self._history_step_dim
|
||||||
|
self.observation_space = spaces.Box(
|
||||||
|
low=-torch.inf, high=torch.inf, shape=(aug_dim,),
|
||||||
)
|
)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
@@ -84,11 +123,139 @@ class BaseRunner(abc.ABC, Generic[T]):
|
|||||||
if hasattr(self, "_offscreen_renderer") and self._offscreen_renderer is not None:
|
if hasattr(self, "_offscreen_renderer") and self._offscreen_renderer is not None:
|
||||||
self._offscreen_renderer.close()
|
self._offscreen_renderer.close()
|
||||||
|
|
||||||
|
# ── Domain randomization ─────────────────────────────────────
|
||||||
|
|
||||||
|
_SCALE_FIELDS = ("friction_scale", "damping_scale", "torque_scale")
|
||||||
|
|
||||||
|
def _setup_domain_rand(self) -> None:
|
||||||
|
"""Parse the domain_rand config into per-env buffers + the μ layout.
|
||||||
|
|
||||||
|
All buffers are no-ops when ``domain_rand`` is empty: scales are 1.0,
|
||||||
|
delay is 0, noise std is 0, and privileged_dim is 0.
|
||||||
|
"""
|
||||||
|
dr = dict(getattr(self.config, "domain_rand", {}) or {})
|
||||||
|
n = self.config.num_envs
|
||||||
|
dev = self.config.device
|
||||||
|
|
||||||
|
# Fixed (not per-env) Gaussian sensor noise.
|
||||||
|
self._qpos_noise_std = float(dr.get("qpos_noise_std", 0.0))
|
||||||
|
self._qvel_noise_std = float(dr.get("qvel_noise_std", 0.0))
|
||||||
|
|
||||||
|
# Per-env multiplicative dynamics scales (applied by the sim runner).
|
||||||
|
self._dr_scales: dict[str, torch.Tensor] = {
|
||||||
|
f: torch.ones(n, device=dev) for f in self._SCALE_FIELDS
|
||||||
|
}
|
||||||
|
|
||||||
|
# Per-env integer action delay (in control steps).
|
||||||
|
self._dr_delay = torch.zeros(n, dtype=torch.long, device=dev)
|
||||||
|
|
||||||
|
# Spec list — its order fixes the privileged μ vector layout.
|
||||||
|
self._dr_specs: list[tuple[str, float, float]] = []
|
||||||
|
delay_range = dr.get("action_delay_steps")
|
||||||
|
if delay_range:
|
||||||
|
self._dr_specs.append(
|
||||||
|
("action_delay_steps", float(delay_range[0]), float(delay_range[1]))
|
||||||
|
)
|
||||||
|
self._max_delay = int(delay_range[1])
|
||||||
|
else:
|
||||||
|
self._max_delay = 0
|
||||||
|
for f in self._SCALE_FIELDS:
|
||||||
|
rng = dr.get(f)
|
||||||
|
if rng:
|
||||||
|
self._dr_specs.append((f, float(rng[0]), float(rng[1])))
|
||||||
|
|
||||||
|
self._mu_dim = len(self._dr_specs)
|
||||||
|
self._dr_mu = torch.zeros(n, self._mu_dim, device=dev)
|
||||||
|
|
||||||
|
# Action-delay ring buffer: (num_envs, max_delay + 1, act_dim).
|
||||||
|
if self._max_delay > 0:
|
||||||
|
act_dim = self.env.action_space.shape[0]
|
||||||
|
self._action_buf = torch.zeros(
|
||||||
|
n, self._max_delay + 1, act_dim, device=dev,
|
||||||
|
)
|
||||||
|
|
||||||
|
def _resample_domain_rand(self, env_ids: torch.Tensor) -> None:
|
||||||
|
"""Sample fresh per-env DR factors (call on every (re)set)."""
|
||||||
|
if self._mu_dim == 0 or env_ids.numel() == 0:
|
||||||
|
return
|
||||||
|
dev = self.config.device
|
||||||
|
for j, (name, lo, hi) in enumerate(self._dr_specs):
|
||||||
|
if name == "action_delay_steps":
|
||||||
|
vals = torch.randint(
|
||||||
|
int(lo), int(hi) + 1, (env_ids.numel(),), device=dev,
|
||||||
|
)
|
||||||
|
self._dr_delay[env_ids] = vals
|
||||||
|
raw = vals.float()
|
||||||
|
else:
|
||||||
|
vals = torch.rand(env_ids.numel(), device=dev) * (hi - lo) + lo
|
||||||
|
self._dr_scales[name][env_ids] = vals
|
||||||
|
raw = vals
|
||||||
|
# Normalize each factor to ~[-1, 1] for the privileged μ vector.
|
||||||
|
span = (hi - lo) if (hi - lo) > 1e-9 else 1.0
|
||||||
|
self._dr_mu[env_ids, j] = 2.0 * (raw - lo) / span - 1.0
|
||||||
|
|
||||||
|
def _reset_action_buffer(self, env_ids: torch.Tensor) -> None:
|
||||||
|
if self._max_delay > 0:
|
||||||
|
self._action_buf[env_ids] = 0.0
|
||||||
|
|
||||||
|
def _apply_action_delay(self, actions: torch.Tensor) -> torch.Tensor:
|
||||||
|
"""Return the per-env delayed action that the simulator should apply.
|
||||||
|
|
||||||
|
The policy's commanded action is what gets stored in history; only
|
||||||
|
the action handed to ``_sim_step`` is delayed.
|
||||||
|
"""
|
||||||
|
if self._max_delay <= 0:
|
||||||
|
return actions
|
||||||
|
self._action_buf = torch.roll(self._action_buf, 1, dims=1)
|
||||||
|
self._action_buf[:, 0] = actions
|
||||||
|
idx = torch.arange(self.num_envs, device=self.device)
|
||||||
|
return self._action_buf[idx, self._dr_delay]
|
||||||
|
|
||||||
|
def _add_sensor_noise(
|
||||||
|
self, qpos: torch.Tensor, qvel: torch.Tensor,
|
||||||
|
) -> tuple[torch.Tensor, torch.Tensor]:
|
||||||
|
if self._qpos_noise_std > 0:
|
||||||
|
qpos = qpos + torch.randn_like(qpos) * self._qpos_noise_std
|
||||||
|
if self._qvel_noise_std > 0:
|
||||||
|
qvel = qvel + torch.randn_like(qvel) * self._qvel_noise_std
|
||||||
|
return qpos, qvel
|
||||||
|
|
||||||
|
def _compute_obs(self, qpos: torch.Tensor, qvel: torch.Tensor) -> torch.Tensor:
|
||||||
|
"""Observation the policy sees — built from noisy (sensor) state."""
|
||||||
|
nqpos, nqvel = self._add_sensor_noise(qpos, qvel)
|
||||||
|
return self.env.compute_observations(self.env.build_state(nqpos, nqvel))
|
||||||
|
|
||||||
|
# ── Privileged observation interface ─────────────────────────
|
||||||
|
|
||||||
|
@property
|
||||||
|
def privileged_dim(self) -> int:
|
||||||
|
"""Number of randomized DR factors exposed as μ (0 if DR disabled)."""
|
||||||
|
return getattr(self, "_mu_dim", 0)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def privileged_obs(self) -> torch.Tensor:
|
||||||
|
"""Per-env normalized DR factors μ ∈ [-1, 1] for RMA supervision."""
|
||||||
|
if getattr(self, "_mu_dim", 0) > 0:
|
||||||
|
return self._dr_mu
|
||||||
|
return torch.zeros(self.config.num_envs, 0, device=self.config.device)
|
||||||
|
|
||||||
|
# ── Observation augmentation ─────────────────────────────────
|
||||||
|
|
||||||
def _augment_obs(self, obs: torch.Tensor) -> torch.Tensor:
|
def _augment_obs(self, obs: torch.Tensor) -> torch.Tensor:
|
||||||
"""Concatenate history buffer to current obs if history is enabled."""
|
"""Augment raw obs based on RMA mode.
|
||||||
|
|
||||||
|
teacher: [raw_obs, μ]
|
||||||
|
deploy: [raw_obs, history_flat]
|
||||||
|
none: [raw_obs, history_flat] (legacy, or plain obs if no history)
|
||||||
|
"""
|
||||||
|
if self._rma_mode == "teacher":
|
||||||
|
mu = self.privileged_obs
|
||||||
|
if mu.shape[-1] > 0:
|
||||||
|
return torch.cat([obs, mu], dim=-1)
|
||||||
|
return obs
|
||||||
|
# deploy / none: concatenate history
|
||||||
if self._history_len <= 0:
|
if self._history_len <= 0:
|
||||||
return obs
|
return obs
|
||||||
# Flatten history: (num_envs, H, step_dim) → (num_envs, H * step_dim)
|
|
||||||
hist_flat = self._history_buf.reshape(obs.shape[0], -1)
|
hist_flat = self._history_buf.reshape(obs.shape[0], -1)
|
||||||
return torch.cat([obs, hist_flat], dim=-1)
|
return torch.cat([obs, hist_flat], dim=-1)
|
||||||
|
|
||||||
@@ -115,30 +282,41 @@ class BaseRunner(abc.ABC, Generic[T]):
|
|||||||
|
|
||||||
def reset(self) -> tuple[torch.Tensor, dict[str, Any]]:
|
def reset(self) -> tuple[torch.Tensor, dict[str, Any]]:
|
||||||
all_ids = torch.arange(self.num_envs, device=self.device)
|
all_ids = torch.arange(self.num_envs, device=self.device)
|
||||||
|
self._resample_domain_rand(all_ids)
|
||||||
|
self._reset_action_buffer(all_ids)
|
||||||
qpos, qvel = self._sim_reset(all_ids)
|
qpos, qvel = self._sim_reset(all_ids)
|
||||||
self.step_counts.zero_()
|
self.step_counts.zero_()
|
||||||
self._reset_history(all_ids)
|
self._reset_history(all_ids)
|
||||||
|
|
||||||
state = self.env.build_state(qpos, qvel)
|
obs = self._compute_obs(qpos, qvel)
|
||||||
obs = self.env.compute_observations(state)
|
|
||||||
return self._augment_obs(obs), {}
|
return self._augment_obs(obs), {}
|
||||||
|
|
||||||
def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict[str, Any]]:
|
def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict[str, Any]]:
|
||||||
self._last_actions = actions
|
self._last_actions = actions
|
||||||
qpos, qvel = self._sim_step(actions)
|
# Latency: the simulator applies a (per-env) delayed action.
|
||||||
|
sim_actions = self._apply_action_delay(actions)
|
||||||
|
qpos, qvel = self._sim_step(sim_actions)
|
||||||
self.step_counts += 1
|
self.step_counts += 1
|
||||||
|
|
||||||
state = self.env.build_state(qpos, qvel)
|
# Reward / termination use the TRUE state (no sensor noise) so the
|
||||||
obs = self.env.compute_observations(state)
|
# learning signal and safety checks stay clean.
|
||||||
rewards = self.env.compute_rewards(state, actions)
|
clean_state = self.env.build_state(qpos, qvel)
|
||||||
terminated = self.env.compute_terminations(state)
|
rewards = self.env.compute_rewards(clean_state, actions)
|
||||||
|
terminated = self.env.compute_terminations(clean_state)
|
||||||
truncated = self.env.compute_truncations(self.step_counts)
|
truncated = self.env.compute_truncations(self.step_counts)
|
||||||
|
|
||||||
|
# The observation the policy sees is built from the NOISY sensor state.
|
||||||
|
obs = self._compute_obs(qpos, qvel)
|
||||||
|
|
||||||
# Push current (obs, action) into history before augmenting.
|
# Push current (obs, action) into history before augmenting.
|
||||||
self._push_history(obs, actions)
|
self._push_history(obs, actions)
|
||||||
|
|
||||||
info: dict[str, Any] = {}
|
info: dict[str, Any] = {}
|
||||||
|
|
||||||
|
# Expose privileged DR params μ for RMA supervision (this step's dynamics).
|
||||||
|
if self.privileged_dim > 0:
|
||||||
|
info["privileged_obs"] = self.privileged_obs.clone()
|
||||||
|
|
||||||
done = terminated | truncated
|
done = terminated | truncated
|
||||||
done_ids = done.nonzero(as_tuple=False).squeeze(-1)
|
done_ids = done.nonzero(as_tuple=False).squeeze(-1)
|
||||||
|
|
||||||
@@ -146,12 +324,14 @@ class BaseRunner(abc.ABC, Generic[T]):
|
|||||||
info["final_observations"] = self._augment_obs(obs)[done_ids].clone()
|
info["final_observations"] = self._augment_obs(obs)[done_ids].clone()
|
||||||
info["final_env_ids"] = done_ids.clone()
|
info["final_env_ids"] = done_ids.clone()
|
||||||
|
|
||||||
|
# New episode → fresh dynamics + cleared latency buffer.
|
||||||
|
self._resample_domain_rand(done_ids)
|
||||||
|
self._reset_action_buffer(done_ids)
|
||||||
reset_qpos, reset_qvel = self._sim_reset(done_ids)
|
reset_qpos, reset_qvel = self._sim_reset(done_ids)
|
||||||
self.step_counts[done_ids] = 0
|
self.step_counts[done_ids] = 0
|
||||||
self._reset_history(done_ids)
|
self._reset_history(done_ids)
|
||||||
|
|
||||||
reset_state = self.env.build_state(reset_qpos, reset_qvel)
|
obs[done_ids] = self._compute_obs(reset_qpos, reset_qvel)
|
||||||
obs[done_ids] = self.env.compute_observations(reset_state)
|
|
||||||
|
|
||||||
# skrl expects (num_envs, 1) for rewards/terminated/truncated
|
# skrl expects (num_envs, 1) for rewards/terminated/truncated
|
||||||
return self._augment_obs(obs), rewards.unsqueeze(-1), terminated.unsqueeze(-1), truncated.unsqueeze(-1), info
|
return self._augment_obs(obs), rewards.unsqueeze(-1), terminated.unsqueeze(-1), truncated.unsqueeze(-1), info
|
||||||
|
|||||||
@@ -21,11 +21,17 @@ class RotaryCartPoleConfig(BaseEnvConfig):
|
|||||||
at the arm tip. Goal: swing the pendulum up and balance it upright.
|
at the arm tip. Goal: swing the pendulum up and balance it upright.
|
||||||
"""
|
"""
|
||||||
# Reward shaping
|
# Reward shaping
|
||||||
reward_upright_scale: float = 1.0 # cos(pendulum) when upright = +1
|
reward_upright_scale: float = 1.0 # upright reward ∈ [0, scale]
|
||||||
|
alive_bonus: float = 0.25 # per-step survival bonus (must stay alive > die)
|
||||||
|
balance_bonus: float = 2.0 # extra reward for upright AND still (beats spinning)
|
||||||
|
balance_vel_scale: float = 0.5 # decay rate of the bonus with pendulum speed
|
||||||
motor_vel_penalty: float = 0.01 # penalise high motor angular velocity
|
motor_vel_penalty: float = 0.01 # penalise high motor angular velocity
|
||||||
motor_angle_penalty: float = 0.05 # penalise deviation from centre
|
motor_angle_penalty: float = 0.05 # penalise deviation from centre
|
||||||
action_penalty: float = 0.05 # penalise large actions (energy cost)
|
action_penalty: float = 0.05 # penalise large actions (energy cost)
|
||||||
|
|
||||||
|
# ── Initial state randomisation ──
|
||||||
|
pendulum_init_range_deg: float = 180.0 # pendulum starts in [-range, +range]
|
||||||
|
|
||||||
# ── Software safety limit (env-level, on top of URDF + hardware) ──
|
# ── Software safety limit (env-level, on top of URDF + hardware) ──
|
||||||
motor_angle_limit_deg: float = 90.0 # terminate episode if exceeded
|
motor_angle_limit_deg: float = 90.0 # terminate episode if exceeded
|
||||||
|
|
||||||
@@ -82,8 +88,23 @@ class RotaryCartPoleEnv(BaseEnv[RotaryCartPoleConfig]):
|
|||||||
# ── Rewards ──────────────────────────────────────────────────
|
# ── Rewards ──────────────────────────────────────────────────
|
||||||
|
|
||||||
def compute_rewards(self, state: RotaryCartPoleState, actions: torch.Tensor) -> torch.Tensor:
|
def compute_rewards(self, state: RotaryCartPoleState, actions: torch.Tensor) -> torch.Tensor:
|
||||||
# Upright reward: -cos(θ) ∈ [-1, +1]
|
# Upright shaping ∈ [0, 1]: 0 hanging down (θ=0), 1 fully upright (θ=π).
|
||||||
reward = -torch.cos(state.pendulum_angle) * self.config.reward_upright_scale
|
# Non-negative by design so *surviving* always beats ending the episode early
|
||||||
|
# (otherwise the optimum is to slam the arm into the ±limit — "suicide policy").
|
||||||
|
upright = 0.5 * (1.0 - torch.cos(state.pendulum_angle))
|
||||||
|
|
||||||
|
# Balanced bonus: large ONLY when near the top AND nearly still. A freely
|
||||||
|
# spinning pendulum passes through the top at high speed, so stillness≈0 and
|
||||||
|
# it earns ~none of this — making true balancing strictly dominate the
|
||||||
|
# "just keep spinning in full loops" local optimum.
|
||||||
|
stillness = torch.exp(-self.config.balance_vel_scale * state.pendulum_vel.pow(2))
|
||||||
|
balance = self.config.balance_bonus * upright * stillness
|
||||||
|
|
||||||
|
# Per-step alive bonus keeps a not-yet-upright step net-positive despite
|
||||||
|
# penalties, so the −10 termination penalty is genuinely a deterrent.
|
||||||
|
reward = (upright * self.config.reward_upright_scale
|
||||||
|
+ balance
|
||||||
|
+ self.config.alive_bonus)
|
||||||
|
|
||||||
# Penalise fast motor spinning (discourages violent oscillation)
|
# Penalise fast motor spinning (discourages violent oscillation)
|
||||||
reward = reward - self.config.motor_vel_penalty * state.motor_vel.pow(2)
|
reward = reward - self.config.motor_vel_penalty * state.motor_vel.pow(2)
|
||||||
@@ -97,7 +118,7 @@ class RotaryCartPoleEnv(BaseEnv[RotaryCartPoleConfig]):
|
|||||||
# Penalty for exceeding motor angle limit (episode also terminates)
|
# Penalty for exceeding motor angle limit (episode also terminates)
|
||||||
limit_rad = math.radians(self.config.motor_angle_limit_deg)
|
limit_rad = math.radians(self.config.motor_angle_limit_deg)
|
||||||
exceeded = state.motor_angle.abs() >= limit_rad
|
exceeded = state.motor_angle.abs() >= limit_rad
|
||||||
reward = torch.where(exceeded, torch.tensor(-1000.0, device=reward.device), reward)
|
reward = torch.where(exceeded, torch.tensor(-10.0, device=reward.device), reward)
|
||||||
|
|
||||||
return reward
|
return reward
|
||||||
|
|
||||||
|
|||||||
@@ -11,9 +11,8 @@ class HistoryEncoder(nn.Module):
|
|||||||
Output: (batch, embedding_dim)
|
Output: (batch, embedding_dim)
|
||||||
|
|
||||||
Architecture: two temporal conv layers → global average pool → linear.
|
Architecture: two temporal conv layers → global average pool → linear.
|
||||||
This lets the policy implicitly infer environment parameters (mass,
|
Used as the adaptation module φ in RMA Phase 2 (deploy mode) to
|
||||||
friction, gear, etc.) from recent dynamics — the core of RMA-style
|
predict the latent ẑ from recent dynamics.
|
||||||
adaptation for sim2real.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(
|
def __init__(
|
||||||
@@ -43,18 +42,95 @@ class HistoryEncoder(nn.Module):
|
|||||||
return self.fc(x)
|
return self.fc(x)
|
||||||
|
|
||||||
|
|
||||||
|
class EnvironmentEncoder(nn.Module):
|
||||||
|
"""MLP that compresses privileged DR parameters μ into a latent z.
|
||||||
|
|
||||||
|
Used in RMA Phase 1 (teacher mode): z = e(μ).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, mu_dim: int, latent_dim: int = 8, hidden: int = 64) -> None:
|
||||||
|
super().__init__()
|
||||||
|
self.net = nn.Sequential(
|
||||||
|
nn.Linear(mu_dim, hidden),
|
||||||
|
nn.ELU(),
|
||||||
|
nn.Linear(hidden, latent_dim),
|
||||||
|
)
|
||||||
|
|
||||||
|
def forward(self, mu: torch.Tensor) -> torch.Tensor:
|
||||||
|
"""mu: (batch, mu_dim) → z: (batch, latent_dim)."""
|
||||||
|
return self.net(mu)
|
||||||
|
|
||||||
|
|
||||||
class SharedMLP(GaussianMixin, DeterministicMixin, Model):
|
class SharedMLP(GaussianMixin, DeterministicMixin, Model):
|
||||||
def __init__(self, observation_space: spaces.Space, action_space: spaces.Space, device: torch.device, hidden_sizes: tuple[int, ...] = (32, 32), clip_actions: bool = False, clip_log_std: bool = True, min_log_std: float = -2.0, max_log_std: float = 2.0, initial_log_std: float = 0.0, history_length: int = 0, raw_obs_dim: int = 0, embedding_dim: int = 32):
|
"""Shared policy/value network with RMA support.
|
||||||
|
|
||||||
|
rma_mode:
|
||||||
|
"none" – legacy: optional history encoder, plain obs, backward compat.
|
||||||
|
"teacher" – Phase 1: env_encoder(μ) → z, backbone input = [raw_obs, z].
|
||||||
|
"deploy" – Phase 2+: adaptation_module(history) → ẑ, input = [raw_obs, ẑ].
|
||||||
|
|
||||||
|
Teacher and deploy modes produce the same backbone in_dim
|
||||||
|
(raw_obs_dim + latent_dim), so weights transfer cleanly.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
observation_space: spaces.Space,
|
||||||
|
action_space: spaces.Space,
|
||||||
|
device: torch.device,
|
||||||
|
hidden_sizes: tuple[int, ...] = (32, 32),
|
||||||
|
clip_actions: bool = False,
|
||||||
|
clip_log_std: bool = True,
|
||||||
|
min_log_std: float = -2.0,
|
||||||
|
max_log_std: float = 2.0,
|
||||||
|
initial_log_std: float = 0.0,
|
||||||
|
# ── Legacy (none mode) ───────────────────────────────────
|
||||||
|
history_length: int = 0,
|
||||||
|
raw_obs_dim: int = 0,
|
||||||
|
embedding_dim: int = 32,
|
||||||
|
# ── RMA modes ────────────────────────────────────────────
|
||||||
|
rma_mode: str = "none",
|
||||||
|
latent_dim: int = 8,
|
||||||
|
mu_dim: int = 0,
|
||||||
|
):
|
||||||
Model.__init__(self, observation_space, action_space, device)
|
Model.__init__(self, observation_space, action_space, device)
|
||||||
GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)
|
GaussianMixin.__init__(
|
||||||
|
self, clip_actions, clip_log_std, min_log_std, max_log_std,
|
||||||
|
)
|
||||||
DeterministicMixin.__init__(self, clip_actions)
|
DeterministicMixin.__init__(self, clip_actions)
|
||||||
|
|
||||||
|
self._rma_mode = rma_mode
|
||||||
self._history_length = history_length
|
self._history_length = history_length
|
||||||
self._raw_obs_dim = raw_obs_dim
|
self._raw_obs_dim = raw_obs_dim
|
||||||
self._embedding_dim = embedding_dim
|
self._embedding_dim = embedding_dim
|
||||||
|
self._latent_dim = latent_dim
|
||||||
|
self._mu_dim = mu_dim
|
||||||
|
|
||||||
if history_length > 0 and raw_obs_dim > 0:
|
# ── Build encoder + determine backbone in_dim ────────────
|
||||||
# The observation is [current_obs(raw_obs_dim), history_flat(H * step_dim)].
|
self.env_encoder: EnvironmentEncoder | None = None
|
||||||
|
self.adaptation_module: HistoryEncoder | None = None
|
||||||
|
self.history_encoder: HistoryEncoder | None = None # legacy
|
||||||
|
|
||||||
|
if rma_mode == "teacher":
|
||||||
|
assert mu_dim > 0 and raw_obs_dim > 0
|
||||||
|
self.env_encoder = EnvironmentEncoder(
|
||||||
|
mu_dim=mu_dim, latent_dim=latent_dim,
|
||||||
|
)
|
||||||
|
in_dim = raw_obs_dim + latent_dim
|
||||||
|
|
||||||
|
elif rma_mode == "deploy":
|
||||||
|
assert history_length > 0 and raw_obs_dim > 0
|
||||||
|
act_dim = self.num_actions
|
||||||
|
step_dim = raw_obs_dim + act_dim
|
||||||
|
self.adaptation_module = HistoryEncoder(
|
||||||
|
history_length=history_length,
|
||||||
|
step_dim=step_dim,
|
||||||
|
embedding_dim=latent_dim,
|
||||||
|
)
|
||||||
|
in_dim = raw_obs_dim + latent_dim
|
||||||
|
|
||||||
|
elif history_length > 0 and raw_obs_dim > 0:
|
||||||
|
# Legacy "none" mode with history encoder.
|
||||||
act_dim = self.num_actions
|
act_dim = self.num_actions
|
||||||
step_dim = raw_obs_dim + act_dim
|
step_dim = raw_obs_dim + act_dim
|
||||||
self.history_encoder = HistoryEncoder(
|
self.history_encoder = HistoryEncoder(
|
||||||
@@ -62,12 +138,11 @@ class SharedMLP(GaussianMixin, DeterministicMixin, Model):
|
|||||||
step_dim=step_dim,
|
step_dim=step_dim,
|
||||||
embedding_dim=embedding_dim,
|
embedding_dim=embedding_dim,
|
||||||
)
|
)
|
||||||
# MLP input = raw obs + history embedding.
|
|
||||||
in_dim = raw_obs_dim + embedding_dim
|
in_dim = raw_obs_dim + embedding_dim
|
||||||
else:
|
else:
|
||||||
self.history_encoder = None
|
|
||||||
in_dim = self.num_observations
|
in_dim = self.num_observations
|
||||||
|
|
||||||
|
# ── Shared backbone ──────────────────────────────────────
|
||||||
layers: list[nn.Module] = []
|
layers: list[nn.Module] = []
|
||||||
for hidden_size in hidden_sizes:
|
for hidden_size in hidden_sizes:
|
||||||
layers.append(nn.Linear(in_dim, hidden_size))
|
layers.append(nn.Linear(in_dim, hidden_size))
|
||||||
@@ -77,24 +152,42 @@ class SharedMLP(GaussianMixin, DeterministicMixin, Model):
|
|||||||
|
|
||||||
# Policy head
|
# Policy head
|
||||||
self.mean_layer = nn.Linear(in_dim, self.num_actions)
|
self.mean_layer = nn.Linear(in_dim, self.num_actions)
|
||||||
self.log_std_parameter: nn.Parameter = nn.Parameter(torch.full((self.num_actions,), initial_log_std))
|
self.log_std_parameter: nn.Parameter = nn.Parameter(
|
||||||
|
torch.full((self.num_actions,), initial_log_std),
|
||||||
|
)
|
||||||
|
|
||||||
# Value head
|
# Value head
|
||||||
self.value_layer = nn.Linear(in_dim, 1)
|
self.value_layer = nn.Linear(in_dim, 1)
|
||||||
self._shared_output: torch.Tensor | None = None
|
self._shared_output: torch.Tensor | None = None
|
||||||
|
|
||||||
|
def act(
|
||||||
def act(self, inputs: dict[str, torch.Tensor], role: str = "") -> tuple[torch.Tensor, ...]:
|
self, inputs: dict[str, torch.Tensor], role: str = "",
|
||||||
|
) -> tuple[torch.Tensor, ...]:
|
||||||
if role == "policy":
|
if role == "policy":
|
||||||
return GaussianMixin.act(self, inputs, role)
|
return GaussianMixin.act(self, inputs, role)
|
||||||
elif role == "value":
|
elif role == "value":
|
||||||
return DeterministicMixin.act(self, inputs, role)
|
return DeterministicMixin.act(self, inputs, role)
|
||||||
|
|
||||||
def _encode(self, states: torch.Tensor) -> torch.Tensor:
|
def _encode(self, states: torch.Tensor) -> torch.Tensor:
|
||||||
"""Split augmented obs into current obs + history, encode, concat."""
|
"""Route through the correct encoder based on rma_mode."""
|
||||||
if self.history_encoder is None:
|
if self._rma_mode == "teacher":
|
||||||
return self.net(states)
|
# states = [raw_obs, μ]
|
||||||
|
obs = states[:, :self._raw_obs_dim]
|
||||||
|
mu = states[:, self._raw_obs_dim:]
|
||||||
|
z = self.env_encoder(mu)
|
||||||
|
return self.net(torch.cat([obs, z], dim=-1))
|
||||||
|
|
||||||
|
if self._rma_mode == "deploy":
|
||||||
|
# states = [raw_obs, history_flat]
|
||||||
|
obs = states[:, :self._raw_obs_dim]
|
||||||
|
hist_flat = states[:, self._raw_obs_dim:]
|
||||||
|
step_dim = self._raw_obs_dim + self.num_actions
|
||||||
|
history = hist_flat.reshape(-1, self._history_length, step_dim)
|
||||||
|
z_hat = self.adaptation_module(history)
|
||||||
|
return self.net(torch.cat([obs, z_hat], dim=-1))
|
||||||
|
|
||||||
|
# Legacy "none" mode.
|
||||||
|
if self.history_encoder is not None:
|
||||||
obs = states[:, :self._raw_obs_dim]
|
obs = states[:, :self._raw_obs_dim]
|
||||||
hist_flat = states[:, self._raw_obs_dim:]
|
hist_flat = states[:, self._raw_obs_dim:]
|
||||||
step_dim = self._raw_obs_dim + self.num_actions
|
step_dim = self._raw_obs_dim + self.num_actions
|
||||||
@@ -102,8 +195,10 @@ class SharedMLP(GaussianMixin, DeterministicMixin, Model):
|
|||||||
embedding = self.history_encoder(history)
|
embedding = self.history_encoder(history)
|
||||||
return self.net(torch.cat([obs, embedding], dim=-1))
|
return self.net(torch.cat([obs, embedding], dim=-1))
|
||||||
|
|
||||||
|
return self.net(states)
|
||||||
|
|
||||||
def compute(
|
def compute(
|
||||||
self, inputs: dict[str, torch.Tensor], role: str = ""
|
self, inputs: dict[str, torch.Tensor], role: str = "",
|
||||||
) -> tuple[torch.Tensor, ...]:
|
) -> tuple[torch.Tensor, ...]:
|
||||||
if role == "policy":
|
if role == "policy":
|
||||||
self._shared_output = self._encode(inputs["states"])
|
self._shared_output = self._encode(inputs["states"])
|
||||||
|
|||||||
@@ -155,9 +155,8 @@ class MJXRunner(BaseRunner[MJXRunnerConfig]):
|
|||||||
_fl_neg = jnp.array([a.frictionloss[1] for a in acts])
|
_fl_neg = jnp.array([a.frictionloss[1] for a in acts])
|
||||||
_damp_pos = jnp.array([a.damping[0] for a in acts])
|
_damp_pos = jnp.array([a.damping[0] for a in acts])
|
||||||
_damp_neg = jnp.array([a.damping[1] for a in acts])
|
_damp_neg = jnp.array([a.damping[1] for a in acts])
|
||||||
_stribeck_boost = jnp.array([a.stribeck_friction_boost for a in acts])
|
_visc_quad = jnp.array([a.viscous_quadratic for a in acts])
|
||||||
_stribeck_vel = jnp.array([a.stribeck_vel for a in acts])
|
_back_emf = jnp.array([a.back_emf_gain for a in acts])
|
||||||
_action_bias = jnp.array([a.action_bias for a in acts])
|
|
||||||
|
|
||||||
# ── Batched step (N substeps per call) ──────────────────────
|
# ── Batched step (N substeps per call) ──────────────────────
|
||||||
@jax.jit
|
@jax.jit
|
||||||
@@ -170,8 +169,8 @@ class MJXRunner(BaseRunner[MJXRunnerConfig]):
|
|||||||
ctrl = jnp.where(at_hi | at_lo, 0.0, ctrl)
|
ctrl = jnp.where(at_hi | at_lo, 0.0, ctrl)
|
||||||
|
|
||||||
if _has_motor:
|
if _has_motor:
|
||||||
# Action bias + Deadzone + asymmetric gear compensation
|
# Deadzone + asymmetric gear compensation
|
||||||
mc = ctrl[:, _ctrl_ids] + _action_bias
|
mc = ctrl[:, _ctrl_ids]
|
||||||
mc = jnp.where((mc >= 0) & (mc < _dz_pos), 0.0, mc)
|
mc = jnp.where((mc >= 0) & (mc < _dz_pos), 0.0, mc)
|
||||||
mc = jnp.where((mc < 0) & (mc > -_dz_neg), 0.0, mc)
|
mc = jnp.where((mc < 0) & (mc > -_dz_neg), 0.0, mc)
|
||||||
gear_dir = jnp.where(mc >= 0, _gear_pos, _gear_neg)
|
gear_dir = jnp.where(mc >= 0, _gear_pos, _gear_neg)
|
||||||
@@ -185,18 +184,21 @@ class MJXRunner(BaseRunner[MJXRunnerConfig]):
|
|||||||
vel = d.qvel[:, _qvel_ids]
|
vel = d.qvel[:, _qvel_ids]
|
||||||
mc = d.ctrl[:, _ctrl_ids]
|
mc = d.ctrl[:, _ctrl_ids]
|
||||||
|
|
||||||
# Coulomb friction (direction-dependent + Stribeck)
|
# Coulomb friction (direction-dependent)
|
||||||
fl = jnp.where(vel > 0, _fl_pos, _fl_neg)
|
fl = jnp.where(vel > 0, _fl_pos, _fl_neg)
|
||||||
stribeck_mult = 1.0 + _stribeck_boost * jnp.exp(
|
|
||||||
-jnp.abs(vel) / _stribeck_vel
|
|
||||||
)
|
|
||||||
fl = fl * stribeck_mult
|
|
||||||
torque = -jnp.where(
|
torque = -jnp.where(
|
||||||
jnp.abs(vel) > 1e-6, jnp.sign(vel) * fl, 0.0,
|
jnp.abs(vel) > 1e-6, jnp.sign(vel) * fl, 0.0,
|
||||||
)
|
)
|
||||||
# Viscous damping (direction-dependent)
|
# Viscous damping (direction-dependent)
|
||||||
damp = jnp.where(vel > 0, _damp_pos, _damp_neg)
|
damp = jnp.where(vel > 0, _damp_pos, _damp_neg)
|
||||||
torque = torque - damp * vel
|
torque = torque - damp * vel
|
||||||
|
# Quadratic velocity drag
|
||||||
|
torque = torque - _visc_quad * vel * jnp.abs(vel)
|
||||||
|
# Back-EMF torque reduction
|
||||||
|
bemf = _back_emf * vel * jnp.sign(mc)
|
||||||
|
torque = torque - jnp.where(
|
||||||
|
jnp.abs(mc) > 1e-6, bemf, 0.0,
|
||||||
|
)
|
||||||
torque = jnp.clip(torque, -10.0, 10.0)
|
torque = jnp.clip(torque, -10.0, 10.0)
|
||||||
d = d.replace(
|
d = d.replace(
|
||||||
qfrc_applied=d.qfrc_applied.at[:, _qvel_ids].set(torque),
|
qfrc_applied=d.qfrc_applied.at[:, _qvel_ids].set(torque),
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
import dataclasses
|
import dataclasses
|
||||||
|
import math
|
||||||
import os
|
import os
|
||||||
import tempfile
|
import tempfile
|
||||||
import xml.etree.ElementTree as ET
|
import xml.etree.ElementTree as ET
|
||||||
@@ -13,34 +14,12 @@ from src.core.robot import RobotConfig
|
|||||||
from src.core.runner import BaseRunner, BaseRunnerConfig
|
from src.core.runner import BaseRunner, BaseRunnerConfig
|
||||||
|
|
||||||
|
|
||||||
@dataclasses.dataclass
|
|
||||||
class DomainRandConfig:
|
|
||||||
"""Per-reset randomization of MuJoCo model parameters.
|
|
||||||
|
|
||||||
Each field is a fractional range: the nominal value is multiplied by
|
|
||||||
``uniform(1 - frac, 1 + frac)``. Set to 0.0 to disable.
|
|
||||||
"""
|
|
||||||
mass_frac: float = 0.0 # body masses
|
|
||||||
friction_frac: float = 0.0 # joint frictionloss
|
|
||||||
damping_frac: float = 0.0 # joint damping
|
|
||||||
armature_frac: float = 0.0 # joint armature (reflected inertia)
|
|
||||||
gear_frac: float = 0.0 # actuator gear ratio
|
|
||||||
com_offset: float = 0.0 # center-of-mass shift (metres)
|
|
||||||
|
|
||||||
|
|
||||||
@dataclasses.dataclass
|
@dataclasses.dataclass
|
||||||
class MuJoCoRunnerConfig(BaseRunnerConfig):
|
class MuJoCoRunnerConfig(BaseRunnerConfig):
|
||||||
num_envs: int = 16
|
num_envs: int = 16
|
||||||
device: str = "cpu"
|
device: str = "cpu"
|
||||||
dt: float = 0.002
|
dt: float = 0.002
|
||||||
substeps: int = 10
|
substeps: int = 10
|
||||||
# ── Sim2real ─────────────────────────────────────────────────
|
|
||||||
domain_rand: DomainRandConfig = dataclasses.field(default_factory=DomainRandConfig)
|
|
||||||
|
|
||||||
def __post_init__(self) -> None:
|
|
||||||
# Hydra passes domain_rand as a dict — convert to dataclass.
|
|
||||||
if isinstance(self.domain_rand, dict):
|
|
||||||
self.domain_rand = DomainRandConfig(**self.domain_rand)
|
|
||||||
|
|
||||||
|
|
||||||
class ActuatorLimits:
|
class ActuatorLimits:
|
||||||
@@ -225,6 +204,8 @@ def load_mujoco_model(robot: RobotConfig) -> mujoco.MjModel:
|
|||||||
|
|
||||||
|
|
||||||
class MuJoCoRunner(BaseRunner[MuJoCoRunnerConfig]):
|
class MuJoCoRunner(BaseRunner[MuJoCoRunnerConfig]):
|
||||||
|
def __init__(self, env: BaseEnv, config: MuJoCoRunnerConfig):
|
||||||
|
super().__init__(env, config)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def num_envs(self) -> int:
|
def num_envs(self) -> int:
|
||||||
@@ -253,15 +234,6 @@ class MuJoCoRunner(BaseRunner[MuJoCoRunnerConfig]):
|
|||||||
qvel_idx = self._model.jnt_dofadr[jnt_id]
|
qvel_idx = self._model.jnt_dofadr[jnt_id]
|
||||||
self._motor_actuators.append((act, qvel_idx))
|
self._motor_actuators.append((act, qvel_idx))
|
||||||
|
|
||||||
# ── Domain randomization: store nominal values ───────────
|
|
||||||
self._nominal_mass = self._model.body_mass.copy()
|
|
||||||
self._nominal_inertia = self._model.body_inertia.copy()
|
|
||||||
self._nominal_ipos = self._model.body_ipos.copy()
|
|
||||||
self._nominal_damping = self._model.dof_damping.copy()
|
|
||||||
self._nominal_armature = self._model.dof_armature.copy()
|
|
||||||
self._nominal_frictionloss = self._model.dof_frictionloss.copy()
|
|
||||||
self._nominal_gear = self._model.actuator_gear.copy()
|
|
||||||
|
|
||||||
def _sim_step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
|
def _sim_step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
|
||||||
actions_np: np.ndarray = actions.cpu().numpy()
|
actions_np: np.ndarray = actions.cpu().numpy()
|
||||||
|
|
||||||
@@ -275,14 +247,24 @@ class MuJoCoRunner(BaseRunner[MuJoCoRunnerConfig]):
|
|||||||
qpos_batch = np.zeros((self.num_envs, self._nq), dtype=np.float32)
|
qpos_batch = np.zeros((self.num_envs, self._nq), dtype=np.float32)
|
||||||
qvel_batch = np.zeros((self.num_envs, self._nv), dtype=np.float32)
|
qvel_batch = np.zeros((self.num_envs, self._nv), dtype=np.float32)
|
||||||
|
|
||||||
|
# Per-env domain-randomization scales (all 1.0 when DR is disabled).
|
||||||
|
fr_scale = self._dr_scales["friction_scale"].cpu().numpy()
|
||||||
|
dp_scale = self._dr_scales["damping_scale"].cpu().numpy()
|
||||||
|
tq_scale = self._dr_scales["torque_scale"].cpu().numpy()
|
||||||
|
|
||||||
for i, data in enumerate(self._data):
|
for i, data in enumerate(self._data):
|
||||||
data.ctrl[:] = actions_np[i]
|
# torque_scale emulates motor-constant / battery-voltage variation.
|
||||||
|
data.ctrl[:] = actions_np[i] * tq_scale[i]
|
||||||
for _ in range(self.config.substeps):
|
for _ in range(self.config.substeps):
|
||||||
# Apply asymmetric motor forces via qfrc_applied
|
# Apply asymmetric motor forces via qfrc_applied
|
||||||
for act, qvel_idx in self._motor_actuators:
|
for act, qvel_idx in self._motor_actuators:
|
||||||
vel = data.qvel[qvel_idx]
|
vel = data.qvel[qvel_idx]
|
||||||
ctrl = data.ctrl[0] # TODO: generalise for multi-actuator
|
ctrl = data.ctrl[0] # TODO: generalise for multi-actuator
|
||||||
data.qfrc_applied[qvel_idx] = act.compute_motor_force(vel, ctrl)
|
data.qfrc_applied[qvel_idx] = act.compute_motor_force(
|
||||||
|
vel, ctrl,
|
||||||
|
friction_scale=fr_scale[i],
|
||||||
|
damping_scale=dp_scale[i],
|
||||||
|
)
|
||||||
self._limits.enforce(self._model, data)
|
self._limits.enforce(self._model, data)
|
||||||
mujoco.mj_step(self._model, data)
|
mujoco.mj_step(self._model, data)
|
||||||
|
|
||||||
@@ -301,41 +283,22 @@ class MuJoCoRunner(BaseRunner[MuJoCoRunnerConfig]):
|
|||||||
qpos_batch = np.zeros((n, self._nq), dtype=np.float32)
|
qpos_batch = np.zeros((n, self._nq), dtype=np.float32)
|
||||||
qvel_batch = np.zeros((n, self._nv), dtype=np.float32)
|
qvel_batch = np.zeros((n, self._nv), dtype=np.float32)
|
||||||
|
|
||||||
# ── Domain randomization ─────────────────────────────────
|
# Check if env has a pendulum_init_range_deg for wide pendulum randomisation
|
||||||
dr = self.config.domain_rand
|
pend_range = getattr(self.env.config, "pendulum_init_range_deg", 0.0)
|
||||||
if dr.mass_frac > 0:
|
pend_range_rad = math.radians(pend_range) if pend_range > 0 else 0.0
|
||||||
scale = np.random.uniform(1 - dr.mass_frac, 1 + dr.mass_frac,
|
|
||||||
size=self._nominal_mass.shape)
|
|
||||||
self._model.body_mass[:] = self._nominal_mass * scale
|
|
||||||
self._model.body_inertia[:] = self._nominal_inertia * scale[:, None]
|
|
||||||
if dr.com_offset > 0:
|
|
||||||
offset = np.random.uniform(-dr.com_offset, dr.com_offset,
|
|
||||||
size=self._nominal_ipos.shape)
|
|
||||||
self._model.body_ipos[:] = self._nominal_ipos + offset
|
|
||||||
if dr.damping_frac > 0:
|
|
||||||
scale = np.random.uniform(1 - dr.damping_frac, 1 + dr.damping_frac,
|
|
||||||
size=self._nominal_damping.shape)
|
|
||||||
self._model.dof_damping[:] = np.maximum(0, self._nominal_damping * scale)
|
|
||||||
if dr.armature_frac > 0:
|
|
||||||
scale = np.random.uniform(1 - dr.armature_frac, 1 + dr.armature_frac,
|
|
||||||
size=self._nominal_armature.shape)
|
|
||||||
self._model.dof_armature[:] = np.maximum(0, self._nominal_armature * scale)
|
|
||||||
if dr.friction_frac > 0:
|
|
||||||
scale = np.random.uniform(1 - dr.friction_frac, 1 + dr.friction_frac,
|
|
||||||
size=self._nominal_frictionloss.shape)
|
|
||||||
self._model.dof_frictionloss[:] = np.maximum(0, self._nominal_frictionloss * scale)
|
|
||||||
if dr.gear_frac > 0:
|
|
||||||
scale = np.random.uniform(1 - dr.gear_frac, 1 + dr.gear_frac,
|
|
||||||
size=self._nominal_gear.shape)
|
|
||||||
self._model.actuator_gear[:] = self._nominal_gear * scale
|
|
||||||
|
|
||||||
for i, env_id in enumerate(ids):
|
for i, env_id in enumerate(ids):
|
||||||
data = self._data[env_id]
|
data = self._data[env_id]
|
||||||
mujoco.mj_resetData(self._model, data)
|
mujoco.mj_resetData(self._model, data)
|
||||||
|
|
||||||
# Small random perturbation for exploration
|
# Small random perturbation for motor angle + velocity
|
||||||
data.qpos[:] += np.random.uniform(-0.05, 0.05, size=self._nq)
|
data.qpos[:] += np.random.uniform(-0.05, 0.05, size=self._nq)
|
||||||
data.qvel[:] += np.random.uniform(-0.05, 0.05, size=self._nv)
|
data.qvel[:] += np.random.uniform(-0.05, 0.05, size=self._nv)
|
||||||
|
|
||||||
|
# Wide pendulum angle randomisation (overrides the small perturbation)
|
||||||
|
if pend_range_rad > 0 and self._nq >= 2:
|
||||||
|
data.qpos[1] = np.random.uniform(-pend_range_rad, pend_range_rad)
|
||||||
|
|
||||||
data.ctrl[:] = 0.0
|
data.ctrl[:] = 0.0
|
||||||
|
|
||||||
qpos_batch[i] = data.qpos
|
qpos_batch[i] = data.qpos
|
||||||
|
|||||||
186
src/sysid/motor/export.py
Normal file
186
src/sysid/motor/export.py
Normal file
@@ -0,0 +1,186 @@
|
|||||||
|
"""Export tuned motor parameters to MJCF and robot.yaml files.
|
||||||
|
|
||||||
|
Reads the original motor.xml and robot.yaml, patches with optimised
|
||||||
|
parameter values, and writes motor_tuned.xml + robot_tuned.yaml.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python -m src.sysid.motor.export \
|
||||||
|
--asset-path assets/motor \
|
||||||
|
--result assets/motor/motor_sysid_result.json
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import copy
|
||||||
|
import json
|
||||||
|
import xml.etree.ElementTree as ET
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import structlog
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
log = structlog.get_logger()
|
||||||
|
|
||||||
|
_DEFAULT_ASSET = "assets/motor"
|
||||||
|
|
||||||
|
|
||||||
|
def export_tuned_files(
|
||||||
|
asset_path: str | Path,
|
||||||
|
params: dict[str, float],
|
||||||
|
) -> tuple[Path, Path]:
|
||||||
|
"""Write tuned MJCF and robot.yaml files.
|
||||||
|
|
||||||
|
Returns (tuned_mjcf_path, tuned_robot_yaml_path).
|
||||||
|
"""
|
||||||
|
asset_path = Path(asset_path).resolve()
|
||||||
|
|
||||||
|
robot_yaml_path = asset_path / "robot.yaml"
|
||||||
|
robot_cfg = yaml.safe_load(robot_yaml_path.read_text())
|
||||||
|
mjcf_path = asset_path / robot_cfg["mjcf"]
|
||||||
|
|
||||||
|
# ── Tune MJCF ────────────────────────────────────────────────
|
||||||
|
tree = ET.parse(str(mjcf_path))
|
||||||
|
root = tree.getroot()
|
||||||
|
|
||||||
|
# Actuator — use average gear for the MJCF model.
|
||||||
|
gear_pos = params.get("actuator_gear_pos", params.get("actuator_gear"))
|
||||||
|
gear_neg = params.get("actuator_gear_neg", params.get("actuator_gear"))
|
||||||
|
gear_avg = None
|
||||||
|
if gear_pos is not None and gear_neg is not None:
|
||||||
|
gear_avg = (gear_pos + gear_neg) / 2.0
|
||||||
|
elif gear_pos is not None:
|
||||||
|
gear_avg = gear_pos
|
||||||
|
filter_tau = params.get("actuator_filter_tau")
|
||||||
|
for act_el in root.iter("general"):
|
||||||
|
if act_el.get("name") == "motor":
|
||||||
|
if gear_avg is not None:
|
||||||
|
act_el.set("gear", str(gear_avg))
|
||||||
|
if filter_tau is not None:
|
||||||
|
if filter_tau > 0:
|
||||||
|
act_el.set("dyntype", "filter")
|
||||||
|
act_el.set("dynprm", str(filter_tau))
|
||||||
|
else:
|
||||||
|
act_el.set("dyntype", "none")
|
||||||
|
|
||||||
|
# Joint — average damping & friction for MJCF (asymmetry in runtime).
|
||||||
|
fl_pos = params.get("motor_frictionloss_pos", params.get("motor_frictionloss"))
|
||||||
|
fl_neg = params.get("motor_frictionloss_neg", params.get("motor_frictionloss"))
|
||||||
|
fl_avg = None
|
||||||
|
if fl_pos is not None and fl_neg is not None:
|
||||||
|
fl_avg = (fl_pos + fl_neg) / 2.0
|
||||||
|
elif fl_pos is not None:
|
||||||
|
fl_avg = fl_pos
|
||||||
|
damp_pos = params.get("motor_damping_pos", params.get("motor_damping"))
|
||||||
|
damp_neg = params.get("motor_damping_neg", params.get("motor_damping"))
|
||||||
|
damp_avg = None
|
||||||
|
if damp_pos is not None and damp_neg is not None:
|
||||||
|
damp_avg = (damp_pos + damp_neg) / 2.0
|
||||||
|
elif damp_pos is not None:
|
||||||
|
damp_avg = damp_pos
|
||||||
|
for jnt in root.iter("joint"):
|
||||||
|
if jnt.get("name") == "motor_joint":
|
||||||
|
if damp_avg is not None:
|
||||||
|
jnt.set("damping", str(damp_avg))
|
||||||
|
if "motor_armature" in params:
|
||||||
|
jnt.set("armature", str(params["motor_armature"]))
|
||||||
|
if fl_avg is not None:
|
||||||
|
jnt.set("frictionloss", str(fl_avg))
|
||||||
|
|
||||||
|
# Rotor mass.
|
||||||
|
if "rotor_mass" in params:
|
||||||
|
for geom in root.iter("geom"):
|
||||||
|
if geom.get("name") == "rotor_disk":
|
||||||
|
geom.set("mass", str(params["rotor_mass"]))
|
||||||
|
|
||||||
|
# Write tuned MJCF.
|
||||||
|
tuned_mjcf_name = mjcf_path.stem + "_tuned" + mjcf_path.suffix
|
||||||
|
tuned_mjcf_path = asset_path / tuned_mjcf_name
|
||||||
|
ET.indent(tree, space=" ")
|
||||||
|
tree.write(str(tuned_mjcf_path), xml_declaration=True, encoding="unicode")
|
||||||
|
log.info("tuned_mjcf_written", path=str(tuned_mjcf_path))
|
||||||
|
|
||||||
|
# ── Tune robot.yaml ──────────────────────────────────────────
|
||||||
|
tuned_cfg = copy.deepcopy(robot_cfg)
|
||||||
|
tuned_cfg["mjcf"] = tuned_mjcf_name
|
||||||
|
|
||||||
|
if tuned_cfg.get("actuators") and len(tuned_cfg["actuators"]) > 0:
|
||||||
|
act = tuned_cfg["actuators"][0]
|
||||||
|
if gear_avg is not None:
|
||||||
|
act["gear"] = round(gear_avg, 6)
|
||||||
|
if "actuator_filter_tau" in params:
|
||||||
|
act["filter_tau"] = round(params["actuator_filter_tau"], 6)
|
||||||
|
if "motor_damping" in params:
|
||||||
|
act["damping"] = round(params["motor_damping"], 6)
|
||||||
|
|
||||||
|
if "joints" not in tuned_cfg:
|
||||||
|
tuned_cfg["joints"] = {}
|
||||||
|
if "motor_joint" not in tuned_cfg["joints"]:
|
||||||
|
tuned_cfg["joints"]["motor_joint"] = {}
|
||||||
|
mj = tuned_cfg["joints"]["motor_joint"]
|
||||||
|
if "motor_armature" in params:
|
||||||
|
mj["armature"] = round(params["motor_armature"], 6)
|
||||||
|
if fl_avg is not None:
|
||||||
|
mj["frictionloss"] = round(fl_avg, 6)
|
||||||
|
|
||||||
|
# Asymmetric / hardware-realism / nonlinear parameters.
|
||||||
|
realism = {}
|
||||||
|
for key in [
|
||||||
|
"actuator_gear_pos", "actuator_gear_neg",
|
||||||
|
"motor_damping_pos", "motor_damping_neg",
|
||||||
|
"motor_frictionloss_pos", "motor_frictionloss_neg",
|
||||||
|
"motor_deadzone_pos", "motor_deadzone_neg",
|
||||||
|
"action_bias",
|
||||||
|
"viscous_quadratic", "back_emf_gain",
|
||||||
|
"stribeck_friction_boost", "stribeck_vel",
|
||||||
|
"gearbox_backlash",
|
||||||
|
]:
|
||||||
|
if key in params:
|
||||||
|
realism[key] = round(params[key], 6)
|
||||||
|
if realism:
|
||||||
|
tuned_cfg["hardware_realism"] = realism
|
||||||
|
|
||||||
|
tuned_yaml_path = asset_path / "robot_tuned.yaml"
|
||||||
|
header = (
|
||||||
|
"# Tuned motor config — generated by src.sysid.motor.optimize\n"
|
||||||
|
"# Original: robot.yaml\n\n"
|
||||||
|
)
|
||||||
|
tuned_yaml_path.write_text(
|
||||||
|
header + yaml.dump(tuned_cfg, default_flow_style=False, sort_keys=False)
|
||||||
|
)
|
||||||
|
log.info("tuned_robot_yaml_written", path=str(tuned_yaml_path))
|
||||||
|
|
||||||
|
return tuned_mjcf_path, tuned_yaml_path
|
||||||
|
|
||||||
|
|
||||||
|
# ── CLI ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Export tuned motor parameters to MJCF + robot.yaml."
|
||||||
|
)
|
||||||
|
parser.add_argument("--asset-path", type=str, default=_DEFAULT_ASSET)
|
||||||
|
parser.add_argument(
|
||||||
|
"--result", type=str, default=None,
|
||||||
|
help="Path to motor_sysid_result.json (auto-detected if omitted)",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
asset_path = Path(args.asset_path).resolve()
|
||||||
|
if args.result:
|
||||||
|
result_path = Path(args.result)
|
||||||
|
else:
|
||||||
|
result_path = asset_path / "motor_sysid_result.json"
|
||||||
|
|
||||||
|
if not result_path.exists():
|
||||||
|
raise FileNotFoundError(f"Result file not found: {result_path}")
|
||||||
|
|
||||||
|
result = json.loads(result_path.read_text())
|
||||||
|
params = result["best_params"]
|
||||||
|
|
||||||
|
export_tuned_files(asset_path=args.asset_path, params=params)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -4,14 +4,21 @@ Minimises the trajectory-matching cost between a MuJoCo rollout and a
|
|||||||
recorded real-robot sequence. Uses the ``cmaes`` package (pure-Python
|
recorded real-robot sequence. Uses the ``cmaes`` package (pure-Python
|
||||||
CMA-ES with native box-constraint support).
|
CMA-ES with native box-constraint support).
|
||||||
|
|
||||||
All 28 parameters (motor + pendulum/arm) are optimised jointly from a
|
Motor parameters are **locked** from the motor-only sysid — only
|
||||||
single full-system recording. Velocities are optionally preprocessed
|
pendulum/arm inertial parameters, joint dynamics, and ctrl_limit are
|
||||||
with Savitzky-Golay differentiation for cleaner targets.
|
optimised. Velocities are optionally preprocessed with Savitzky-Golay
|
||||||
|
differentiation for cleaner targets.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
python -m src.sysid.optimize \
|
python -m src.sysid.optimize \
|
||||||
--robot-path assets/rotary_cartpole \
|
--robot-path assets/rotary_cartpole \
|
||||||
--recording assets/rotary_cartpole/recordings/capture_....npz
|
--recording assets/rotary_cartpole/recordings/capture_20260314_000435.npz
|
||||||
|
|
||||||
|
# Shorter run for testing:
|
||||||
|
python -m src.sysid.optimize \
|
||||||
|
--robot-path assets/rotary_cartpole \
|
||||||
|
--recording <file>.npz \
|
||||||
|
--max-generations 10 --population-size 8
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
@@ -26,17 +33,13 @@ import numpy as np
|
|||||||
import structlog
|
import structlog
|
||||||
|
|
||||||
from src.sysid.rollout import (
|
from src.sysid.rollout import (
|
||||||
|
LOCKED_MOTOR_PARAMS,
|
||||||
PARAM_SETS,
|
PARAM_SETS,
|
||||||
ROTARY_CARTPOLE_PARAMS,
|
ROTARY_CARTPOLE_PARAMS,
|
||||||
ParamSpec,
|
ParamSpec,
|
||||||
_make_actuator,
|
|
||||||
_resolve_params,
|
|
||||||
bounds_arrays,
|
bounds_arrays,
|
||||||
build_base_model,
|
|
||||||
defaults_dict,
|
|
||||||
defaults_vector,
|
defaults_vector,
|
||||||
params_to_dict,
|
params_to_dict,
|
||||||
patch_model,
|
|
||||||
rollout,
|
rollout,
|
||||||
windowed_rollout,
|
windowed_rollout,
|
||||||
)
|
)
|
||||||
@@ -45,8 +48,62 @@ log = structlog.get_logger()
|
|||||||
|
|
||||||
|
|
||||||
# ── Velocity preprocessing ───────────────────────────────────────────
|
# ── Velocity preprocessing ───────────────────────────────────────────
|
||||||
# Shared implementation in src.sysid.preprocess.
|
|
||||||
from src.sysid.preprocess import preprocess_recording as _preprocess_recording
|
|
||||||
|
def _preprocess_recording(
|
||||||
|
recording: dict[str, np.ndarray],
|
||||||
|
preprocess_vel: bool = True,
|
||||||
|
sg_window: int = 7,
|
||||||
|
sg_polyorder: int = 3,
|
||||||
|
) -> dict[str, np.ndarray]:
|
||||||
|
"""Optionally recompute velocities using Savitzky-Golay differentiation.
|
||||||
|
|
||||||
|
Applies SG filtering to both motor_vel and pendulum_vel, replacing
|
||||||
|
the noisy firmware finite-difference velocities with smooth
|
||||||
|
analytical derivatives of the (clean) angle signals.
|
||||||
|
"""
|
||||||
|
if not preprocess_vel:
|
||||||
|
return recording
|
||||||
|
|
||||||
|
from scipy.signal import savgol_filter
|
||||||
|
|
||||||
|
rec = dict(recording)
|
||||||
|
times = rec["time"]
|
||||||
|
dt = float(np.mean(np.diff(times)))
|
||||||
|
|
||||||
|
# Motor velocity.
|
||||||
|
rec["motor_vel_raw"] = rec["motor_vel"].copy()
|
||||||
|
rec["motor_vel"] = savgol_filter(
|
||||||
|
rec["motor_angle"],
|
||||||
|
window_length=sg_window,
|
||||||
|
polyorder=sg_polyorder,
|
||||||
|
deriv=1,
|
||||||
|
delta=dt,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Pendulum velocity.
|
||||||
|
rec["pendulum_vel_raw"] = rec["pendulum_vel"].copy()
|
||||||
|
rec["pendulum_vel"] = savgol_filter(
|
||||||
|
rec["pendulum_angle"],
|
||||||
|
window_length=sg_window,
|
||||||
|
polyorder=sg_polyorder,
|
||||||
|
deriv=1,
|
||||||
|
delta=dt,
|
||||||
|
)
|
||||||
|
|
||||||
|
motor_noise = np.std(rec["motor_vel_raw"] - rec["motor_vel"])
|
||||||
|
pend_noise = np.std(rec["pendulum_vel_raw"] - rec["pendulum_vel"])
|
||||||
|
log.info(
|
||||||
|
"velocity_preprocessed",
|
||||||
|
method="savgol",
|
||||||
|
sg_window=sg_window,
|
||||||
|
sg_polyorder=sg_polyorder,
|
||||||
|
dt_ms=f"{dt*1000:.1f}",
|
||||||
|
motor_noise_std=f"{motor_noise:.3f} rad/s",
|
||||||
|
pend_noise_std=f"{pend_noise:.3f} rad/s",
|
||||||
|
)
|
||||||
|
|
||||||
|
return rec
|
||||||
|
|
||||||
|
|
||||||
# ── Cost function ────────────────────────────────────────────────────
|
# ── Cost function ────────────────────────────────────────────────────
|
||||||
@@ -115,6 +172,7 @@ def cost_function(
|
|||||||
vel_weight: float = 0.1,
|
vel_weight: float = 0.1,
|
||||||
pendulum_scale: float = 3.0,
|
pendulum_scale: float = 3.0,
|
||||||
window_duration: float = 0.5,
|
window_duration: float = 0.5,
|
||||||
|
motor_params: dict[str, float] | None = None,
|
||||||
) -> float:
|
) -> float:
|
||||||
"""Compute trajectory-matching cost for a candidate parameter vector.
|
"""Compute trajectory-matching cost for a candidate parameter vector.
|
||||||
|
|
||||||
@@ -140,6 +198,7 @@ def cost_function(
|
|||||||
window_duration=window_duration,
|
window_duration=window_duration,
|
||||||
sim_dt=sim_dt,
|
sim_dt=sim_dt,
|
||||||
substeps=substeps,
|
substeps=substeps,
|
||||||
|
motor_params=motor_params,
|
||||||
)
|
)
|
||||||
else:
|
else:
|
||||||
sim = rollout(
|
sim = rollout(
|
||||||
@@ -148,6 +207,7 @@ def cost_function(
|
|||||||
actions=recording["action"],
|
actions=recording["action"],
|
||||||
sim_dt=sim_dt,
|
sim_dt=sim_dt,
|
||||||
substeps=substeps,
|
substeps=substeps,
|
||||||
|
motor_params=motor_params,
|
||||||
)
|
)
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
log.warning("rollout_failed", error=str(exc))
|
log.warning("rollout_failed", error=str(exc))
|
||||||
@@ -161,122 +221,6 @@ def cost_function(
|
|||||||
return _compute_trajectory_cost(sim, recording, pos_weight, vel_weight, pendulum_scale)
|
return _compute_trajectory_cost(sim, recording, pos_weight, vel_weight, pendulum_scale)
|
||||||
|
|
||||||
|
|
||||||
# ── Fast cost (model-cached) ────────────────────────────────────────
|
|
||||||
|
|
||||||
|
|
||||||
def _fast_cost(
|
|
||||||
params_vec: np.ndarray,
|
|
||||||
recording: dict[str, np.ndarray],
|
|
||||||
model, # mujoco.MjModel (cached, mutated in-place)
|
|
||||||
body_ids: dict[str, int],
|
|
||||||
dof_ids: dict[str, int],
|
|
||||||
specs: list[ParamSpec],
|
|
||||||
sim_dt: float = 0.002,
|
|
||||||
substeps: int = 10,
|
|
||||||
pos_weight: float = 1.0,
|
|
||||||
vel_weight: float = 0.1,
|
|
||||||
pendulum_scale: float = 3.0,
|
|
||||||
window_duration: float = 0.5,
|
|
||||||
) -> float:
|
|
||||||
"""Like cost_function but patches a cached model instead of rebuilding."""
|
|
||||||
import mujoco as _mj
|
|
||||||
from src.runners.mujoco import ActuatorLimits
|
|
||||||
|
|
||||||
params = params_to_dict(params_vec, specs)
|
|
||||||
if not _check_inertia_valid(params):
|
|
||||||
return 1e6
|
|
||||||
|
|
||||||
# Patch the cached model in-place — no XML, no temp files.
|
|
||||||
patch_model(model, params, body_ids, dof_ids)
|
|
||||||
|
|
||||||
# Rebuild actuator from params (motor params change per candidate).
|
|
||||||
actuator = _make_actuator(params)
|
|
||||||
|
|
||||||
try:
|
|
||||||
data = _mj.MjData(model)
|
|
||||||
limits = ActuatorLimits(model)
|
|
||||||
ctrl_limit = params.get("ctrl_limit", 0.588)
|
|
||||||
|
|
||||||
times = recording["time"]
|
|
||||||
actions = recording["action"]
|
|
||||||
n = len(actions)
|
|
||||||
|
|
||||||
sim_motor_angle = np.zeros(n, dtype=np.float64)
|
|
||||||
sim_motor_vel = np.zeros(n, dtype=np.float64)
|
|
||||||
sim_pend_angle = np.zeros(n, dtype=np.float64)
|
|
||||||
sim_pend_vel = np.zeros(n, dtype=np.float64)
|
|
||||||
|
|
||||||
if window_duration > 0:
|
|
||||||
# Windowed (multiple shooting).
|
|
||||||
real_motor = recording["motor_angle"]
|
|
||||||
real_motor_vel = recording["motor_vel"]
|
|
||||||
real_pend = recording["pendulum_angle"]
|
|
||||||
real_pend_vel = recording["pendulum_vel"]
|
|
||||||
|
|
||||||
t0, t_end = times[0], times[-1]
|
|
||||||
window_starts: list[int] = []
|
|
||||||
current_t = t0
|
|
||||||
while current_t < t_end:
|
|
||||||
idx = int(np.searchsorted(times, current_t))
|
|
||||||
idx = min(idx, n - 1)
|
|
||||||
window_starts.append(idx)
|
|
||||||
current_t += window_duration
|
|
||||||
n_windows = len(window_starts)
|
|
||||||
|
|
||||||
for w, w_start in enumerate(window_starts):
|
|
||||||
w_end = window_starts[w + 1] if w + 1 < n_windows else n
|
|
||||||
_mj.mj_resetData(model, data)
|
|
||||||
data.qpos[0] = real_motor[w_start]
|
|
||||||
data.qpos[1] = real_pend[w_start]
|
|
||||||
data.qvel[0] = real_motor_vel[w_start]
|
|
||||||
data.qvel[1] = real_pend_vel[w_start]
|
|
||||||
data.ctrl[:] = 0.0
|
|
||||||
_mj.mj_forward(model, data)
|
|
||||||
|
|
||||||
for i in range(w_start, w_end):
|
|
||||||
action = max(-ctrl_limit, min(ctrl_limit, float(actions[i])))
|
|
||||||
ctrl = actuator.transform_ctrl(action)
|
|
||||||
data.ctrl[0] = ctrl
|
|
||||||
for _ in range(substeps):
|
|
||||||
limits.enforce(model, data)
|
|
||||||
data.qfrc_applied[0] = actuator.compute_motor_force(data.qvel[0], ctrl)
|
|
||||||
_mj.mj_step(model, data)
|
|
||||||
sim_motor_angle[i] = data.qpos[0]
|
|
||||||
sim_pend_angle[i] = data.qpos[1]
|
|
||||||
sim_motor_vel[i] = data.qvel[0]
|
|
||||||
sim_pend_vel[i] = data.qvel[1]
|
|
||||||
else:
|
|
||||||
# Open-loop single-shot.
|
|
||||||
_mj.mj_resetData(model, data)
|
|
||||||
for i in range(n):
|
|
||||||
action = max(-ctrl_limit, min(ctrl_limit, float(actions[i])))
|
|
||||||
ctrl = actuator.transform_ctrl(action)
|
|
||||||
data.ctrl[0] = ctrl
|
|
||||||
for _ in range(substeps):
|
|
||||||
limits.enforce(model, data)
|
|
||||||
data.qfrc_applied[0] = actuator.compute_motor_force(data.qvel[0], ctrl)
|
|
||||||
_mj.mj_step(model, data)
|
|
||||||
sim_motor_angle[i] = data.qpos[0]
|
|
||||||
sim_pend_angle[i] = data.qpos[1]
|
|
||||||
sim_motor_vel[i] = data.qvel[0]
|
|
||||||
sim_pend_vel[i] = data.qvel[1]
|
|
||||||
|
|
||||||
sim = {
|
|
||||||
"motor_angle": sim_motor_angle,
|
|
||||||
"motor_vel": sim_motor_vel,
|
|
||||||
"pendulum_angle": sim_pend_angle,
|
|
||||||
"pendulum_vel": sim_pend_vel,
|
|
||||||
}
|
|
||||||
except Exception:
|
|
||||||
return 1e6
|
|
||||||
|
|
||||||
for key in sim:
|
|
||||||
if np.any(~np.isfinite(sim[key])):
|
|
||||||
return 1e6
|
|
||||||
|
|
||||||
return _compute_trajectory_cost(sim, recording, pos_weight, vel_weight, pendulum_scale)
|
|
||||||
|
|
||||||
|
|
||||||
# ── Parallel evaluation helper (module-level for pickling) ───────────
|
# ── Parallel evaluation helper (module-level for pickling) ───────────
|
||||||
|
|
||||||
# Shared state set by the parent process before spawning workers.
|
# Shared state set by the parent process before spawning workers.
|
||||||
@@ -285,34 +229,22 @@ _par_robot_path: Path = Path(".")
|
|||||||
_par_specs: list[ParamSpec] = []
|
_par_specs: list[ParamSpec] = []
|
||||||
_par_kwargs: dict = {}
|
_par_kwargs: dict = {}
|
||||||
|
|
||||||
# Cached model (built once per worker, patched per candidate).
|
|
||||||
_par_model: object = None # mujoco.MjModel
|
|
||||||
_par_body_ids: dict[str, int] = {}
|
|
||||||
_par_dof_ids: dict[str, int] = {}
|
|
||||||
|
|
||||||
|
|
||||||
def _init_worker(recording, robot_path, specs, kwargs):
|
def _init_worker(recording, robot_path, specs, kwargs):
|
||||||
"""Initialiser run once per worker process."""
|
"""Initialiser run once per worker process."""
|
||||||
global _par_recording, _par_robot_path, _par_specs, _par_kwargs
|
global _par_recording, _par_robot_path, _par_specs, _par_kwargs
|
||||||
global _par_model, _par_body_ids, _par_dof_ids
|
|
||||||
_par_recording = recording
|
_par_recording = recording
|
||||||
_par_robot_path = robot_path
|
_par_robot_path = robot_path
|
||||||
_par_specs = specs
|
_par_specs = specs
|
||||||
_par_kwargs = kwargs
|
_par_kwargs = kwargs
|
||||||
|
|
||||||
# Build base model once — reused for all candidates in this worker.
|
|
||||||
_par_model, _, _par_body_ids, _par_dof_ids = build_base_model(robot_path)
|
|
||||||
_par_model.opt.timestep = kwargs.get("sim_dt", 0.002)
|
|
||||||
|
|
||||||
|
|
||||||
def _eval_candidate(x_natural: np.ndarray) -> float:
|
def _eval_candidate(x_natural: np.ndarray) -> float:
|
||||||
"""Evaluate a single candidate — called in worker processes."""
|
"""Evaluate a single candidate — called in worker processes."""
|
||||||
return _fast_cost(
|
return cost_function(
|
||||||
x_natural,
|
x_natural,
|
||||||
_par_recording,
|
_par_recording,
|
||||||
_par_model,
|
_par_robot_path,
|
||||||
_par_body_ids,
|
|
||||||
_par_dof_ids,
|
|
||||||
_par_specs,
|
_par_specs,
|
||||||
**_par_kwargs,
|
**_par_kwargs,
|
||||||
)
|
)
|
||||||
@@ -336,11 +268,13 @@ def optimize(
|
|||||||
window_duration: float = 0.5,
|
window_duration: float = 0.5,
|
||||||
seed: int = 42,
|
seed: int = 42,
|
||||||
preprocess_vel: bool = True,
|
preprocess_vel: bool = True,
|
||||||
|
sg_window: int = 7,
|
||||||
|
sg_polyorder: int = 3,
|
||||||
) -> dict:
|
) -> dict:
|
||||||
"""Run CMA-ES optimisation and return results.
|
"""Run CMA-ES optimisation and return results.
|
||||||
|
|
||||||
All parameters (motor + pendulum/arm) are optimised jointly from a
|
Motor parameters are locked from the motor-only sysid.
|
||||||
single full-system recording.
|
Only pendulum/arm parameters are optimised.
|
||||||
|
|
||||||
Returns a dict with:
|
Returns a dict with:
|
||||||
best_params: dict[str, float]
|
best_params: dict[str, float]
|
||||||
@@ -348,6 +282,7 @@ def optimize(
|
|||||||
history: list of (generation, best_cost) tuples
|
history: list of (generation, best_cost) tuples
|
||||||
recording: str (path used)
|
recording: str (path used)
|
||||||
specs: list of param names
|
specs: list of param names
|
||||||
|
motor_params: dict of locked motor params
|
||||||
"""
|
"""
|
||||||
from cmaes import CMA
|
from cmaes import CMA
|
||||||
|
|
||||||
@@ -357,11 +292,23 @@ def optimize(
|
|||||||
if specs is None:
|
if specs is None:
|
||||||
specs = ROTARY_CARTPOLE_PARAMS
|
specs = ROTARY_CARTPOLE_PARAMS
|
||||||
|
|
||||||
|
motor_params = dict(LOCKED_MOTOR_PARAMS)
|
||||||
|
log.info(
|
||||||
|
"motor_params_locked",
|
||||||
|
n_params=len(motor_params),
|
||||||
|
gear_avg=f"{(motor_params['actuator_gear_pos'] + motor_params['actuator_gear_neg']) / 2:.4f}",
|
||||||
|
)
|
||||||
|
|
||||||
# Load recording.
|
# Load recording.
|
||||||
recording = dict(np.load(recording_path))
|
recording = dict(np.load(recording_path))
|
||||||
|
|
||||||
# Preprocessing: SG velocity recomputation.
|
# Preprocessing: SG velocity recomputation.
|
||||||
recording = _preprocess_recording(recording, preprocess_vel=preprocess_vel)
|
recording = _preprocess_recording(
|
||||||
|
recording,
|
||||||
|
preprocess_vel=preprocess_vel,
|
||||||
|
sg_window=sg_window,
|
||||||
|
sg_polyorder=sg_polyorder,
|
||||||
|
)
|
||||||
|
|
||||||
n_samples = len(recording["time"])
|
n_samples = len(recording["time"])
|
||||||
duration = recording["time"][-1] - recording["time"][0]
|
duration = recording["time"][-1] - recording["time"][0]
|
||||||
@@ -429,6 +376,7 @@ def optimize(
|
|||||||
vel_weight=vel_weight,
|
vel_weight=vel_weight,
|
||||||
pendulum_scale=pendulum_scale,
|
pendulum_scale=pendulum_scale,
|
||||||
window_duration=window_duration,
|
window_duration=window_duration,
|
||||||
|
motor_params=motor_params,
|
||||||
)
|
)
|
||||||
|
|
||||||
log.info("parallel_workers", n_workers=n_workers)
|
log.info("parallel_workers", n_workers=n_workers)
|
||||||
@@ -480,19 +428,6 @@ def optimize(
|
|||||||
gen_best=f"{min(c for _, c in solutions):.6f}",
|
gen_best=f"{min(c for _, c in solutions):.6f}",
|
||||||
)
|
)
|
||||||
|
|
||||||
# Early stopping: no improvement in last 50 generations.
|
|
||||||
patience = 50
|
|
||||||
if len(history) > patience:
|
|
||||||
old_cost = history[-patience][1]
|
|
||||||
if old_cost - best_cost < old_cost * 0.001:
|
|
||||||
log.info(
|
|
||||||
"early_stopping",
|
|
||||||
gen=gen,
|
|
||||||
best_cost=f"{best_cost:.6f}",
|
|
||||||
stall_gens=patience,
|
|
||||||
)
|
|
||||||
break
|
|
||||||
|
|
||||||
total_time = time.monotonic() - t0
|
total_time = time.monotonic() - t0
|
||||||
|
|
||||||
# Clean up process pool.
|
# Clean up process pool.
|
||||||
@@ -506,8 +441,7 @@ def optimize(
|
|||||||
"cmaes_finished",
|
"cmaes_finished",
|
||||||
best_cost=f"{best_cost:.6f}",
|
best_cost=f"{best_cost:.6f}",
|
||||||
total_time=f"{total_time:.1f}s",
|
total_time=f"{total_time:.1f}s",
|
||||||
generations=len(history),
|
evaluations=max_generations * population_size,
|
||||||
evaluations=len(history) * population_size,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
# Log parameter comparison.
|
# Log parameter comparison.
|
||||||
@@ -531,6 +465,7 @@ def optimize(
|
|||||||
"recording": str(recording_path),
|
"recording": str(recording_path),
|
||||||
"param_names": [s.name for s in specs],
|
"param_names": [s.name for s in specs],
|
||||||
"defaults": {s.name: s.default for s in specs},
|
"defaults": {s.name: s.default for s in specs},
|
||||||
|
"motor_params": motor_params,
|
||||||
"preprocess_vel": preprocess_vel,
|
"preprocess_vel": preprocess_vel,
|
||||||
"timestamp": datetime.now().isoformat(),
|
"timestamp": datetime.now().isoformat(),
|
||||||
}
|
}
|
||||||
@@ -581,12 +516,16 @@ def main() -> None:
|
|||||||
action="store_true",
|
action="store_true",
|
||||||
help="Skip Savitzky-Golay velocity preprocessing",
|
help="Skip Savitzky-Golay velocity preprocessing",
|
||||||
)
|
)
|
||||||
|
parser.add_argument("--sg-window", type=int, default=7,
|
||||||
|
help="Savitzky-Golay window length (odd, default 7)")
|
||||||
|
parser.add_argument("--sg-polyorder", type=int, default=3,
|
||||||
|
help="Savitzky-Golay polynomial order (default 3)")
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--param-set",
|
"--param-set",
|
||||||
type=str,
|
type=str,
|
||||||
default="full",
|
default="full",
|
||||||
choices=list(PARAM_SETS.keys()),
|
choices=list(PARAM_SETS.keys()),
|
||||||
help="Parameter set to optimize: 'full' (28), 'motor' (13), or 'pendulum' (15)",
|
help="Parameter set to optimize: 'reduced' (6 params, fast) or 'full' (15 params)",
|
||||||
)
|
)
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
@@ -607,6 +546,8 @@ def main() -> None:
|
|||||||
window_duration=args.window_duration,
|
window_duration=args.window_duration,
|
||||||
seed=args.seed,
|
seed=args.seed,
|
||||||
preprocess_vel=not args.no_preprocess_vel,
|
preprocess_vel=not args.no_preprocess_vel,
|
||||||
|
sg_window=args.sg_window,
|
||||||
|
sg_polyorder=args.sg_polyorder,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Save results JSON.
|
# Save results JSON.
|
||||||
@@ -631,6 +572,7 @@ def main() -> None:
|
|||||||
export_tuned_files(
|
export_tuned_files(
|
||||||
robot_path=args.robot_path,
|
robot_path=args.robot_path,
|
||||||
params=result["best_params"],
|
params=result["best_params"],
|
||||||
|
motor_params=result.get("motor_params"),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -1,158 +0,0 @@
|
|||||||
"""Recording preprocessing — clean velocity estimation from angle data.
|
|
||||||
|
|
||||||
The ESP32 firmware computes velocity as a raw finite-difference of encoder
|
|
||||||
counts at 50 Hz. With a 1320 CPR encoder that gives ~0.24 rad/s of
|
|
||||||
quantisation noise per count. This module replaces the noisy firmware
|
|
||||||
velocity with a smooth differentiation of the (much cleaner) angle signal.
|
|
||||||
|
|
||||||
Pipeline:
|
|
||||||
1. Dequantize angle signal (Gaussian smooth to round off staircase edges)
|
|
||||||
2. Savitzky-Golay differentiation to compute velocity (zero phase lag)
|
|
||||||
|
|
||||||
This is standard practice in robotics sysid — see e.g. MATLAB System ID
|
|
||||||
Toolbox, Drake's trajectory processing, or ETH's ANYmal sysid pipeline.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
from scipy.ndimage import gaussian_filter1d
|
|
||||||
from scipy.signal import savgol_filter
|
|
||||||
|
|
||||||
import structlog
|
|
||||||
|
|
||||||
log = structlog.get_logger()
|
|
||||||
|
|
||||||
# Default encoder CPR for angle dequantization.
|
|
||||||
_DEFAULT_CPR = 1320
|
|
||||||
|
|
||||||
|
|
||||||
def dequantize_angle(
|
|
||||||
angles: np.ndarray,
|
|
||||||
cpr: int = _DEFAULT_CPR,
|
|
||||||
sigma: float = 0.6,
|
|
||||||
) -> np.ndarray:
|
|
||||||
"""Smooth out encoder quantization staircase before differentiation.
|
|
||||||
|
|
||||||
A 1320 CPR encoder quantizes the angle signal into 0.00476 rad steps.
|
|
||||||
When differentiated, these steps produce velocity spikes (Dirac-like).
|
|
||||||
A tiny Gaussian blur (sigma=0.6 samples ~ 12ms at 50 Hz) rounds the
|
|
||||||
staircase edges without affecting dynamics (<2 Hz bandwidth loss).
|
|
||||||
"""
|
|
||||||
if sigma <= 0 or cpr <= 0:
|
|
||||||
return angles
|
|
||||||
return gaussian_filter1d(angles, sigma=sigma)
|
|
||||||
|
|
||||||
|
|
||||||
def recompute_velocity(
|
|
||||||
recording: dict[str, np.ndarray],
|
|
||||||
window_length: int = 11,
|
|
||||||
polyorder: int = 3,
|
|
||||||
deriv: int = 1,
|
|
||||||
dequantize: bool = True,
|
|
||||||
dequant_sigma: float = 0.6,
|
|
||||||
keys: tuple[str, str] = ("motor_angle", "motor_vel"),
|
|
||||||
) -> dict[str, np.ndarray]:
|
|
||||||
"""Recompute velocity from angle using Savitzky-Golay differentiation.
|
|
||||||
|
|
||||||
Parameters
|
|
||||||
----------
|
|
||||||
recording : dict with at least 'time' and the angle/vel keys.
|
|
||||||
window_length : SG filter window (must be odd, > polyorder).
|
|
||||||
polyorder : Polynomial order for the SG filter (3 = cubic).
|
|
||||||
deriv : Derivative order (1 = first derivative = velocity).
|
|
||||||
dequantize : Whether to smooth encoder quantization staircase.
|
|
||||||
dequant_sigma : Gaussian sigma for dequantization (samples).
|
|
||||||
keys : (angle_key, vel_key) to process.
|
|
||||||
|
|
||||||
Returns
|
|
||||||
-------
|
|
||||||
New recording dict with vel replaced and vel_raw added.
|
|
||||||
"""
|
|
||||||
angle_key, vel_key = keys
|
|
||||||
rec = dict(recording)
|
|
||||||
|
|
||||||
times = rec["time"]
|
|
||||||
angles = rec[angle_key].copy()
|
|
||||||
dt = np.mean(np.diff(times))
|
|
||||||
|
|
||||||
rec[f"{vel_key}_raw"] = rec[vel_key].copy()
|
|
||||||
|
|
||||||
if dequantize:
|
|
||||||
angles = dequantize_angle(angles, sigma=dequant_sigma)
|
|
||||||
|
|
||||||
vel_sg = savgol_filter(
|
|
||||||
angles,
|
|
||||||
window_length=window_length,
|
|
||||||
polyorder=polyorder,
|
|
||||||
deriv=deriv,
|
|
||||||
delta=dt,
|
|
||||||
)
|
|
||||||
|
|
||||||
raw_vel = rec[f"{vel_key}_raw"]
|
|
||||||
noise_estimate = np.std(raw_vel - vel_sg)
|
|
||||||
max_diff = np.max(np.abs(raw_vel - vel_sg))
|
|
||||||
|
|
||||||
log.info(
|
|
||||||
"velocity_recomputed",
|
|
||||||
channel=vel_key,
|
|
||||||
method="savgol",
|
|
||||||
window=window_length,
|
|
||||||
polyorder=polyorder,
|
|
||||||
dt=f"{dt*1000:.1f}ms",
|
|
||||||
noise_std=f"{noise_estimate:.3f} rad/s",
|
|
||||||
max_diff=f"{max_diff:.3f} rad/s",
|
|
||||||
)
|
|
||||||
|
|
||||||
rec[vel_key] = vel_sg
|
|
||||||
return rec
|
|
||||||
|
|
||||||
|
|
||||||
def preprocess_recording(
|
|
||||||
recording: dict[str, np.ndarray],
|
|
||||||
preprocess_vel: bool = True,
|
|
||||||
sg_window: int = 11,
|
|
||||||
sg_polyorder: int = 3,
|
|
||||||
) -> dict[str, np.ndarray]:
|
|
||||||
"""Preprocess a full-system recording (motor + pendulum velocities).
|
|
||||||
|
|
||||||
Applies SG differentiation to both motor and pendulum channels.
|
|
||||||
"""
|
|
||||||
if not preprocess_vel:
|
|
||||||
return recording
|
|
||||||
|
|
||||||
rec = recompute_velocity(
|
|
||||||
recording,
|
|
||||||
window_length=sg_window,
|
|
||||||
polyorder=sg_polyorder,
|
|
||||||
keys=("motor_angle", "motor_vel"),
|
|
||||||
)
|
|
||||||
if "pendulum_angle" in rec and "pendulum_vel" in rec:
|
|
||||||
rec = recompute_velocity(
|
|
||||||
rec,
|
|
||||||
window_length=sg_window,
|
|
||||||
polyorder=sg_polyorder,
|
|
||||||
keys=("pendulum_angle", "pendulum_vel"),
|
|
||||||
)
|
|
||||||
return rec
|
|
||||||
|
|
||||||
|
|
||||||
def match_filter_velocity(
|
|
||||||
sim_vel: np.ndarray,
|
|
||||||
real_vel: np.ndarray,
|
|
||||||
dt: float,
|
|
||||||
cutoff_hz: float = 8.0,
|
|
||||||
) -> tuple[np.ndarray, np.ndarray]:
|
|
||||||
"""Apply identical lowpass to both sim and real velocity for fair comparison.
|
|
||||||
|
|
||||||
MuJoCo sim velocity has full physics bandwidth while real velocity is
|
|
||||||
band-limited by the SG filter. Matching ensures fair cost comparison.
|
|
||||||
"""
|
|
||||||
from scipy.signal import butter, filtfilt
|
|
||||||
|
|
||||||
fs = 1.0 / dt
|
|
||||||
nyq = fs / 2.0
|
|
||||||
norm_cutoff = min(cutoff_hz / nyq, 0.99)
|
|
||||||
|
|
||||||
b, a = butter(2, norm_cutoff, btype="low")
|
|
||||||
return filtfilt(b, a, sim_vel), filtfilt(b, a, real_vel)
|
|
||||||
@@ -7,11 +7,12 @@ the simulated trajectory for comparison with the real recording.
|
|||||||
This module is the inner loop of the CMA-ES optimizer: it is called once
|
This module is the inner loop of the CMA-ES optimizer: it is called once
|
||||||
per candidate parameter vector per generation.
|
per candidate parameter vector per generation.
|
||||||
|
|
||||||
All motor + pendulum/arm parameters are optimised jointly from a single
|
Motor parameters are **locked** from the motor-only sysid result.
|
||||||
full-system recording. The asymmetric motor model (deadzone, gear
|
The optimizer only fits
|
||||||
compensation, Coulomb friction + Stribeck, viscous damping, action bias)
|
pendulum/arm inertial parameters, pendulum joint dynamics, and
|
||||||
is applied via ``ActuatorConfig.transform_ctrl()`` and
|
``ctrl_limit``. The asymmetric motor model (deadzone, gear compensation,
|
||||||
``compute_motor_force()``.
|
Coulomb friction, viscous damping, quadratic drag, back-EMF) is applied
|
||||||
|
via ``ActuatorConfig.transform_ctrl()`` and ``compute_motor_force()``.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
@@ -31,6 +32,26 @@ from src.runners.mujoco import ActuatorLimits, load_mujoco_model
|
|||||||
from src.sysid._urdf import patch_link_inertials
|
from src.sysid._urdf import patch_link_inertials
|
||||||
|
|
||||||
|
|
||||||
|
# ── Locked motor parameters (from motor-only sysid) ─────────────────
|
||||||
|
# These are FIXED and not optimised. They come from the 12-param model
|
||||||
|
# in robot.yaml (from motor-only sysid, cost 0.862).
|
||||||
|
|
||||||
|
LOCKED_MOTOR_PARAMS: dict[str, float] = {
|
||||||
|
"actuator_gear_pos": 0.424182,
|
||||||
|
"actuator_gear_neg": 0.425031,
|
||||||
|
"actuator_filter_tau": 0.00503506,
|
||||||
|
"motor_damping_pos": 0.00202682,
|
||||||
|
"motor_damping_neg": 0.0146651,
|
||||||
|
"motor_armature": 0.00277342,
|
||||||
|
"motor_frictionloss_pos": 0.0573282,
|
||||||
|
"motor_frictionloss_neg": 0.0533549,
|
||||||
|
"viscous_quadratic": 0.000285329,
|
||||||
|
"back_emf_gain": 0.00675809,
|
||||||
|
"motor_deadzone_pos": 0.141291,
|
||||||
|
"motor_deadzone_neg": 0.0780148,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
# ── Tunable parameter specification ──────────────────────────────────
|
# ── Tunable parameter specification ──────────────────────────────────
|
||||||
|
|
||||||
|
|
||||||
@@ -45,31 +66,16 @@ class ParamSpec:
|
|||||||
log_scale: bool = False # optimise in log-space (masses, inertias)
|
log_scale: bool = False # optimise in log-space (masses, inertias)
|
||||||
|
|
||||||
|
|
||||||
# ── Motor parameters (13) ────────────────────────────────────────────
|
# Pendulum sysid parameters — motor params are LOCKED (not here).
|
||||||
# Defaults from motor-only sysid (cost 0.2117).
|
# Order matters: the optimizer maps a flat vector to these specs.
|
||||||
MOTOR_PARAMS: list[ParamSpec] = [
|
# Defaults are from the URDF exported by Fusion 360.
|
||||||
ParamSpec("actuator_gear_pos", 0.371194, 0.005, 1.5, log_scale=True),
|
ROTARY_CARTPOLE_PARAMS: list[ParamSpec] = [
|
||||||
ParamSpec("actuator_gear_neg", 0.428143, 0.005, 1.5, log_scale=True),
|
# ── Arm link (URDF) ──────────────────────────────────────────
|
||||||
ParamSpec("actuator_filter_tau", 0.022301, 0.001, 0.20),
|
|
||||||
ParamSpec("motor_damping_pos", 0.001384, 1e-5, 0.1, log_scale=True),
|
|
||||||
ParamSpec("motor_damping_neg", 0.005196, 1e-5, 0.1, log_scale=True),
|
|
||||||
ParamSpec("motor_armature", 0.002753, 1e-6, 0.01, log_scale=True),
|
|
||||||
ParamSpec("motor_frictionloss_pos", 0.036744, 0.001, 0.2, log_scale=True),
|
|
||||||
ParamSpec("motor_frictionloss_neg", 0.069082, 0.001, 0.2, log_scale=True),
|
|
||||||
ParamSpec("stribeck_friction_boost", 0.0, 0.0, 0.15),
|
|
||||||
ParamSpec("stribeck_vel", 2.0, 0.1, 8.0),
|
|
||||||
ParamSpec("motor_deadzone_pos", 0.14182, 0.0, 0.30),
|
|
||||||
ParamSpec("motor_deadzone_neg", 0.031454, 0.0, 0.30),
|
|
||||||
ParamSpec("action_bias", 0.0, -0.10, 0.10),
|
|
||||||
]
|
|
||||||
|
|
||||||
# ── Pendulum/arm parameters (15) ─────────────────────────────────────
|
|
||||||
# Defaults from Fusion 360 URDF export.
|
|
||||||
PENDULUM_PARAMS: list[ParamSpec] = [
|
|
||||||
ParamSpec("arm_mass", 0.02110, 0.005, 0.08, log_scale=True),
|
ParamSpec("arm_mass", 0.02110, 0.005, 0.08, log_scale=True),
|
||||||
ParamSpec("arm_com_x", -0.00710, -0.03, 0.03),
|
ParamSpec("arm_com_x", -0.00710, -0.03, 0.03),
|
||||||
ParamSpec("arm_com_y", 0.00085, -0.02, 0.02),
|
ParamSpec("arm_com_y", 0.00085, -0.02, 0.02),
|
||||||
ParamSpec("arm_com_z", 0.00795, -0.02, 0.02),
|
ParamSpec("arm_com_z", 0.00795, -0.02, 0.02),
|
||||||
|
# ── Pendulum link (URDF) ─────────────────────────────────────
|
||||||
ParamSpec("pendulum_mass", 0.03937, 0.010, 0.10, log_scale=True),
|
ParamSpec("pendulum_mass", 0.03937, 0.010, 0.10, log_scale=True),
|
||||||
ParamSpec("pendulum_com_x", 0.06025, 0.01, 0.15),
|
ParamSpec("pendulum_com_x", 0.06025, 0.01, 0.15),
|
||||||
ParamSpec("pendulum_com_y", -0.07602, -0.20, 0.0),
|
ParamSpec("pendulum_com_y", -0.07602, -0.20, 0.0),
|
||||||
@@ -78,18 +84,37 @@ PENDULUM_PARAMS: list[ParamSpec] = [
|
|||||||
ParamSpec("pendulum_iyy", 3.70e-05, 1e-07, 1e-03, log_scale=True),
|
ParamSpec("pendulum_iyy", 3.70e-05, 1e-07, 1e-03, log_scale=True),
|
||||||
ParamSpec("pendulum_izz", 7.83e-05, 1e-07, 1e-03, log_scale=True),
|
ParamSpec("pendulum_izz", 7.83e-05, 1e-07, 1e-03, log_scale=True),
|
||||||
ParamSpec("pendulum_ixy", -6.93e-06, -1e-03, 1e-03),
|
ParamSpec("pendulum_ixy", -6.93e-06, -1e-03, 1e-03),
|
||||||
|
# ── Pendulum joint dynamics ──────────────────────────────────
|
||||||
|
ParamSpec("pendulum_damping", 0.0001, 1e-6, 0.05, log_scale=True),
|
||||||
|
ParamSpec("pendulum_frictionloss", 0.0001, 1e-6, 0.05, log_scale=True),
|
||||||
|
# ── Hardware realism (control pipeline) ────────────────────
|
||||||
|
ParamSpec("ctrl_limit", 0.588, 0.45, 0.70), # MAX_MOTOR_SPEED / 255
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# Extended set: full params + motor armature (compensates for the
|
||||||
|
# motor-only sysid having captured arm/pendulum loading in armature).
|
||||||
|
EXTENDED_PARAMS: list[ParamSpec] = ROTARY_CARTPOLE_PARAMS + [
|
||||||
|
ParamSpec("motor_armature", 0.00277, 0.0005, 0.02, log_scale=True),
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# Reduced set: only the 6 most impactful pendulum parameters.
|
||||||
|
# Good for a fast first pass — converges in ~50 generations.
|
||||||
|
REDUCED_PARAMS: list[ParamSpec] = [
|
||||||
|
ParamSpec("pendulum_mass", 0.03937, 0.010, 0.10, log_scale=True),
|
||||||
|
ParamSpec("pendulum_com_x", 0.06025, 0.01, 0.15),
|
||||||
|
ParamSpec("pendulum_com_y", -0.07602, -0.20, 0.0),
|
||||||
|
ParamSpec("pendulum_ixx", 6.20e-05, 1e-07, 1e-03, log_scale=True),
|
||||||
ParamSpec("pendulum_damping", 0.0001, 1e-6, 0.05, log_scale=True),
|
ParamSpec("pendulum_damping", 0.0001, 1e-6, 0.05, log_scale=True),
|
||||||
ParamSpec("pendulum_frictionloss", 0.0001, 1e-6, 0.05, log_scale=True),
|
ParamSpec("pendulum_frictionloss", 0.0001, 1e-6, 0.05, log_scale=True),
|
||||||
ParamSpec("ctrl_limit", 0.588, 0.45, 0.70),
|
|
||||||
]
|
]
|
||||||
|
|
||||||
# ── Combined parameter sets ──────────────────────────────────────────
|
|
||||||
ROTARY_CARTPOLE_PARAMS: list[ParamSpec] = MOTOR_PARAMS + PENDULUM_PARAMS
|
|
||||||
|
|
||||||
PARAM_SETS: dict[str, list[ParamSpec]] = {
|
PARAM_SETS: dict[str, list[ParamSpec]] = {
|
||||||
"full": ROTARY_CARTPOLE_PARAMS,
|
"full": ROTARY_CARTPOLE_PARAMS,
|
||||||
"motor": MOTOR_PARAMS,
|
"extended": EXTENDED_PARAMS,
|
||||||
"pendulum": PENDULUM_PARAMS,
|
"reduced": REDUCED_PARAMS,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -109,13 +134,6 @@ def defaults_vector(specs: list[ParamSpec] | None = None) -> np.ndarray:
|
|||||||
return np.array([s.default for s in specs], dtype=np.float64)
|
return np.array([s.default for s in specs], dtype=np.float64)
|
||||||
|
|
||||||
|
|
||||||
def defaults_dict(specs: list[ParamSpec] | None = None) -> dict[str, float]:
|
|
||||||
"""Return the default parameter dict (all 28 params)."""
|
|
||||||
if specs is None:
|
|
||||||
specs = ROTARY_CARTPOLE_PARAMS
|
|
||||||
return {s.name: s.default for s in specs}
|
|
||||||
|
|
||||||
|
|
||||||
def bounds_arrays(
|
def bounds_arrays(
|
||||||
specs: list[ParamSpec] | None = None,
|
specs: list[ParamSpec] | None = None,
|
||||||
) -> tuple[np.ndarray, np.ndarray]:
|
) -> tuple[np.ndarray, np.ndarray]:
|
||||||
@@ -130,23 +148,19 @@ def bounds_arrays(
|
|||||||
# ── MuJoCo model building with parameter overrides ──────────────────
|
# ── MuJoCo model building with parameter overrides ──────────────────
|
||||||
|
|
||||||
|
|
||||||
def _resolve_params(params: dict[str, float]) -> dict[str, float]:
|
|
||||||
"""Fill in defaults for any missing params."""
|
|
||||||
full = defaults_dict()
|
|
||||||
full.update(params)
|
|
||||||
return full
|
|
||||||
|
|
||||||
|
|
||||||
def _build_model(
|
def _build_model(
|
||||||
robot_path: Path,
|
robot_path: Path,
|
||||||
params: dict[str, float],
|
params: dict[str, float],
|
||||||
|
motor_params: dict[str, float] | None = None,
|
||||||
) -> tuple[mujoco.MjModel, ActuatorConfig]:
|
) -> tuple[mujoco.MjModel, ActuatorConfig]:
|
||||||
"""Build a MuJoCo model with sysid overrides.
|
"""Build a MuJoCo model with sysid overrides.
|
||||||
|
|
||||||
Returns (model, actuator) — use ``actuator.transform_ctrl()`` and
|
Returns (model, actuator) — use ``actuator.transform_ctrl()`` and
|
||||||
``actuator.compute_motor_force()`` in the rollout loop.
|
``actuator.compute_motor_force()`` in the rollout loop.
|
||||||
"""
|
"""
|
||||||
p = _resolve_params(params)
|
if motor_params is None:
|
||||||
|
motor_params = LOCKED_MOTOR_PARAMS
|
||||||
|
|
||||||
robot_path = Path(robot_path).resolve()
|
robot_path = Path(robot_path).resolve()
|
||||||
|
|
||||||
# ── Patch URDF inertial parameters to a temp file ────────────
|
# ── Patch URDF inertial parameters to a temp file ────────────
|
||||||
@@ -154,7 +168,7 @@ def _build_model(
|
|||||||
urdf_path = robot_path / robot_yaml["urdf"]
|
urdf_path = robot_path / robot_yaml["urdf"]
|
||||||
|
|
||||||
tree = ET.parse(urdf_path)
|
tree = ET.parse(urdf_path)
|
||||||
patch_link_inertials(tree.getroot(), p)
|
patch_link_inertials(tree.getroot(), params)
|
||||||
|
|
||||||
fd, tmp_urdf = tempfile.mkstemp(
|
fd, tmp_urdf = tempfile.mkstemp(
|
||||||
suffix=".urdf", prefix="_sysid_", dir=str(robot_path),
|
suffix=".urdf", prefix="_sysid_", dir=str(robot_path),
|
||||||
@@ -163,22 +177,39 @@ def _build_model(
|
|||||||
tmp_urdf_path = Path(tmp_urdf)
|
tmp_urdf_path = Path(tmp_urdf)
|
||||||
tree.write(str(tmp_urdf_path), xml_declaration=True, encoding="unicode")
|
tree.write(str(tmp_urdf_path), xml_declaration=True, encoding="unicode")
|
||||||
|
|
||||||
# ── Build RobotConfig with full motor model ──────────────────
|
# ── Build RobotConfig with full motor sysid values ───────────
|
||||||
|
gear_pos = motor_params.get("actuator_gear_pos", 0.424182)
|
||||||
|
gear_neg = motor_params.get("actuator_gear_neg", 0.425031)
|
||||||
|
motor_armature = params.get(
|
||||||
|
"motor_armature",
|
||||||
|
motor_params.get("motor_armature", 0.00277342),
|
||||||
|
)
|
||||||
|
pend_damping = params.get("pendulum_damping", 0.0001)
|
||||||
|
pend_frictionloss = params.get("pendulum_frictionloss", 0.0001)
|
||||||
|
|
||||||
act_cfg = robot_yaml["actuators"][0]
|
act_cfg = robot_yaml["actuators"][0]
|
||||||
|
ctrl_lo, ctrl_hi = act_cfg.get("ctrl_range", [-1.0, 1.0])
|
||||||
|
|
||||||
actuator = ActuatorConfig(
|
actuator = ActuatorConfig(
|
||||||
joint=act_cfg["joint"],
|
joint=act_cfg["joint"],
|
||||||
type="motor",
|
type="motor",
|
||||||
gear=(p["actuator_gear_pos"], p["actuator_gear_neg"]),
|
gear=(gear_pos, gear_neg),
|
||||||
ctrl_range=(act_cfg.get("ctrl_range", [-1.0, 1.0])[0],
|
ctrl_range=(ctrl_lo, ctrl_hi),
|
||||||
act_cfg.get("ctrl_range", [-1.0, 1.0])[1]),
|
deadzone=(
|
||||||
deadzone=(p["motor_deadzone_pos"], p["motor_deadzone_neg"]),
|
motor_params.get("motor_deadzone_pos", 0.141),
|
||||||
damping=(p["motor_damping_pos"], p["motor_damping_neg"]),
|
motor_params.get("motor_deadzone_neg", 0.078),
|
||||||
frictionloss=(p["motor_frictionloss_pos"], p["motor_frictionloss_neg"]),
|
),
|
||||||
stribeck_friction_boost=p["stribeck_friction_boost"],
|
damping=(
|
||||||
stribeck_vel=p["stribeck_vel"],
|
motor_params.get("motor_damping_pos", 0.002),
|
||||||
action_bias=p["action_bias"],
|
motor_params.get("motor_damping_neg", 0.015),
|
||||||
filter_tau=p["actuator_filter_tau"],
|
),
|
||||||
|
frictionloss=(
|
||||||
|
motor_params.get("motor_frictionloss_pos", 0.057),
|
||||||
|
motor_params.get("motor_frictionloss_neg", 0.053),
|
||||||
|
),
|
||||||
|
filter_tau=motor_params.get("actuator_filter_tau", 0.005),
|
||||||
|
viscous_quadratic=motor_params.get("viscous_quadratic", 0.000285),
|
||||||
|
back_emf_gain=motor_params.get("back_emf_gain", 0.00676),
|
||||||
)
|
)
|
||||||
|
|
||||||
robot = RobotConfig(
|
robot = RobotConfig(
|
||||||
@@ -187,12 +218,12 @@ def _build_model(
|
|||||||
joints={
|
joints={
|
||||||
"motor_joint": JointConfig(
|
"motor_joint": JointConfig(
|
||||||
damping=0.0,
|
damping=0.0,
|
||||||
armature=p["motor_armature"],
|
armature=motor_armature,
|
||||||
frictionloss=0.0,
|
frictionloss=0.0,
|
||||||
),
|
),
|
||||||
"pendulum_joint": JointConfig(
|
"pendulum_joint": JointConfig(
|
||||||
damping=p["pendulum_damping"],
|
damping=pend_damping,
|
||||||
frictionloss=p["pendulum_frictionloss"],
|
frictionloss=pend_frictionloss,
|
||||||
),
|
),
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
@@ -205,136 +236,7 @@ def _build_model(
|
|||||||
return model, actuator
|
return model, actuator
|
||||||
|
|
||||||
|
|
||||||
def build_base_model(
|
|
||||||
robot_path: str | Path,
|
|
||||||
params: dict[str, float] | None = None,
|
|
||||||
) -> tuple[mujoco.MjModel, ActuatorConfig, dict[str, int], dict[str, int]]:
|
|
||||||
"""Build a base MuJoCo model once (with default or given params).
|
|
||||||
|
|
||||||
Returns (model, actuator, body_ids, dof_ids) where body_ids/dof_ids
|
|
||||||
map names to indices for fast in-place patching.
|
|
||||||
"""
|
|
||||||
if params is None:
|
|
||||||
params = defaults_dict()
|
|
||||||
model, actuator = _build_model(Path(robot_path).resolve(), params)
|
|
||||||
|
|
||||||
body_ids = {}
|
|
||||||
for name in ("arm", "pendulum"):
|
|
||||||
body_ids[name] = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_BODY, name)
|
|
||||||
|
|
||||||
dof_ids = {}
|
|
||||||
for name in ("motor_joint", "pendulum_joint"):
|
|
||||||
jnt_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_JOINT, name)
|
|
||||||
dof_ids[name] = model.jnt_dofadr[jnt_id]
|
|
||||||
|
|
||||||
return model, actuator, body_ids, dof_ids
|
|
||||||
|
|
||||||
|
|
||||||
def _rotmat_to_quat(R: np.ndarray) -> np.ndarray:
|
|
||||||
"""Convert a 3×3 rotation matrix to a MuJoCo quaternion [w, x, y, z]."""
|
|
||||||
quat = np.empty(4)
|
|
||||||
mujoco.mju_mat2Quat(quat, R.flatten())
|
|
||||||
return quat
|
|
||||||
|
|
||||||
|
|
||||||
def _patch_pendulum_inertia(
|
|
||||||
model: mujoco.MjModel,
|
|
||||||
body_id: int,
|
|
||||||
params: dict[str, float],
|
|
||||||
) -> None:
|
|
||||||
"""Diagonalize pendulum inertia tensor and write to model."""
|
|
||||||
defs = defaults_dict()
|
|
||||||
|
|
||||||
ixx = params.get("pendulum_ixx", defs["pendulum_ixx"])
|
|
||||||
iyy = params.get("pendulum_iyy", defs["pendulum_iyy"])
|
|
||||||
izz = params.get("pendulum_izz", defs["pendulum_izz"])
|
|
||||||
ixy = params.get("pendulum_ixy", defs.get("pendulum_ixy", 0.0))
|
|
||||||
|
|
||||||
# Build full 3×3 inertia tensor (ixz = iyz = 0 in our URDF).
|
|
||||||
I = np.array([
|
|
||||||
[ixx, ixy, 0.0],
|
|
||||||
[ixy, iyy, 0.0],
|
|
||||||
[0.0, 0.0, izz],
|
|
||||||
])
|
|
||||||
|
|
||||||
# Eigendecompose — eigenvalues are principal moments, eigenvectors
|
|
||||||
# define the rotation from link frame to principal frame.
|
|
||||||
eigenvalues, eigenvectors = np.linalg.eigh(I)
|
|
||||||
|
|
||||||
# eigh returns ascending order; MuJoCo wants them in the order that
|
|
||||||
# the eigenvector matrix forms a proper rotation (det = +1).
|
|
||||||
if np.linalg.det(eigenvectors) < 0:
|
|
||||||
eigenvectors[:, 0] *= -1
|
|
||||||
|
|
||||||
# Clamp tiny/negative eigenvalues (numerical noise).
|
|
||||||
eigenvalues = np.maximum(eigenvalues, 1e-10)
|
|
||||||
|
|
||||||
# Enforce triangle inequality (required for valid rigid body inertia).
|
|
||||||
# For principal moments sorted ascending (a <= b <= c): a + b >= c.
|
|
||||||
# When violated, MuJoCo falls back to spherical inertia (trace/3).
|
|
||||||
# We must match this to keep fast path consistent with URDF loading.
|
|
||||||
s = np.sort(eigenvalues)
|
|
||||||
if s[0] + s[1] < s[2]:
|
|
||||||
sphere = np.sum(eigenvalues) / 3.0
|
|
||||||
eigenvalues[:] = sphere
|
|
||||||
eigenvectors = np.eye(3)
|
|
||||||
|
|
||||||
model.body_inertia[body_id] = eigenvalues
|
|
||||||
model.body_iquat[body_id] = _rotmat_to_quat(eigenvectors)
|
|
||||||
|
|
||||||
|
|
||||||
def patch_model(
|
|
||||||
model: mujoco.MjModel,
|
|
||||||
params: dict[str, float],
|
|
||||||
body_ids: dict[str, int],
|
|
||||||
dof_ids: dict[str, int],
|
|
||||||
) -> None:
|
|
||||||
"""Patch a cached MuJoCo model in-place with new sysid parameters.
|
|
||||||
|
|
||||||
Much faster than _build_model: no XML parsing, no temp files, no
|
|
||||||
model reload. Just direct array writes.
|
|
||||||
"""
|
|
||||||
p = _resolve_params(params)
|
|
||||||
|
|
||||||
# ── Arm body ─────────────────────────────────────────────────
|
|
||||||
arm_id = body_ids["arm"]
|
|
||||||
model.body_mass[arm_id] = p["arm_mass"]
|
|
||||||
for i, key in enumerate(("arm_com_x", "arm_com_y", "arm_com_z")):
|
|
||||||
model.body_ipos[arm_id, i] = p[key]
|
|
||||||
|
|
||||||
# ── Pendulum body ────────────────────────────────────────────
|
|
||||||
pend_id = body_ids["pendulum"]
|
|
||||||
model.body_mass[pend_id] = p["pendulum_mass"]
|
|
||||||
for i, key in enumerate(("pendulum_com_x", "pendulum_com_y", "pendulum_com_z")):
|
|
||||||
model.body_ipos[pend_id, i] = p[key]
|
|
||||||
|
|
||||||
_patch_pendulum_inertia(model, pend_id, p)
|
|
||||||
|
|
||||||
# ── Joint dynamics ───────────────────────────────────────────
|
|
||||||
motor_dof = dof_ids["motor_joint"]
|
|
||||||
model.dof_armature[motor_dof] = p["motor_armature"]
|
|
||||||
|
|
||||||
pend_dof = dof_ids["pendulum_joint"]
|
|
||||||
model.dof_damping[pend_dof] = p["pendulum_damping"]
|
|
||||||
model.dof_frictionloss[pend_dof] = p["pendulum_frictionloss"]
|
|
||||||
|
|
||||||
|
|
||||||
def _make_actuator(params: dict[str, float]) -> ActuatorConfig:
|
|
||||||
"""Build an ActuatorConfig from a params dict (for fast-path use)."""
|
|
||||||
p = _resolve_params(params)
|
|
||||||
return ActuatorConfig(
|
|
||||||
joint="motor_joint",
|
|
||||||
type="motor",
|
|
||||||
gear=(p["actuator_gear_pos"], p["actuator_gear_neg"]),
|
|
||||||
ctrl_range=(-1.0, 1.0),
|
|
||||||
deadzone=(p["motor_deadzone_pos"], p["motor_deadzone_neg"]),
|
|
||||||
damping=(p["motor_damping_pos"], p["motor_damping_neg"]),
|
|
||||||
frictionloss=(p["motor_frictionloss_pos"], p["motor_frictionloss_neg"]),
|
|
||||||
stribeck_friction_boost=p["stribeck_friction_boost"],
|
|
||||||
stribeck_vel=p["stribeck_vel"],
|
|
||||||
action_bias=p["action_bias"],
|
|
||||||
filter_tau=p["actuator_filter_tau"],
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
# ── Simulation rollout ───────────────────────────────────────────────
|
# ── Simulation rollout ───────────────────────────────────────────────
|
||||||
@@ -346,30 +248,35 @@ def rollout(
|
|||||||
actions: np.ndarray,
|
actions: np.ndarray,
|
||||||
sim_dt: float = 0.002,
|
sim_dt: float = 0.002,
|
||||||
substeps: int = 10,
|
substeps: int = 10,
|
||||||
|
motor_params: dict[str, float] | None = None,
|
||||||
) -> dict[str, np.ndarray]:
|
) -> dict[str, np.ndarray]:
|
||||||
"""Replay recorded actions in MuJoCo with overridden parameters.
|
"""Replay recorded actions in MuJoCo with overridden parameters.
|
||||||
|
|
||||||
Parameters
|
Parameters
|
||||||
----------
|
----------
|
||||||
robot_path : asset directory
|
robot_path : asset directory
|
||||||
params : named parameter overrides (all motor + pendulum params)
|
params : named parameter overrides (pendulum/arm only)
|
||||||
actions : (N,) normalised actions [-1, 1] from the recording
|
actions : (N,) normalised actions [-1, 1] from the recording
|
||||||
sim_dt : MuJoCo physics timestep
|
sim_dt : MuJoCo physics timestep
|
||||||
substeps : physics substeps per control step
|
substeps : physics substeps per control step
|
||||||
|
motor_params : locked motor params (default: LOCKED_MOTOR_PARAMS)
|
||||||
|
|
||||||
Returns
|
Returns
|
||||||
-------
|
-------
|
||||||
dict with keys: motor_angle, motor_vel, pendulum_angle, pendulum_vel
|
dict with keys: motor_angle, motor_vel, pendulum_angle, pendulum_vel
|
||||||
|
Each is an (N,) numpy array of simulated values.
|
||||||
"""
|
"""
|
||||||
|
if motor_params is None:
|
||||||
|
motor_params = LOCKED_MOTOR_PARAMS
|
||||||
|
|
||||||
robot_path = Path(robot_path).resolve()
|
robot_path = Path(robot_path).resolve()
|
||||||
model, actuator = _build_model(robot_path, params)
|
model, actuator = _build_model(robot_path, params, motor_params)
|
||||||
model.opt.timestep = sim_dt
|
model.opt.timestep = sim_dt
|
||||||
data = mujoco.MjData(model)
|
data = mujoco.MjData(model)
|
||||||
mujoco.mj_resetData(model, data)
|
mujoco.mj_resetData(model, data)
|
||||||
|
|
||||||
n = len(actions)
|
n = len(actions)
|
||||||
p = _resolve_params(params)
|
ctrl_limit = params.get("ctrl_limit", 0.588)
|
||||||
ctrl_limit = p["ctrl_limit"]
|
|
||||||
|
|
||||||
sim_motor_angle = np.zeros(n, dtype=np.float64)
|
sim_motor_angle = np.zeros(n, dtype=np.float64)
|
||||||
sim_motor_vel = np.zeros(n, dtype=np.float64)
|
sim_motor_vel = np.zeros(n, dtype=np.float64)
|
||||||
@@ -408,27 +315,41 @@ def windowed_rollout(
|
|||||||
window_duration: float = 0.5,
|
window_duration: float = 0.5,
|
||||||
sim_dt: float = 0.002,
|
sim_dt: float = 0.002,
|
||||||
substeps: int = 10,
|
substeps: int = 10,
|
||||||
|
motor_params: dict[str, float] | None = None,
|
||||||
) -> dict[str, np.ndarray | float]:
|
) -> dict[str, np.ndarray | float]:
|
||||||
"""Multiple-shooting rollout — split recording into short windows.
|
"""Multiple-shooting rollout — split recording into short windows.
|
||||||
|
|
||||||
|
For each window:
|
||||||
|
1. Initialize MuJoCo state from the real qpos/qvel at the window start.
|
||||||
|
2. Replay the recorded actions within the window.
|
||||||
|
3. Record the simulated output.
|
||||||
|
|
||||||
|
Motor dynamics (asymmetric friction, damping, back-EMF, etc.) are
|
||||||
|
applied via qfrc_applied using the locked motor sysid parameters.
|
||||||
|
|
||||||
Parameters
|
Parameters
|
||||||
----------
|
----------
|
||||||
robot_path : asset directory
|
robot_path : asset directory
|
||||||
params : named parameter overrides (all motor + pendulum params)
|
params : named parameter overrides (pendulum/arm only)
|
||||||
recording : dict with keys time, action, motor_angle, motor_vel,
|
recording : dict with keys time, action, motor_angle, motor_vel,
|
||||||
pendulum_angle, pendulum_vel (all 1D arrays of length N)
|
pendulum_angle, pendulum_vel (all 1D arrays of length N)
|
||||||
window_duration : length of each shooting window in seconds
|
window_duration : length of each shooting window in seconds
|
||||||
sim_dt : MuJoCo physics timestep
|
sim_dt : MuJoCo physics timestep
|
||||||
substeps : physics substeps per control step
|
substeps : physics substeps per control step
|
||||||
|
motor_params : locked motor params (default: LOCKED_MOTOR_PARAMS)
|
||||||
|
|
||||||
Returns
|
Returns
|
||||||
-------
|
-------
|
||||||
dict with:
|
dict with:
|
||||||
motor_angle, motor_vel, pendulum_angle, pendulum_vel — (N,) arrays
|
motor_angle, motor_vel, pendulum_angle, pendulum_vel — (N,) arrays
|
||||||
|
(stitched from per-window simulations)
|
||||||
n_windows — number of windows used
|
n_windows — number of windows used
|
||||||
"""
|
"""
|
||||||
|
if motor_params is None:
|
||||||
|
motor_params = LOCKED_MOTOR_PARAMS
|
||||||
|
|
||||||
robot_path = Path(robot_path).resolve()
|
robot_path = Path(robot_path).resolve()
|
||||||
model, actuator = _build_model(robot_path, params)
|
model, actuator = _build_model(robot_path, params, motor_params)
|
||||||
model.opt.timestep = sim_dt
|
model.opt.timestep = sim_dt
|
||||||
data = mujoco.MjData(model)
|
data = mujoco.MjData(model)
|
||||||
|
|
||||||
@@ -457,8 +378,7 @@ def windowed_rollout(
|
|||||||
window_starts.append(idx)
|
window_starts.append(idx)
|
||||||
current_t += window_duration
|
current_t += window_duration
|
||||||
|
|
||||||
p = _resolve_params(params)
|
ctrl_limit = params.get("ctrl_limit", 0.588)
|
||||||
ctrl_limit = p["ctrl_limit"]
|
|
||||||
n_windows = len(window_starts)
|
n_windows = len(window_starts)
|
||||||
|
|
||||||
for w, w_start in enumerate(window_starts):
|
for w, w_start in enumerate(window_starts):
|
||||||
|
|||||||
@@ -52,6 +52,10 @@ class TrainerConfig:
|
|||||||
history_length: int = 0 # 0 = disabled, >0 = temporal window size
|
history_length: int = 0 # 0 = disabled, >0 = temporal window size
|
||||||
embedding_dim: int = 32 # history encoder output dimension
|
embedding_dim: int = 32 # history encoder output dimension
|
||||||
|
|
||||||
|
# RMA (Rapid Motor Adaptation)
|
||||||
|
rma_mode: str = "none" # "none" | "teacher" | "deploy"
|
||||||
|
latent_dim: int = 8 # env encoder / adaptation latent dimension
|
||||||
|
|
||||||
|
|
||||||
# ── Video-recording trainer ──────────────────────────────────────────
|
# ── Video-recording trainer ──────────────────────────────────────────
|
||||||
|
|
||||||
@@ -177,11 +181,12 @@ class Trainer:
|
|||||||
device=device,
|
device=device,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Determine raw obs dim (without history augmentation).
|
# Determine raw obs dim (without history/privileged augmentation).
|
||||||
raw_obs_dim = 0
|
|
||||||
if self.config.history_length > 0:
|
|
||||||
raw_obs_dim = self.runner.env.observation_space.shape[0]
|
raw_obs_dim = self.runner.env.observation_space.shape[0]
|
||||||
|
|
||||||
|
# Privileged dimension (μ) from the runner, if available.
|
||||||
|
mu_dim = getattr(self.runner, "privileged_dim", 0)
|
||||||
|
|
||||||
self.model = SharedMLP(
|
self.model = SharedMLP(
|
||||||
observation_space=obs_space,
|
observation_space=obs_space,
|
||||||
action_space=act_space,
|
action_space=act_space,
|
||||||
@@ -193,6 +198,9 @@ class Trainer:
|
|||||||
history_length=self.config.history_length,
|
history_length=self.config.history_length,
|
||||||
raw_obs_dim=raw_obs_dim,
|
raw_obs_dim=raw_obs_dim,
|
||||||
embedding_dim=self.config.embedding_dim,
|
embedding_dim=self.config.embedding_dim,
|
||||||
|
rma_mode=self.config.rma_mode,
|
||||||
|
latent_dim=self.config.latent_dim,
|
||||||
|
mu_dim=mu_dim,
|
||||||
)
|
)
|
||||||
|
|
||||||
models = {"policy": self.model, "value": self.model}
|
models = {"policy": self.model, "value": self.model}
|
||||||
|
|||||||
@@ -1,94 +0,0 @@
|
|||||||
"""Unit tests for MuJoCoRunner domain randomization and history buffer."""
|
|
||||||
|
|
||||||
import dataclasses
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
import pytest
|
|
||||||
import torch
|
|
||||||
from gymnasium import spaces
|
|
||||||
|
|
||||||
from src.runners.mujoco import DomainRandConfig, MuJoCoRunnerConfig
|
|
||||||
from src.models.mlp import SharedMLP, HistoryEncoder
|
|
||||||
|
|
||||||
|
|
||||||
class TestDomainRandConfig:
|
|
||||||
def test_default_all_zero(self) -> None:
|
|
||||||
cfg = DomainRandConfig()
|
|
||||||
assert cfg.mass_frac == 0.0
|
|
||||||
assert cfg.friction_frac == 0.0
|
|
||||||
assert cfg.gear_frac == 0.0
|
|
||||||
|
|
||||||
def test_from_dict(self) -> None:
|
|
||||||
d = {"mass_frac": 0.15, "gear_frac": 0.1}
|
|
||||||
cfg = DomainRandConfig(**d)
|
|
||||||
assert cfg.mass_frac == 0.15
|
|
||||||
assert cfg.gear_frac == 0.1
|
|
||||||
assert cfg.damping_frac == 0.0 # not set
|
|
||||||
|
|
||||||
|
|
||||||
class TestMuJoCoRunnerConfig:
|
|
||||||
def test_default_dr_disabled(self) -> None:
|
|
||||||
cfg = MuJoCoRunnerConfig()
|
|
||||||
assert isinstance(cfg.domain_rand, DomainRandConfig)
|
|
||||||
assert cfg.domain_rand.mass_frac == 0.0
|
|
||||||
|
|
||||||
def test_domain_rand_from_dict(self) -> None:
|
|
||||||
"""Hydra passes nested configs as dicts — test __post_init__ converts."""
|
|
||||||
cfg = MuJoCoRunnerConfig(
|
|
||||||
domain_rand={"mass_frac": 0.2, "friction_frac": 0.3}, # type: ignore[arg-type]
|
|
||||||
)
|
|
||||||
assert isinstance(cfg.domain_rand, DomainRandConfig)
|
|
||||||
assert cfg.domain_rand.mass_frac == 0.2
|
|
||||||
assert cfg.domain_rand.friction_frac == 0.3
|
|
||||||
|
|
||||||
def test_history_length_default(self) -> None:
|
|
||||||
cfg = MuJoCoRunnerConfig()
|
|
||||||
assert cfg.history_length == 0
|
|
||||||
|
|
||||||
|
|
||||||
class TestHistoryEncoder:
|
|
||||||
def test_output_shape(self) -> None:
|
|
||||||
enc = HistoryEncoder(history_length=10, step_dim=7, embedding_dim=32)
|
|
||||||
x = torch.randn(4, 10, 7) # batch=4, H=10, step_dim=7
|
|
||||||
out = enc(x)
|
|
||||||
assert out.shape == (4, 32)
|
|
||||||
|
|
||||||
def test_different_embedding_dim(self) -> None:
|
|
||||||
enc = HistoryEncoder(history_length=5, step_dim=7, embedding_dim=16)
|
|
||||||
x = torch.randn(2, 5, 7)
|
|
||||||
out = enc(x)
|
|
||||||
assert out.shape == (2, 16)
|
|
||||||
|
|
||||||
|
|
||||||
class TestSharedMLPWithHistory:
|
|
||||||
def test_no_history(self) -> None:
|
|
||||||
"""Without history, model works as before."""
|
|
||||||
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(6,))
|
|
||||||
act_space = spaces.Box(low=-1.0, high=1.0, shape=(1,))
|
|
||||||
model = SharedMLP(obs_space, act_space, torch.device("cpu"),
|
|
||||||
hidden_sizes=(32, 32))
|
|
||||||
assert model.history_encoder is None
|
|
||||||
inp = {"states": torch.randn(4, 6)}
|
|
||||||
mean, log_std, _ = model.compute(inp, role="policy")
|
|
||||||
assert mean.shape == (4, 1)
|
|
||||||
|
|
||||||
def test_with_history(self) -> None:
|
|
||||||
"""With history, model splits obs and encodes history."""
|
|
||||||
raw_obs_dim = 6
|
|
||||||
act_dim = 1
|
|
||||||
H = 10
|
|
||||||
step_dim = raw_obs_dim + act_dim # 7
|
|
||||||
aug_dim = raw_obs_dim + H * step_dim # 6 + 70 = 76
|
|
||||||
|
|
||||||
obs_space = spaces.Box(low=-1.0, high=1.0, shape=(aug_dim,))
|
|
||||||
act_space = spaces.Box(low=-1.0, high=1.0, shape=(act_dim,))
|
|
||||||
model = SharedMLP(obs_space, act_space, torch.device("cpu"),
|
|
||||||
hidden_sizes=(32, 32),
|
|
||||||
history_length=H, raw_obs_dim=raw_obs_dim,
|
|
||||||
embedding_dim=32)
|
|
||||||
assert model.history_encoder is not None
|
|
||||||
inp = {"states": torch.randn(4, aug_dim)}
|
|
||||||
mean, log_std, _ = model.compute(inp, role="policy")
|
|
||||||
assert mean.shape == (4, act_dim)
|
|
||||||
value, _ = model.compute(inp, role="value")
|
|
||||||
assert value.shape == (4, 1)
|
|
||||||
|
|||||||
Reference in New Issue
Block a user