diff --git a/docs/anima_train_network.md b/docs/anima_train_network.md
index e61d906d..f97aa975 100644
--- a/docs/anima_train_network.md
+++ b/docs/anima_train_network.md
@@ -48,7 +48,7 @@ Qwen-Image VAEとQwen-Image VAEは同じアーキテクチャですが、[Anima
* **Arguments:** Uses the common `--pretrained_model_name_or_path` for the DiT model path, `--qwen3` for the Qwen3 text encoder, and `--vae` for the Qwen-Image VAE. The LLM adapter and T5 tokenizer can be specified separately with `--llm_adapter_path` and `--t5_tokenizer_path`.
* **Incompatible arguments:** Stable Diffusion v1/v2 options such as `--v2`, `--v_parameterization` and `--clip_skip` are not used. `--fp8_base` is not supported.
* **Timestep sampling:** Uses the same `--timestep_sampling` options as FLUX training (`sigma`, `uniform`, `sigmoid`, `shift`, `flux_shift`).
-* **LoRA:** Uses regex-based module selection and per-module rank/alpha/learning rate control (`network_reg_dims`, `network_reg_alphas`, `network_reg_lrs`) instead of per-component arguments. Module exclusion/inclusion is controlled by `exclude_patterns` and `include_patterns`.
+* **LoRA:** Uses regex-based module selection and per-module rank/learning rate control (`network_reg_dims`, `network_reg_lrs`) instead of per-component arguments. Module exclusion/inclusion is controlled by `exclude_patterns` and `include_patterns`.
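The selection mechanism described above can be sketched in plain Python. This is a minimal illustration of how regex-based exclusion/inclusion and per-module rank/learning-rate overrides can compose, not the trainer's actual implementation; the module names, pattern syntax, and default values below are hypothetical examples.

```python
import re

# Hypothetical module names; real DiT module names will differ.
modules = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.attn.to_k",
    "transformer_blocks.0.ff.net.0",
    "llm_adapter.proj",
]

exclude_patterns = [r"ff\."]           # drop feed-forward modules...
include_patterns = [r"attn\."]         # ...but force attention modules back in
reg_dims = {r"to_q$": 8, r"to_k$": 4}  # per-module LoRA rank, keyed by regex
reg_lrs = {r"attn\.": 1e-4}            # per-module learning rate, keyed by regex
default_dim, default_lr = 16, 5e-5

def first_match(table, name, default):
    # Return the value of the first regex that matches the module name.
    for pattern, value in table.items():
        if re.search(pattern, name):
            return value
    return default

selected = []
for name in modules:
    excluded = any(re.search(p, name) for p in exclude_patterns)
    included = any(re.search(p, name) for p in include_patterns)
    if excluded and not included:
        continue  # excluded and not rescued by an include pattern
    selected.append((name,
                     first_match(reg_dims, name, default_dim),
                     first_match(reg_lrs, name, default_lr)))

for name, dim, lr in selected:
    print(f"{name}: rank={dim}, lr={lr}")
```

Under this sketch, `to_q` gets rank 8, `to_k` gets rank 4, the feed-forward module is dropped, and unmatched modules fall back to the defaults; the real `network_reg_dims`/`network_reg_lrs` value syntax may differ.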
日本語
@@ -60,7 +60,7 @@ Qwen-Image VAEとQwen-Image VAEは同じアーキテクチャですが、[Anima
* **引数:** DiTモデルのパスには共通引数`--pretrained_model_name_or_path`を、Qwen3テキストエンコーダーには`--qwen3`を、Qwen-Image VAEには`--vae`を使用します。LLM AdapterとT5トークナイザーはそれぞれ`--llm_adapter_path`、`--t5_tokenizer_path`で個別に指定できます。
* **一部引数の非互換性:** Stable Diffusion v1/v2向けの引数(例: `--v2`, `--v_parameterization`, `--clip_skip`)は使用されません。`--fp8_base`はサポートされていません。
* **タイムステップサンプリング:** FLUX学習と同じ`--timestep_sampling`オプション(`sigma`、`uniform`、`sigmoid`、`shift`、`flux_shift`)を使用します。
-* **LoRA:** コンポーネント別の引数の代わりに、正規表現ベースのモジュール選択とモジュール単位のランク/アルファ/学習率制御(`network_reg_dims`、`network_reg_alphas`、`network_reg_lrs`)を使用します。モジュールの除外/包含は`exclude_patterns`と`include_patterns`で制御します。
+* **LoRA:** コンポーネント別の引数の代わりに、正規表現ベースのモジュール選択とモジュール単位のランク/学習率制御(`network_reg_dims`、`network_reg_lrs`)を使用します。モジュールの除外/包含は`exclude_patterns`と`include_patterns`で制御します。
## 3. Preparation / 準備
@@ -225,93 +225,7 @@ For LoRA training, use `network_reg_lrs` in `--network_args` instead. See [Secti
- Chunk size for Qwen-Image VAE processing. Reduces VRAM usage at the cost of speed. Default is no chunking.
* `--vae_disable_cache`
- Disable internal caching in Qwen-Image VAE to reduce VRAM usage.
-
-#### EMA (Exponential Moving Average) / EMA (指数移動平均)
-
-EMA maintains a shadow copy of the model parameters, averaging them over training steps. This produces smoother, more stable weights that often generalize better than the final training checkpoint. EMA is supported for both full fine-tuning (`anima_train.py`) and LoRA training (`anima_train_network.py`).
-
-* `--ema`
- - Enable EMA. When enabled, an EMA model is saved alongside each regular checkpoint with an `ema_` prefix on the filename (e.g., `ema_anima-000010.safetensors`). The EMA model has the same format as the regular model and can be used directly for inference.
-* `--ema_decay=` (default: `0.9999`)
- - Decay rate for EMA. Higher values produce smoother weights but adapt more slowly to new training data. Typical values range from `0.999` to `0.99999`.
-* `--ema_device=` (default: `cuda`)
- - Device to store EMA shadow parameters. Choose `cuda` or `cpu`. Using `cpu` significantly reduces GPU VRAM usage (shadow params use the same amount of memory as the model) but makes EMA updates slower due to CPU-GPU data transfer.
-* `--ema_use_num_updates`
- - Automatically adjust the EMA decay based on the number of update steps. The effective decay is calculated as `min(decay, (1 + num_updates) / (10 + num_updates))`. This makes the EMA warm up faster in early training steps.
-* `--ema_sample`
- - Enable dual sampling: generate sample images with both training weights and EMA weights side by side. EMA sample images are saved with a `_ema` suffix (e.g., `image_0000_000010_ema.png`). EMA sampling is skipped at step 0 since EMA hasn't accumulated meaningful averages yet. This option works with the existing `--sample_every_n_steps`, `--sample_every_n_epochs`, and `--sample_prompts` arguments.
-* `--ema_resume_path=` *[Optional]*
- - Path to a previously saved EMA model (`.safetensors`) to resume EMA from. For full fine-tuning, the file should be a saved EMA DiT model. For LoRA training, the file should be a saved EMA LoRA file.
-* `--ema_use_feedback` *[Experimental]*
- - Feed back EMA parameters into the training model after each update. This is an experimental feature and is **not compatible with multi-GPU DDP training** (it modifies parameters only on the main process, causing parameter desynchronization across GPUs).
-* `--ema_param_multiplier=` (default: `1.0`) *[Experimental]*
- - Multiply shadow parameters by this value after each EMA update. This is an experimental feature and is **not compatible with multi-GPU DDP training** when set to a value other than `1.0`.
-
-**Example — LoRA training with EMA:**
-
-```bash
-accelerate launch --num_cpu_threads_per_process 1 anima_train_network.py \
- --pretrained_model_name_or_path="" \
- --qwen3="" \
- --vae="" \
- --dataset_config="my_anima_dataset_config.toml" \
- --output_dir="