From e7b8e9a7784c042a83a15aba76d05c8b186db6d8 Mon Sep 17 00:00:00 2001
From: Kohya S <52813779+kohya-ss@users.noreply.github.com>
Date: Sun, 21 Sep 2025 11:13:26 +0900
Subject: [PATCH] doc: add --vae_chunk_size option for training and inference

---
 docs/hunyuan_image_train_network.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/hunyuan_image_train_network.md b/docs/hunyuan_image_train_network.md
index d31ff867..658a7beb 100644
--- a/docs/hunyuan_image_train_network.md
+++ b/docs/hunyuan_image_train_network.md
@@ -192,8 +192,8 @@ The script adds HunyuanImage-2.1 specific arguments. For common arguments (like
   - Caches the outputs of Qwen2.5-VL and byT5. This reduces memory usage.
 * `--cache_latents`, `--cache_latents_to_disk`
   - Caches the outputs of VAE. Similar functionality to [sdxl_train_network.py](sdxl_train_network.md).
-* `--vae_enable_tiling`
-  - Enables tiling for VAE encoding and decoding to reduce VRAM usage.
+* `--vae_chunk_size`
+  - Enables chunked processing in the VAE to reduce VRAM usage during encoding and decoding. Specify the chunk size as an integer (e.g., `16`). Larger values use more VRAM but are faster. Default is `None` (no chunking). This option is useful when VRAM is limited (e.g., 8GB or 12GB).
 日本語
@@ -453,6 +453,7 @@ python hunyuan_image_minimal_inference.py \
 - `--guidance_scale`: CFG scale (default: 3.5)
 - `--flow_shift`: Flow matching shift parameter (default: 5.0)
 - `--text_encoder_cpu`: Run the text encoders on CPU to reduce VRAM usage
+- `--vae_chunk_size`: Chunk size for VAE decoding to reduce memory usage (default: None, no chunking). 16 is recommended if enabled.
 
 `--split_attn` is not supported (since inference is done one at a time). `--fp8_vl` is not supported, please use CPU for the text encoder if VRAM is insufficient.
@@ -468,6 +469,7 @@ python hunyuan_image_minimal_inference.py \
 - `--guidance_scale`: CFGスケール(推奨: 3.5)
 - `--flow_shift`: Flow Matchingシフトパラメータ(デフォルト: 5.0)
 - `--text_encoder_cpu`: テキストエンコーダをCPUで実行してVRAM使用量削減
+- `--vae_chunk_size`: VAEデコーディングのチャンクサイズ(デフォルト: None、チャンク処理なし)。有効にする場合は16を推奨。
 
 `--split_attn`はサポートされていません(1件ずつ推論するため)。`--fp8_vl`もサポートされていません。VRAMが不足する場合はテキストエンコーダをCPUで実行してください。
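The memory/speed trade-off this patch documents can be sketched in a few lines. This is an illustrative toy, not the actual HunyuanImage VAE code: `decode` and `decode_chunked` are hypothetical stand-ins, and the real option chunks VAE tensor processing rather than a Python list. The point is the control flow: with a chunk size set, only one chunk's worth of work is in flight at a time, and `None` (the documented default) falls back to processing everything at once.

```python
# Illustrative sketch only (hypothetical names, not the HunyuanImage VAE):
# process a sequence in fixed-size chunks so peak memory scales with the
# chunk size instead of the full input.
def decode(values):
    # Stand-in for a memory-heavy VAE decode step over `values`.
    return [v * 2 for v in values]

def decode_chunked(values, chunk_size=None):
    # chunk_size=None mirrors the documented default: no chunking.
    if chunk_size is None:
        return decode(values)
    out = []
    for start in range(0, len(values), chunk_size):
        # Each iteration decodes one chunk; smaller chunks -> lower peak
        # memory but more iterations (slower), matching the doc's note.
        out.extend(decode(values[start:start + chunk_size]))
    return out

# Chunked and unchunked decoding produce identical results.
assert decode_chunked(list(range(40)), chunk_size=16) == decode(list(range(40)))
```

Under these assumptions, a larger `chunk_size` means fewer loop iterations (faster, more memory per step), which is why the doc suggests `16` only when VRAM is limited.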