From a0c26a0efac8c56905153bee8870bcfbb6f96731 Mon Sep 17 00:00:00 2001 From: kohya-ss <52813779+kohya-ss@users.noreply.github.com> Date: Sun, 28 Sep 2025 18:21:25 +0900 Subject: [PATCH] docs: enhance text encoder CPU usage instructions for HunyuanImage-2.1 training --- docs/hunyuan_image_train_network.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/hunyuan_image_train_network.md b/docs/hunyuan_image_train_network.md index b2bf113d..b0e9cdd9 100644 --- a/docs/hunyuan_image_train_network.md +++ b/docs/hunyuan_image_train_network.md @@ -190,7 +190,7 @@ The script adds HunyuanImage-2.1 specific arguments. For common arguments (like * `--fp8_vl` - Use FP8 for the VLM (Qwen2.5-VL) text encoder. * `--text_encoder_cpu` - - Runs the text encoders on CPU to reduce VRAM usage. This is useful when VRAM is insufficient (less than 12GB). Encoding one text may take a few minutes (depending on CPU). It is highly recommended to use this option with `--cache_text_encoder_outputs_to_disk` to avoid repeated encoding every time training starts. + - Runs the text encoders on CPU to reduce VRAM usage. This is useful when VRAM is insufficient (less than 12GB). Encoding one text may take a few minutes (depending on CPU). It is highly recommended to use this option with `--cache_text_encoder_outputs_to_disk` to avoid repeated encoding every time training starts. **In addition, increasing `--num_cpu_threads_per_process` in the `accelerate launch` command, like `--num_cpu_threads_per_process=8` or `16`, can speed up encoding in some environments.** * `--blocks_to_swap=` **[Experimental Feature]** - Setting to reduce VRAM usage by swapping parts of the model (Transformer blocks) between CPU and GPU. Specify the number of blocks to swap as an integer (e.g., `18`). Larger values reduce VRAM usage but decrease training speed. Adjust according to your GPU's VRAM capacity. Can be used with `gradient_checkpointing`. * `--cache_text_encoder_outputs`