From 88960e63094bcb96fae318c526867fe409fade18 Mon Sep 17 00:00:00 2001
From: Kohya S <ykumeykume@gmail.com>
Date: Sun, 13 Jul 2025 20:49:38 +0900
Subject: [PATCH] doc: update lumina LoRA training guide

---
 docs/lumina_train_network.md | 42 ++++++++++++++++--------------------
 library/lumina_train_util.py |  4 ++--
 2 files changed, 20 insertions(+), 26 deletions(-)
diff --git a/docs/lumina_train_network.md b/docs/lumina_train_network.md
index 2872f513..e811f68b 100644
--- a/docs/lumina_train_network.md
+++ b/docs/lumina_train_network.md
@@ -8,12 +8,12 @@ This document explains how to train LoRA (Low-Rank Adaptation) models for Lumina
 
 `lumina_train_network.py` trains additional networks such as LoRA for Lumina Image 2.0 models. Lumina Image 2.0 adopts a Next-DiT (Next-generation Diffusion Transformer) architecture, which differs from previous Stable Diffusion models. It uses a single text encoder (Gemma2) and a dedicated AutoEncoder (AE).
 
-This guide assumes you already understand the basics of LoRA training. For common usage and options, see the [train_network.py guide](train_network.md). Some parameters are similar to those in [`sd3_train_network.py`](sd3_train_network.md) and [`flux_train_network.py`](flux_train_network.md).
+This guide assumes you already understand the basics of LoRA training. For common usage and options, see the `train_network.py` guide (to be documented). Some parameters are similar to those in [`sd3_train_network.py`](sd3_train_network.md) and [`flux_train_network.py`](flux_train_network.md).
 
 **Prerequisites:**
 
 * The `sd-scripts` repository has been cloned and the Python environment is ready.
-* A training dataset has been prepared. See the [Dataset Configuration Guide](link/to/dataset/config/doc).
+* A training dataset has been prepared. See the [Dataset Configuration Guide](./config_README-en.md).
 * Lumina Image 2.0 model files for training are available.
 
 <details>
@@ -21,12 +21,12 @@ This guide assumes you already understand the basics of LoRA training. For commo
 
 `lumina_train_network.py`は、Lumina Image 2.0モデルに対してLoRAなどの追加ネットワークを学習させるためのスクリプトです。Lumina Image 2.0は、Next-DiT (Next-generation Diffusion Transformer) と呼ばれる新しいアーキテクチャを採用しており、従来のStable Diffusionモデルとは構造が異なります。テキストエンコーダーとしてGemma2を単体で使用し、専用のAutoEncoder (AE) を使用します。
 
-このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](train_network.md)を参照してください。また一部のパラメータは [`sd3_train_network.py`](sd3_train_network.md) や [`flux_train_network.py`](flux_train_network.md) と同様のものがあるため、そちらも参考にしてください。
+このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象としています。基本的な使い方や共通のオプションについては、`train_network.py`のガイド（作成中）を参照してください。また一部のパラメータは [`sd3_train_network.py`](sd3_train_network.md) や [`flux_train_network.py`](flux_train_network.md) と同様のものがあるため、そちらも参考にしてください。
 
 **前提条件:**
 
 *   `sd-scripts`リポジトリのクローンとPython環境のセットアップが完了していること。
-*   学習用データセットの準備が完了していること。（データセットの準備については[データセット設定ガイド](link/to/dataset/config/doc)を参照してください）
+*   学習用データセットの準備が完了していること。（データセットの準備については[データセット設定ガイド](./config_README-en.md)を参照してください）
 *   学習対象のLumina Image 2.0モデルファイルが準備できていること。
 </details>
 
@@ -59,7 +59,14 @@ The following files are required before starting training:
 2. **Lumina Image 2.0 model file:** `.safetensors` file for the base model.
 3. **Gemma2 text encoder file:** `.safetensors` file for the text encoder.
 4. **AutoEncoder (AE) file:** `.safetensors` file for the AE.
-5. **Dataset definition file (.toml):** Dataset settings in TOML format. (See the [Dataset Configuration Guide](link/to/dataset/config/doc).) In this document we use `my_lumina_dataset_config.toml` as an example.
+5. **Dataset definition file (.toml):** Dataset settings in TOML format. (See the [Dataset Configuration Guide](./config_README-en.md).) In this document we use `my_lumina_dataset_config.toml` as an example.
+
+
+**Model Files:**
+* Lumina Image 2.0: `lumina-image-2.safetensors` ([full precision link](https://huggingface.co/rockerBOO/lumina-image-2/blob/main/lumina-image-2.safetensors)) or `lumina_2_model_bf16.safetensors` ([bf16 link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/diffusion_models/lumina_2_model_bf16.safetensors))
+* Gemma2 2B (fp16): `gemma-2-2b.safetensors` ([link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/text_encoders/gemma_2_2b_fp16.safetensors))
+* AutoEncoder: `ae.safetensors` ([link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/vae/ae.safetensors)) (same as FLUX)
+
 
 <details>
 <summary>日本語</summary>
@@ -69,8 +76,11 @@ The following files are required before starting training:
 2.  **Lumina Image 2.0モデルファイル:** 学習のベースとなるLumina Image 2.0モデルの`.safetensors`ファイル。
 3.  **Gemma2テキストエンコーダーファイル:** Gemma2テキストエンコーダーの`.safetensors`ファイル。
 4.  **AutoEncoder (AE) ファイル:** AEの`.safetensors`ファイル。
-5.  **データセット定義ファイル (.toml):** 学習データセットの設定を記述したTOML形式のファイル。（詳細は[データセット設定ガイド](link/to/dataset/config/doc)を参照してください）。
+5.  **データセット定義ファイル (.toml):** 学習データセットの設定を記述したTOML形式のファイル。（詳細は[データセット設定ガイド](./config_README-en.md)を参照してください）。
     *   例として`my_lumina_dataset_config.toml`を使用します。
+
+**モデルファイル** は英語ドキュメントの通りです。
+
 </details>
 
 ## 4. Running the Training / 学習の実行
@@ -97,7 +107,6 @@ accelerate launch --num_cpu_threads_per_process 1 lumina_train_network.py \
   --timestep_sampling="nextdit_shift" \
   --discrete_flow_shift=6.0 \
   --model_prediction_type="raw" \
-  --guidance_scale=4.0 \
   --system_prompt="You are an assistant designed to generate high-quality images based on user prompts." \
   --max_train_epochs=10 \
   --save_every_n_epochs=1 \
@@ -133,7 +142,6 @@ accelerate launch --num_cpu_threads_per_process 1 lumina_train_network.py \
   --timestep_sampling="nextdit_shift" \
   --discrete_flow_shift=6.0 \
   --model_prediction_type="raw" \
-  --guidance_scale=4.0 \
   --system_prompt="You are an assistant designed to generate high-quality images based on user prompts." \
   --max_train_epochs=10 \
   --save_every_n_epochs=1 \
@@ -158,11 +166,10 @@ Besides the arguments explained in the [train_network.py guide](train_network.md
 
 #### Lumina Image 2.0 Training Parameters / Lumina Image 2.0 学習パラメータ
 
-* `--gemma2_max_token_length=<integer>` – Max token length for Gemma2. Default varies by model.
+* `--gemma2_max_token_length=<integer>` – Max token length for Gemma2. Default is 256.
 * `--timestep_sampling=<choice>` – Timestep sampling method. Options: `sigma`, `uniform`, `sigmoid`, `shift`, `nextdit_shift`. Default `sigma`. **Recommended: `nextdit_shift`**
 * `--discrete_flow_shift=<float>` – Discrete flow shift for the Euler Discrete Scheduler. Default `6.0`.
 * `--model_prediction_type=<choice>` – Model prediction processing method. Options: `raw`, `additive`, `sigma_scaled`. Default `sigma_scaled`. **Recommended: `raw`**
-* `--guidance_scale=<float>` – Guidance scale for training. **Recommended: `4.0`**
 * `--system_prompt=<string>` – System prompt to prepend to all prompts. Recommended: `"You are an assistant designed to generate high-quality images based on user prompts."` or `"You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts."`
 * `--use_flash_attn` – Use Flash Attention. Requires `pip install flash-attn` (may not be supported in all environments). If installed correctly, it speeds up training. 
 * `--sigmoid_scale=<float>` – Scale factor for sigmoid timestep sampling. Default `1.0`.
@@ -204,11 +211,10 @@ For Lumina Image 2.0, you can specify different dimensions for various component
 
 #### Lumina Image 2.0 学習パラメータ
 
-*   `--gemma2_max_token_length=<integer>` – Gemma2で使用するトークンの最大長を指定します。デフォルトはモデルによって異なります。
+*   `--gemma2_max_token_length=<integer>` – Gemma2で使用するトークンの最大長を指定します。デフォルトは256です。
 *   `--timestep_sampling=<choice>` – タイムステップのサンプリング方法を指定します。`sigma`, `uniform`, `sigmoid`, `shift`, `nextdit_shift`から選択します。デフォルトは`sigma`です。**推奨: `nextdit_shift`**
 *   `--discrete_flow_shift=<float>` – Euler Discrete Schedulerの離散フローシフトを指定します。デフォルトは`6.0`です。
 *   `--model_prediction_type=<choice>` – モデル予測の処理方法を指定します。`raw`, `additive`, `sigma_scaled`から選択します。デフォルトは`sigma_scaled`です。**推奨: `raw`**
-*   `--guidance_scale=<float>` – 学習時のガイダンススケールを指定します。**推奨: `4.0`**
 *   `--system_prompt=<string>` – 全てのプロンプトに前置するシステムプロンプトを指定します。推奨: `"You are an assistant designed to generate high-quality images based on user prompts."` または `"You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts."`
 *   `--use_flash_attn` – Flash Attentionを使用します。`pip install flash-attn`でインストールが必要です（環境によってはサポートされていません）。正しくインストールされている場合は、指定すると学習が高速化されます。
 *   `--sigmoid_scale=<float>` – sigmoidタイムステップサンプリングのスケール係数を指定します。デフォルトは`1.0`です。
@@ -252,16 +258,10 @@ When training finishes, a LoRA model file (e.g. `my_lumina_lora.safetensors`) is
 
 Based on the contributor's recommendations, here are the suggested settings for optimal training:
 
-**Model Files:**
-* Lumina Image 2.0: `lumina-image-2.safetensors` ([full precision link](https://huggingface.co/rockerBOO/lumina-image-2/blob/main/lumina-image-2.safetensors)) or `lumina_2_model_bf16.safetensors` ([bf16 link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/diffusion_models/lumina_2_model_bf16.safetensors))
-* Gemma2 2B (fp16): `gemma-2-2b.safetensors` ([link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/text_encoders/gemma_2_2b_fp16.safetensors))
-* AutoEncoder: `ae.safetensors` ([link](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/vae/ae.safetensors)) (same as FLUX)
-
 **Key Parameters:**
 * `--timestep_sampling="nextdit_shift"`
 * `--discrete_flow_shift=6.0`
 * `--model_prediction_type="raw"`
-* `--guidance_scale=4.0`
 * `--mixed_precision="bf16"`
 
 **System Prompts:**
@@ -284,16 +284,10 @@ Sample prompts can include CFG truncate (`-ct`) and Renorm CFG (`-rc`) parameter
 
 コントリビューターの推奨に基づく、最適な学習のための推奨設定：
 
-**モデルファイル:**
-* Lumina Image 2.0: `lumina-image-2.safetensors` ([full precisionリンク](https://huggingface.co/rockerBOO/lumina-image-2/blob/main/lumina-image-2.safetensors)) または `lumina_2_model_bf16.safetensors` ([bf16リンク](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/diffusion_models/lumina_2_model_bf16.safetensors))
-* Gemma2 2B (fp16): `gemma-2-2b.safetensors` ([リンク](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/text_encoders/gemma_2_2b_fp16.safetensors))
-* AutoEncoder: `ae.safetensors` ([リンク](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/vae/ae.safetensors)) (FLUXと同じ)
-
 **主要パラメータ:**
 * `--timestep_sampling="nextdit_shift"`
 * `--discrete_flow_shift=6.0`
 * `--model_prediction_type="raw"`
-* `--guidance_scale=4.0`
 * `--mixed_precision="bf16"`
 
 **システムプロンプト:**
diff --git a/library/lumina_train_util.py b/library/lumina_train_util.py
index 45f22bc4..1cf9278a 100644
--- a/library/lumina_train_util.py
+++ b/library/lumina_train_util.py
@@ -1042,8 +1042,8 @@ def add_lumina_train_arguments(parser: argparse.ArgumentParser):
         "--gemma2_max_token_length",
         type=int,
         default=None,
-        help="maximum token length for Gemma2. if omitted, 256 for schnell and 512 for dev"
-        " / Gemma2の最大トークン長。省略された場合、schnellの場合は256、devの場合は512",
+        help="maximum token length for Gemma2. if omitted, defaults to 256"
+        " / Gemma2の最大トークン長。省略された場合、256になります",
     )
 
     parser.add_argument(