Compare commits

...

3 Commits

3 changed files with 25 additions and 5 deletions

View File

@@ -16,6 +16,10 @@ If you are using DeepSpeed, please install DeepSpeed with `pip install deepspeed
### Recent Updates
Jul 21, 2025:
- Support for [Lumina-Image 2.0](https://github.com/Alpha-VLLM/Lumina-Image-2.0) has been added in PR [#1927](https://github.com/kohya-ss/sd-scripts/pull/1927) and [#2138](https://github.com/kohya-ss/sd-scripts/pull/2138). Special thanks to sdbds and RockerBOO for their contributions.
- Please refer to the [Lumina-Image 2.0 documentation](./docs/lumina_train_network.md) for more details.
Jul 10, 2025:
- [AI Coding Agents](#for-developers-using-ai-coding-agents) section is added to the README. This section provides instructions for developers using AI coding agents like Claude and Gemini to understand the project context and coding standards.

View File

@@ -1,5 +1,3 @@
Status: reviewed
# LoRA Training Guide for Lumina Image 2.0 using `lumina_train_network.py` / `lumina_train_network.py` を用いたLumina Image 2.0モデルのLoRA学習ガイド
This document explains how to train LoRA (Low-Rank Adaptation) models for Lumina Image 2.0 using `lumina_train_network.py` in the `sd-scripts` repository.
@@ -198,6 +196,7 @@ For Lumina Image 2.0, you can specify different dimensions for various component
<details>
<summary>日本語</summary>
[`train_network.py`のガイド](train_network.md)で説明されている引数に加え、以下のLumina Image 2.0特有の引数を指定します。共通の引数については、上記ガイドを参照してください。
#### モデル関連
@@ -250,6 +249,18 @@ After setting the required arguments, run the command to begin training. The ove
When training finishes, a LoRA model file (e.g. `my_lumina_lora.safetensors`) is saved in the directory specified by `output_dir`. Use this file with inference environments that support Lumina Image 2.0, such as ComfyUI with appropriate nodes.
### Inference with scripts in this repository / このリポジトリのスクリプトを使用した推論
The inference script is also available. The script is `lumina_minimal_inference.py`. See `--help` for options.
```
python lumina_minimal_inference.py --pretrained_model_name_or_path path/to/lumina.safetensors --gemma2_path path/to/gemma.safetensors" --ae_path path/to/flux_ae.safetensors --output_dir path/to/output_dir --offload --seed 1234 --prompt "Positive prompt" --system_prompt "You are an assistant designed to generate high-quality images based on user prompts." --negative_prompt "negative prompt"
```
`--add_system_prompt_to_negative_prompt` option can be used to add the system prompt to the negative prompt.
`--lora_weights` option can be used to specify the LoRA weights file, and optional multiplier (like `path;1.0`).
## 6. Others / その他
`lumina_train_network.py` shares many features with `train_network.py`, such as sample image generation (`--sample_prompts`, etc.) and detailed optimizer settings. For these, see the [train_network.py guide](train_network.md#5-other-features--その他の機能) or run `python lumina_train_network.py --help`.
@@ -279,6 +290,8 @@ Sample prompts can include CFG truncate (`--ctr`) and Renorm CFG (`-rcfg`) param
学習が完了すると、指定した`output_dir`にLoRAモデルファイル例: `my_lumina_lora.safetensors`が保存されます。このファイルは、Lumina Image 2.0モデルに対応した推論環境(例: ComfyUI + 適切なノード)で使用できます。
当リポジトリ内の推論スクリプトを用いて推論することも可能です。スクリプトは`lumina_minimal_inference.py`です。オプションは`--help`で確認できます。記述例は英語版のドキュメントをご確認ください。
`lumina_train_network.py`には、サンプル画像の生成 (`--sample_prompts`など) や詳細なオプティマイザ設定など、`train_network.py`と共通の機能も多く存在します。これらについては、[`train_network.py`のガイド](train_network.md#5-other-features--その他の機能)やスクリプトのヘルプ (`python lumina_train_network.py --help`) を参照してください。
### 6.1. 推奨設定

View File

@@ -48,7 +48,7 @@ def generate_image(
steps: int,
guidance_scale: float,
negative_prompt: Optional[str],
args,
args: argparse.Namespace,
cfg_trunc_ratio: float = 0.25,
renorm_cfg: float = 1.0,
):
@@ -88,7 +88,9 @@ def generate_image(
with torch.no_grad():
gemma2_conds = encoding_strategy.encode_tokens(tokenize_strategy, [gemma2], tokens_and_masks)
tokens_and_masks = tokenize_strategy.tokenize(negative_prompt, is_negative=True)
tokens_and_masks = tokenize_strategy.tokenize(
negative_prompt, is_negative=True and not args.add_system_prompt_to_negative_prompt
)
with torch.no_grad():
neg_gemma2_conds = encoding_strategy.encode_tokens(tokenize_strategy, [gemma2], tokens_and_masks)
@@ -215,6 +217,7 @@ def setup_parser() -> argparse.ArgumentParser:
parser.add_argument("--device", type=str, default=None, help="Device to use (e.g., 'cuda:0')")
parser.add_argument("--offload", action="store_true", help="Offload models to CPU to save VRAM")
parser.add_argument("--system_prompt", type=str, default="", help="System prompt for Gemma2 model")
parser.add_argument("--add_system_prompt_to_negative_prompt", action="store_true", help="Add system prompt to negative prompt")
parser.add_argument(
"--gemma2_max_token_length",
type=int,
@@ -231,7 +234,7 @@ def setup_parser() -> argparse.ArgumentParser:
"--cfg_trunc_ratio",
type=float,
default=0.25,
help="The ratio of the timestep interval to apply normalization-based guidance scale. For example, 0.25 means the first 25% of timesteps will be guided.",
help="The ratio of the timestep interval to apply normalization-based guidance scale. For example, 0.25 means the first 25%% of timesteps will be guided.",
)
parser.add_argument(
"--renorm_cfg",