From 62db7eb2055cedea66dd32f8e32d39e768f833f4 Mon Sep 17 00:00:00 2001
From: Kohya S <52813779+kohya-ss@users.noreply.github.com>
Date: Mon, 23 Feb 2026 18:44:24 +0900
Subject: [PATCH] doc: add documentation for LoHa and LoKr fine-tuning methods

---
 docs/loha_lokr.md | 359 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 359 insertions(+)
 create mode 100644 docs/loha_lokr.md
diff --git a/docs/loha_lokr.md b/docs/loha_lokr.md
new file mode 100644
index 00000000..40289b25
--- /dev/null
+++ b/docs/loha_lokr.md
@@ -0,0 +1,359 @@
+> 📝 Click on the language section to expand / 言語をクリックして展開
+
+# LoHa / LoKr (LyCORIS)
+
+## Overview / 概要
+
+In addition to standard LoRA, sd-scripts supports **LoHa** (Low-rank Hadamard Product) and **LoKr** (Low-rank Kronecker Product) as alternative parameter-efficient fine-tuning methods. These are based on techniques from the [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) project.
+
+- **LoHa**: Represents weight updates as a Hadamard (element-wise) product of two low-rank matrices. Reference: [FedPara (arXiv:2108.06098)](https://arxiv.org/abs/2108.06098)
+- **LoKr**: Represents weight updates as a Kronecker product with optional low-rank decomposition. Reference: [LoKr (arXiv:2309.14859)](https://arxiv.org/abs/2309.14859)
+
+The algorithms and recommended settings are described in the [LyCORIS documentation](https://github.com/KohakuBlueleaf/LyCORIS/blob/main/docs/Algo-List.md) and [guidelines](https://github.com/KohakuBlueleaf/LyCORIS/blob/main/docs/Guidelines.md).
+
+Both methods target Linear and Conv2d layers. Conv2d 1x1 layers are treated similarly to Linear layers. For Conv2d layers with 3x3 or larger kernels (written "3x3+" below), Tucker decomposition is available as an option; otherwise flat (kernel-flattened) mode is used.
+
+This feature is experimental.
+
+<details>
+<summary>日本語</summary>
+
+sd-scriptsでは、標準的なLoRAに加え、代替のパラメータ効率の良いファインチューニング手法として **LoHa**（Low-rank Hadamard Product）と **LoKr**（Low-rank Kronecker Product）をサポートしています。これらは [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) プロジェクトの手法に基づいています。
+
+- **LoHa**: 重みの更新を2つの低ランク行列のHadamard積（要素ごとの積）で表現します。参考文献: [FedPara (arXiv:2108.06098)](https://arxiv.org/abs/2108.06098)
+- **LoKr**: 重みの更新をKronecker積と、オプションの低ランク分解で表現します。参考文献: [LoKr (arXiv:2309.14859)](https://arxiv.org/abs/2309.14859)
+
+アルゴリズムと推奨設定は[LyCORISのアルゴリズム解説](https://github.com/KohakuBlueleaf/LyCORIS/blob/main/docs/Algo-List.md)と[ガイドライン](https://github.com/KohakuBlueleaf/LyCORIS/blob/main/docs/Guidelines.md)を参照してください。
+
+LinearおよびConv2d層の両方を対象としています。Conv2d 1x1層はLinear層と同様に扱われます。Conv2d 3x3+層については、オプションのTucker分解またはflat（カーネル平坦化）モードが利用可能です。
+
+この機能は実験的なものです。
+
+</details>
+
+## Acknowledgments / 謝辞
+
+The LoHa and LoKr implementations in sd-scripts are based on the [LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS) project by [KohakuBlueleaf](https://github.com/KohakuBlueleaf). We would like to express our sincere gratitude for the excellent research and open-source contributions that made this implementation possible.
+
+<details>
+<summary>日本語</summary>
+
+sd-scriptsのLoHaおよびLoKrの実装は、[KohakuBlueleaf](https://github.com/KohakuBlueleaf)氏による[LyCORIS](https://github.com/KohakuBlueleaf/LyCORIS)プロジェクトに基づいています。この実装を可能にしてくださった素晴らしい研究とオープンソースへの貢献に心から感謝いたします。
+
+</details>
+
+## Supported architectures / 対応アーキテクチャ
+
+LoHa and LoKr automatically detect the model architecture and apply appropriate default settings. The following architectures are currently supported:
+
+- **SDXL**: Targets `Transformer2DModel` for UNet and `CLIPAttention`/`CLIPMLP` for text encoders. Conv2d layers in `ResnetBlock2D`, `Downsample2D`, and `Upsample2D` are also supported when `conv_dim` is specified. No default `exclude_patterns`.
+- **Anima**: Targets `Block`, `PatchEmbed`, `TimestepEmbedding`, and `FinalLayer` for DiT, and `Qwen3Attention`/`Qwen3MLP` for the text encoder. The default `exclude_patterns` automatically skip modulation, normalization, embedder, and final_layer modules.
+
+<details>
+<summary>日本語</summary>
+
+LoHaとLoKrは、モデルのアーキテクチャを自動で検出し、適切なデフォルト設定を適用します。現在、以下のアーキテクチャに対応しています:
+
+- **SDXL**: UNetの`Transformer2DModel`、テキストエンコーダの`CLIPAttention`/`CLIPMLP`を対象とします。`conv_dim`を指定した場合、`ResnetBlock2D`、`Downsample2D`、`Upsample2D`のConv2d層も対象になります。デフォルトの`exclude_patterns`はありません。
+- **Anima**: DiTの`Block`、`PatchEmbed`、`TimestepEmbedding`、`FinalLayer`、テキストエンコーダの`Qwen3Attention`/`Qwen3MLP`を対象とします。デフォルトの`exclude_patterns`により、modulation、normalization、embedder、final_layerモジュールは自動的にスキップされます。
+
+</details>
+
+## Training / 学習
+
+To use LoHa or LoKr, set the `--network_module` argument in your training command to `networks.loha` or `networks.lokr`. All other training options (dataset config, optimizer, etc.) remain the same as for LoRA.
+
+<details>
+<summary>日本語</summary>
+
+LoHaまたはLoKrを使用するには、学習コマンドの `--network_module` 引数を変更します。その他の学習オプション（データセット設定、オプティマイザなど）はLoRAと同じです。
+
+</details>
+
+### LoHa (SDXL)
+
+```bash
+accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 sdxl_train_network.py \
+    --pretrained_model_name_or_path path/to/sdxl.safetensors \
+    --dataset_config path/to/toml \
+    --mixed_precision bf16 --fp8_base \
+    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing \
+    --network_module networks.loha --network_dim 32 --network_alpha 16 \
+    --max_train_epochs 16 --save_every_n_epochs 1 \
+    --output_dir path/to/output --output_name my-loha
+```
+
+### LoKr (SDXL)
+
+```bash
+accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 sdxl_train_network.py \
+    --pretrained_model_name_or_path path/to/sdxl.safetensors \
+    --dataset_config path/to/toml \
+    --mixed_precision bf16 --fp8_base \
+    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing \
+    --network_module networks.lokr --network_dim 32 --network_alpha 16 \
+    --max_train_epochs 16 --save_every_n_epochs 1 \
+    --output_dir path/to/output --output_name my-lokr
+```
+
+For Anima, replace `sdxl_train_network.py` with `anima_train_network.py` and use the appropriate model path and options.
+
+<details>
+<summary>日本語</summary>
+
+Animaの場合は、`sdxl_train_network.py` を `anima_train_network.py` に置き換え、適切なモデルパスとオプションを使用してください。
+
+</details>
+
+### Common training options / 共通の学習オプション
+
+The following `--network_args` options are available for both LoHa and LoKr, just as they are for LoRA:
+
+| Option | Description |
+|---|---|
+| `verbose=True` | Display detailed information about the network modules |
+| `rank_dropout=0.1` | Apply dropout to the rank dimension during training |
+| `module_dropout=0.1` | Randomly skip entire modules during training |
+| `exclude_patterns=[r'...']` | Exclude modules matching the regex patterns (in addition to architecture defaults) |
+| `include_patterns=[r'...']` | Include only modules matching the regex patterns |
+| `network_reg_lrs=regex1=lr1,regex2=lr2` | Set per-module learning rates using regex patterns |
+| `network_reg_dims=regex1=dim1,regex2=dim2` | Set per-module dimensions (rank) using regex patterns |
+
+<details>
+<summary>日本語</summary>
+
+以下の `--network_args` オプションは、LoRAと同様にLoHaとLoKrの両方で使用できます:
+
+| オプション | 説明 |
+|---|---|
+| `verbose=True` | ネットワークモジュールの詳細情報を表示 |
+| `rank_dropout=0.1` | 学習時にランク次元にドロップアウトを適用 |
+| `module_dropout=0.1` | 学習時にモジュール全体をランダムにスキップ |
+| `exclude_patterns=[r'...']` | 正規表現パターンに一致するモジュールを除外（アーキテクチャのデフォルトに追加） |
+| `include_patterns=[r'...']` | 正規表現パターンに一致するモジュールのみを対象とする |
+| `network_reg_lrs=regex1=lr1,regex2=lr2` | 正規表現パターンでモジュールごとの学習率を設定 |
+| `network_reg_dims=regex1=dim1,regex2=dim2` | 正規表現パターンでモジュールごとの次元（ランク）を設定 |
+
+</details>
+
+### Conv2d support / Conv2dサポート
+
+By default, LoHa and LoKr target Linear and Conv2d 1x1 layers. To also train Conv2d 3x3+ layers (e.g., in SDXL's ResNet blocks), use the `conv_dim` and `conv_alpha` options:
+
+```bash
+--network_args "conv_dim=16" "conv_alpha=8"
+```
+
+For Conv2d 3x3+ layers, you can enable Tucker decomposition for more efficient parameter representation:
+
+```bash
+--network_args "conv_dim=16" "conv_alpha=8" "use_tucker=True"
+```
+
+- Without `use_tucker`: The kernel dimensions are flattened into the input dimension (flat mode).
+- With `use_tucker=True`: A separate Tucker tensor is used to handle the kernel dimensions, which can be more parameter-efficient.
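+
+To make the difference between the two modes concrete, the Python sketch below counts the trainable parameters of one low-rank pair (LoHa uses two such pairs per layer). The Tucker shapes are read off the einsum expression in the "How LoHa and LoKr work" section (`T`: rank × rank × kernel, `Wa`: rank × out, `Wb`: rank × in); the flat shapes assume the kernel is folded into the input dimension as described above. The function names and the 320-channel example are illustrative only and are not part of sd-scripts.
+
+```python
+def flat_pair_params(in_ch: int, out_ch: int, kernel: int, rank: int) -> int:
+    """One low-rank pair in flat mode: (out x rank) and (rank x in*k*k)."""
+    return out_ch * rank + rank * in_ch * kernel * kernel
+
+
+def tucker_pair_params(in_ch: int, out_ch: int, kernel: int, rank: int) -> int:
+    """One low-rank pair in Tucker mode: T (rank x rank x k x k), Wa (rank x out), Wb (rank x in)."""
+    return rank * rank * kernel * kernel + rank * out_ch + rank * in_ch
+
+
+# Hypothetical 3x3 convolution with 320 input and output channels, conv_dim=16.
+flat = flat_pair_params(320, 320, 3, 16)
+tucker = tucker_pair_params(320, 320, 3, 16)
+print(flat, tucker)  # 51200 12544
+```
+
+Under these assumptions the Tucker pair is about four times smaller, because the kernel dimensions multiply `rank * rank` rather than `rank * in_ch`.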
+
+<details>
+<summary>日本語</summary>
+
+デフォルトでは、LoHaとLoKrはLinearおよびConv2d 1x1層を対象とします。Conv2d 3x3+層（SDXLのResNetブロックなど）も学習するには、`conv_dim`と`conv_alpha`オプションを使用します:
+
+```bash
+--network_args "conv_dim=16" "conv_alpha=8"
+```
+
+Conv2d 3x3+層に対して、Tucker分解を有効にすることで、より効率的なパラメータ表現が可能です:
+
+```bash
+--network_args "conv_dim=16" "conv_alpha=8" "use_tucker=True"
+```
+
+- `use_tucker`なし: カーネル次元が入力次元に平坦化されます（flatモード）。
+- `use_tucker=True`: カーネル次元を扱う別のTuckerテンソルが使用され、よりパラメータ効率が良くなる場合があります。
+
+</details>
+
+### LoKr-specific option: `factor` / LoKr固有のオプション: `factor`
+
+LoKr decomposes weight dimensions using factorization. The `factor` option controls how dimensions are split:
+
+- `factor=-1` (default): Automatically find balanced factors. For example, dimension 512 is split into (16, 32).
+- `factor=N` (positive integer): Force factorization using the specified value. For example, `factor=4` splits dimension 512 into (4, 128).
+
+```bash
+--network_args "factor=4"
+```
+
+When `network_dim` (rank) is large relative to the factorized dimensions, a low-rank decomposition of the second factor offers little or no parameter saving, so LoKr uses a full matrix for it instead. A warning is logged in this case.
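+
+The splitting rule can be sketched in a few lines of Python. This is a minimal illustration that reproduces the two examples above, not the actual sd-scripts or LyCORIS routine: `split_dimension` is a hypothetical name, "balanced" is taken here to mean the divisor pair closest to the square root, and the fallback for a `factor` that does not divide the dimension is an assumption of the sketch.
+
+```python
+import math
+
+
+def split_dimension(dimension: int, factor: int = -1) -> tuple[int, int]:
+    """Split `dimension` into (m, n) with m * n == dimension and m <= n."""
+    if factor > 0 and dimension % factor == 0:
+        m, n = factor, dimension // factor
+        return (m, n) if m <= n else (n, m)
+    # Balanced split: the largest divisor not exceeding sqrt(dimension).
+    # Assumption of this sketch: a factor that does not divide the dimension also lands here.
+    m = next(d for d in range(math.isqrt(dimension), 0, -1) if dimension % d == 0)
+    return m, dimension // m
+
+
+print(split_dimension(512))     # (16, 32)
+print(split_dimension(512, 4))  # (4, 128)
+```
+
+A smaller `factor` makes the first Kronecker factor smaller and the second one larger, which shifts more of the capacity into the part that can be low-rank decomposed.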
+
+<details>
+<summary>日本語</summary>
+
+LoKrは重みの次元を因数分解して分割します。`factor` オプションでその分割方法を制御します:
+
+- `factor=-1`（デフォルト）: バランスの良い因数を自動的に見つけます。例えば、次元512は(16, 32)に分割されます。
+- `factor=N`（正の整数）: 指定した値で因数分解します。例えば、`factor=4` は次元512を(4, 128)に分割します。
+
+```bash
+--network_args "factor=4"
+```
+
+`network_dim`（ランク）が因数分解された次元に対して十分に大きい場合、LoKrは第2因子に低ランク分解ではなくフル行列を使用します。その場合、警告がログに出力されます。
+
+</details>
+
+### Anima-specific option: `train_llm_adapter` / Anima固有のオプション: `train_llm_adapter`
+
+For Anima, you can additionally train the LLM adapter modules by specifying:
+
+```bash
+--network_args "train_llm_adapter=True"
+```
+
+This includes `LLMAdapterTransformerBlock` modules as training targets.
+
+<details>
+<summary>日本語</summary>
+
+Animaでは、以下を指定することでLLMアダプターモジュールも追加で学習できます:
+
+```bash
+--network_args "train_llm_adapter=True"
+```
+
+これにより、`LLMAdapterTransformerBlock` モジュールが学習対象に含まれます。
+
+</details>
+
+### LoRA+
+
+LoRA+ (`loraplus_lr_ratio` etc. in `--network_args`) is supported with LoHa/LoKr. For LoHa, the second pair of matrices (`hada_w2_a`) is treated as the "plus" (higher learning rate) parameter group. For LoKr, the scale factor (`lokr_w1`) is treated as the "plus" parameter group.
+
+```bash
+--network_args "loraplus_lr_ratio=4"
+```
+
+This feature has been confirmed to work in basic testing, but feedback is welcome. If you encounter any issues, please report them.
+
+<details>
+<summary>日本語</summary>
+
+LoRA+（`--network_args` の `loraplus_lr_ratio` 等）はLoHa/LoKrでもサポートされています。LoHaでは第2ペアの行列（`hada_w2_a`）が「plus」（より高い学習率）パラメータグループとして扱われます。LoKrではスケール係数（`lokr_w1`）が「plus」パラメータグループとして扱われます。
+
+```bash
+--network_args "loraplus_lr_ratio=4"
+```
+
+この機能は基本的なテストでは動作確認されていますが、フィードバックをお待ちしています。問題が発生した場合はご報告ください。
+
+</details>
+
+## How LoHa and LoKr work / LoHaとLoKrの仕組み
+
+### LoHa
+
+LoHa represents the weight update as a Hadamard (element-wise) product of two low-rank matrices:
+
+```
+ΔW = (W1a × W1b) ⊙ (W2a × W2b)
+```
+
+where `W1a`, `W1b`, `W2a`, `W2b` are low-rank factor matrices whose inner dimension (rank) is `network_dim`. This means LoHa has roughly **twice the number of trainable parameters** compared to LoRA at the same rank. In return, the element-wise product can reach a rank of up to the square of `network_dim`, so it can capture more complex weight structures.
+
+For Conv2d 3x3+ layers with Tucker decomposition, each pair additionally has a Tucker tensor `T` and the reconstruction becomes: `einsum("i j ..., j r, i p -> p r ...", T, Wb, Wa)`.
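+
+As a rough illustration of the Linear-layer formula, the following plain-Python sketch builds `ΔW` from four small factor matrices and compares parameter counts with LoRA. It uses nested lists instead of tensors, omits any `network_alpha` scaling, and all function names are hypothetical; the real implementation operates on PyTorch tensors.
+
+```python
+def matmul(a, b):
+    """Plain-Python matrix product of two nested lists."""
+    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
+
+
+def hadamard(a, b):
+    """Element-wise product of two equally shaped matrices."""
+    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
+
+
+def loha_delta(w1a, w1b, w2a, w2b):
+    """Delta W = (W1a x W1b) elementwise-times (W2a x W2b)."""
+    return hadamard(matmul(w1a, w1b), matmul(w2a, w2b))
+
+
+def lora_params(in_dim, out_dim, rank):
+    """LoRA: one (out x rank) and one (rank x in) matrix."""
+    return rank * (in_dim + out_dim)
+
+
+def loha_params(in_dim, out_dim, rank):
+    """LoHa: two such pairs, hence twice the LoRA count."""
+    return 2 * rank * (in_dim + out_dim)
+
+
+# 2x2 update built from rank-1 factors of shape (out x rank) and (rank x in).
+delta = loha_delta([[1], [2]], [[3, 4]], [[1], [0]], [[1, 1]])
+print(delta)  # [[3, 4], [0, 0]]
+print(lora_params(512, 512, 8), loha_params(512, 512, 8))  # 8192 16384
+```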
+
+### LoKr
+
+LoKr represents the weight update using a Kronecker product:
+
+```
+ΔW = W1 ⊗ W2    (where W2 = W2a × W2b in low-rank mode)
+```
+
+The original weight dimensions are factorized (e.g., a 512×512 weight might be split so that W1 is 16×16 and W2 is 32×32). W1 is always stored as a small full matrix, while W2 is either low-rank decomposed or a full matrix, depending on the rank setting. LoKr tends to produce **smaller models** than LoRA at the same rank.
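+
+The sketch below shows, in plain Python, how a Kronecker product expands two small matrices into a larger one, and why the parameter count stays low for the 512×512 example. The helper names are hypothetical, and whether W2 is stored in full or low-rank form is left to the caller here rather than derived from the rank as the real implementation does.
+
+```python
+def kron(a, b):
+    """Kronecker product of two nested-list matrices."""
+    return [
+        [x * y for x in row_a for y in row_b]
+        for row_a in a
+        for row_b in b
+    ]
+
+
+def lokr_params(w1_shape, w2_shape, rank=None):
+    """Parameters of W1 (full) plus W2 (full if rank is None, else W2a and W2b)."""
+    w1 = w1_shape[0] * w1_shape[1]
+    if rank is None:
+        return w1 + w2_shape[0] * w2_shape[1]
+    return w1 + rank * (w2_shape[0] + w2_shape[1])
+
+
+# A 2x2 W1 and a 2x2 W2 expand into a 4x4 update.
+delta = kron([[1, 2], [3, 4]], [[1, 0], [0, 1]])
+print(delta[0], len(delta), len(delta[0]))  # [1, 0, 2, 0] 4 4
+
+# 512x512 weight split as W1 = 16x16 and W2 = 32x32.
+print(lokr_params((16, 16), (32, 32), rank=8))  # 768
+print(lokr_params((16, 16), (32, 32)))          # 1280
+```
+
+For comparison, a rank-8 LoRA on the same 512×512 weight has 8 × (512 + 512) = 8192 parameters.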
+
+<details>
+<summary>日本語</summary>
+
+### LoHa
+
+LoHaは重みの更新を2つの低ランク行列のHadamard積（要素ごとの積）で表現します:
+
+```
+ΔW = (W1a × W1b) ⊙ (W2a × W2b)
+```
+
+ここで `W1a`, `W1b`, `W2a`, `W2b` はランク `network_dim` の低ランク行列です。LoHaは同じランクのLoRAと比較して学習可能なパラメータ数が **約2倍** になりますが、要素ごとの積により、より複雑な重み構造を捉えることができます。
+
+Conv2d 3x3+層でTucker分解を使用する場合、各ペアにはさらにTuckerテンソル `T` があり、再構成は `einsum("i j ..., j r, i p -> p r ...", T, Wb, Wa)` となります。
+
+### LoKr
+
+LoKrはKronecker積を使って重みの更新を表現します:
+
+```
+ΔW = W1 ⊗ W2    （低ランクモードでは W2 = W2a × W2b）
+```
+
+元の重みの次元が因数分解されます（例: 512×512の重みが、W1が16×16、W2が32×32に分割されます）。W1は常にフル行列（小さい）で、W2はランク設定に応じて低ランク分解またはフル行列になります。LoKrは同じランクのLoRAと比較して **より小さいモデル** を生成する傾向があります。
+
+</details>
+
+## Inference / 推論
+
+Trained LoHa/LoKr weights are saved in safetensors format, just like LoRA.
+
+<details>
+<summary>日本語</summary>
+
+学習済みのLoHa/LoKrの重みは、LoRAと同様にsafetensors形式で保存されます。
+
+</details>
+
+### SDXL
+
+For SDXL, use `gen_img.py` with `--network_module` and `--network_weights`, the same way as LoRA:
+
+```bash
+python gen_img.py --ckpt path/to/sdxl.safetensors \
+    --network_module networks.loha --network_weights path/to/loha.safetensors \
+    --prompt "your prompt" ...
+```
+
+Replace `networks.loha` with `networks.lokr` for LoKr weights.
+
+<details>
+<summary>日本語</summary>
+
+SDXLでは、LoRAと同様に `gen_img.py` で `--network_module` と `--network_weights` を指定します:
+
+```bash
+python gen_img.py --ckpt path/to/sdxl.safetensors \
+    --network_module networks.loha --network_weights path/to/loha.safetensors \
+    --prompt "your prompt" ...
+```
+
+LoKrの重みを使用する場合は `networks.loha` を `networks.lokr` に置き換えてください。
+
+</details>
+
+### Anima
+
+For Anima, use `anima_minimal_inference.py` with the `--lora_weight` argument. LoRA, LoHa, and LoKr weights are automatically detected and merged:
+
+```bash
+python anima_minimal_inference.py --dit path/to/dit --prompt "your prompt" \
+    --lora_weight path/to/loha_or_lokr.safetensors ...
+```
+
+<details>
+<summary>日本語</summary>
+
+Animaでは、`anima_minimal_inference.py` に `--lora_weight` 引数を指定します。LoRA、LoHa、LoKrの重みは自動的に判定されてマージされます:
+
+```bash
+python anima_minimal_inference.py --dit path/to/dit --prompt "your prompt" \
+    --lora_weight path/to/loha_or_lokr.safetensors ...
+```
+
+</details>