From bee5c3f1b82128fbb0cb3ab03777b5f8be14f092 Mon Sep 17 00:00:00 2001
From: Kohya S <ykumeykume@gmail.com>
Date: Sun, 20 Aug 2023 12:45:56 +0900
Subject: [PATCH] update lllite doc

---
 docs/train_lll_README-ja.md    | 39 ----------------------------
 docs/train_lllite_README-ja.md | 40 +++++++++++++++++++++++++++++
 docs/train_lllite_README.md    | 46 ++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 39 deletions(-)
 delete mode 100644 docs/train_lll_README-ja.md
 create mode 100644 docs/train_lllite_README-ja.md
 create mode 100644 docs/train_lllite_README.md

diff --git a/docs/train_lll_README-ja.md b/docs/train_lll_README-ja.md
deleted file mode 100644
index cd1ded68..00000000
--- a/docs/train_lll_README-ja.md
+++ /dev/null
@@ -1,39 +0,0 @@
-# ConrtolNet-LLLite について
-
-## 概要
-ConrtolNet-LLLite は、[ConrtolNet](https://github.com/lllyasviel/ControlNet) の軽量版です。LoRA Like Lite という意味で、LoRAに似た構造の軽量なControlNetです。現在はSDXLにのみ対応しています。
-
-## モデル構造
-制御用画像（以下conditioning image）を潜在空間に写像するconditioning image embeddingと、U-Netの各モジュールに付与されるLoRAにちょっと似た構造を持つモジュールを組み合わせたモデルです。詳しくはソースコードを参照してください。
-
-## モデルの学習
-
-### データセットの準備
-通常のdatasetに加え、`conditioning_data_dir` で指定したディレクトリにconditioning imageを格納してください。conditioning imageは学習用画像と同じbasenameを持つ必要があります。また、conditioning imageは学習用画像と同じサイズに自動的にリサイズされます。
-
-```toml
-[[datasets.subsets]]
-image_dir = "path/to/image/dir"
-caption_extension = ".txt"
-conditioning_data_dir = "path/to/conditioning/image/dir"
-```
-
-### 学習
-`sdxl_train_control_net_lllite.py` を実行してください。`--cond_emb_dim` でconditioning image embeddingの次元数を指定できます。`--network_dim` でLoRA的モジュールのrankを指定できます。その他のオプションは`sdxl_train_network.py`に準じますが、`--network_module`の指定は不要です。
-
-
-### 推論
-`sdxl_gen_img.py` を実行してください。`--control_net_lllite_models` でLLLiteのモデルファイルを指定できます。次元数はモデルファイルから自動取得します。
-
-`--guide_image_path`で推論に用いるconditioning imageを指定してください。なおpreprocessは行われないため、たとえばCannyならCanny処理を行った画像を指定してください（背景黒に白線）。`--control_net_preps`, `--control_net_weights`, `--control_net_ratios` には未対応です。
-
-### サンプル
-Canny
-![kohya_ss_girl_standing_at_classroom_smiling_to_the_viewer_class_78976b3e-0d4d-4ea0-b8e3-053ae493abbc](https://github.com/kohya-ss/sd-scripts/assets/52813779/7e883352-0fea-4f5a-b820-94e17ec3f3f2)
-
-![im_20230819212806_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/c28196f9-b2c3-40ad-b000-21a77e657968)
-
-![im_20230819212815_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/b8506354-feb8-4d58-86a8-738a9ba03911)
-
-![im_20230819212822_000_1](https://github.com/kohya-ss/sd-scripts/assets/52813779/1612c221-8df5-420c-b907-75758d89aca7)
-
diff --git a/docs/train_lllite_README-ja.md b/docs/train_lllite_README-ja.md
new file mode 100644
index 00000000..9df4284e
--- /dev/null
+++ b/docs/train_lllite_README-ja.md
@@ -0,0 +1,40 @@
+# ControlNet-LLLite について
+
+## 概要
+ControlNet-LLLite は、[ControlNet](https://github.com/lllyasviel/ControlNet) の軽量版です。LoRA Like Lite という意味で、LoRAからインスピレーションを得た構造を持つ、軽量なControlNetです。現在はSDXLにのみ対応しています。
+
+## モデル構造
+ひとつのLLLiteモジュールは、制御用画像（以下conditioning image）を潜在空間に写像するconditioning image embeddingと、LoRAにちょっと似た構造を持つ小型のネットワークからなります。LLLiteモジュールを、LoRAと同様にU-NetのLinearやConvに追加します。詳しくはソースコードを参照してください。
+
+推論環境の制限で、現在はCrossAttentionのみ（attn1のq/k/v、attn2のq）に追加されます。
+
+## モデルの学習
+
+### データセットの準備
+通常のdatasetに加え、`conditioning_data_dir` で指定したディレクトリにconditioning imageを格納してください。conditioning imageは学習用画像と同じbasenameを持つ必要があります。また、conditioning imageは学習用画像と同じサイズに自動的にリサイズされます。
+
+```toml
+[[datasets.subsets]]
+image_dir = "path/to/image/dir"
+caption_extension = ".txt"
+conditioning_data_dir = "path/to/conditioning/image/dir"
+```
+
+現時点の制約として、random_cropは使用できません。
+
+### 学習
+`sdxl_train_control_net_lllite.py` を実行してください。`--cond_emb_dim` でconditioning image embeddingの次元数を指定できます。`--network_dim` でLoRA的モジュールのrankを指定できます。その他のオプションは`sdxl_train_network.py`に準じますが、`--network_module`の指定は不要です。
+
+conditioning image embeddingの次元数は、サンプルのCannyでは32を指定しています。LoRA的モジュールのrankは同じく64です。対象とするconditioning imageの特徴に合わせて調整してください。
+
+（サンプルのCannyは恐らくかなり難しいと思われます。depthなどでは半分程度にしてもいいかもしれません。）
+
+### 推論
+ComfyUIのカスタムノードを用意しています。: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI
+
+スクリプトで生成する場合は、`sdxl_gen_img.py` を実行してください。`--control_net_lllite_models` でLLLiteのモデルファイルを指定できます。次元数はモデルファイルから自動取得します。
+
+`--guide_image_path`で推論に用いるconditioning imageを指定してください。なおpreprocessは行われないため、たとえばCannyならCanny処理を行った画像を指定してください（背景黒に白線）。`--control_net_preps`, `--control_net_weights`, `--control_net_ratios` には未対応です。
+
+## サンプル
+Canny
diff --git a/docs/train_lllite_README.md b/docs/train_lllite_README.md
new file mode 100644
index 00000000..ab8fbd62
--- /dev/null
+++ b/docs/train_lllite_README.md
@@ -0,0 +1,46 @@
+# About ControlNet-LLLite
+
+## Overview
+
+ControlNet-LLLite is a lightweight version of [ControlNet](https://github.com/lllyasviel/ControlNet). "LLLite" stands for "LoRA Like Lite": it is a lightweight ControlNet with a structure inspired by LoRA. Currently, only SDXL is supported.
+
+## Model structure
+
+A single LLLite module consists of a conditioning image embedding, which maps a conditioning image to a latent space, and a small network with a structure similar to LoRA. LLLite modules are added to the U-Net's Linear and Conv layers in the same way as LoRA. Please refer to the source code for details.
+
+Due to limitations of the inference environment, the modules are currently added only to CrossAttention (attn1 q/k/v and attn2 q).
+
+## Model training
+
+### Preparing the dataset
+
+In addition to the normal dataset, store the conditioning images in the directory specified by `conditioning_data_dir`. Each conditioning image must have the same basename as its training image. Conditioning images are automatically resized to the same size as the training images.
+
+```toml
+[[datasets.subsets]]
+image_dir = "path/to/image/dir"
+caption_extension = ".txt"
+conditioning_data_dir = "path/to/conditioning/image/dir"
+```
+
+At the moment, `random_crop` cannot be used.
+
+### Training
+
+Run `sdxl_train_control_net_lllite.py`. You can specify the dimension of the conditioning image embedding with `--cond_emb_dim`. You can specify the rank of the LoRA-like module with `--network_dim`. Other options are the same as `sdxl_train_network.py`, but `--network_module` is not required.
+
+For the sample Canny model, the dimension of the conditioning image embedding is 32 and the rank of the LoRA-like module is 64. Adjust both according to the features of the conditioning images you are targeting.
+
+(Canny, as in the sample, is probably a fairly difficult case. For depth and similar conditions, reducing these values to about half may be fine.)
+
+### Inference
+
+A custom node for ComfyUI is available: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI
+
+To generate images with a script, run `sdxl_gen_img.py`. You can specify the LLLite model file with `--control_net_lllite_models`. The dimension is read automatically from the model file.
+
+Specify the conditioning image to be used for inference with `--guide_image_path`. No preprocessing is performed, so for Canny, for example, specify an image that has already been processed with Canny (white lines on a black background). `--control_net_preps`, `--control_net_weights`, and `--control_net_ratios` are not supported.
+
+## Sample
+
+Canny