Merge branch 'dev' into deep-speed

2026-04-09 06:45:09 +00:00 · 2024-03-24 18:45:59 +09:00
parent a35e7bd595 381c44955e
commit 993b2ab4c1
6 changed files with 394 additions and 200 deletions
--- a/README.md
+++ b/README.md
@@ -254,121 +254,41 @@ ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [docum
 - Colab seems to hang during log output. Try specifying the `--console_log_simple` option in the training script to disable rich logging.
 - The `.toml` file for the dataset config is now read in UTF-8 encoding. PR [#1167](https://github.com/kohya-ss/sd-scripts/pull/1167) Thanks to Horizon1704!
 - Fixed a bug where the settings of the last subset were applied to all images when multiple subsets of regularization images were specified in the dataset settings. The settings for each subset are now correctly applied to its own images. PR [#1205](https://github.com/kohya-ss/sd-scripts/pull/1205) Thanks to feffy380!
 - `train_network.py` and `sdxl_train_network.py` are modified to record some dataset settings in the metadata of the trained model (`caption_prefix`, `caption_suffix`, `keep_tokens_separator`, `secondary_separator`, `enable_wildcard`).
 - Some features are added to the dataset subset settings.
  - `secondary_separator` is added to specify the tag separator that is not the target of shuffling or dropping. 
-    - Specify `secondary_separator=";;;"`. When you specify `secondary_separator`, the part is not shuffled or dropped. See the example below.
+    - Specify `secondary_separator=";;;"`. Tags joined by `secondary_separator` are treated as a single tag when shuffling or dropping.
-  - `enable_wildcard` is added. When set to `true`, the wildcard notation `{aaa|bbb|ccc}` can be used. See the example below.
+  - `enable_wildcard` is added. When set to `true`, the wildcard notation `{aaa|bbb|ccc}` can be used. Multi-line captions are also enabled.
  - `keep_tokens_separator` is updated to be used twice in the caption. When you specify `keep_tokens_separator="|||"`, the part divided by the second `|||` is not shuffled or dropped and remains at the end.
  - The existing features `caption_prefix` and `caption_suffix` can be used together. `caption_prefix` and `caption_suffix` are processed first, and then `enable_wildcard`, `keep_tokens_separator`, shuffling and dropping, and `secondary_separator` are processed in order.
-  - The examples are [shown below](#example-of-dataset-settings--データセット設定の記述例).
+  - See [Dataset config](./docs/config_README-en.md) for details.
 - The support for v3 repositories is added to `tag_image_by_wd14_tagger.py` (`--onnx` option only). PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) Thanks to sdbds!
  - Onnx may need to be updated. Onnx is not installed by default, so please install or update it with `pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` etc. Please also check the comments in `requirements.txt`.
 - The model is now saved in a subdirectory named after `--repo_id` in `tag_image_by_wd14_tagger.py`. This allows models from multiple repo_ids to be cached. Please delete unnecessary files under `--model_dir`.
 - The options `--noise_offset_random_strength` and `--ip_noise_gamma_random_strength` are added to each training script. These options can be used to vary the noise offset and ip noise gamma in the range of 0 to the specified value. PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) Thanks to KohakuBlueleaf!
 - The [English version of the dataset settings documentation](./docs/config_README-en.md) is added. PR [#1175](https://github.com/kohya-ss/sd-scripts/pull/1175) Thanks to darkstorm2150!
 - The option `--save_state_on_train_end` is added to each training script. PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) Thanks to gesen2egee!
 - The options `--sample_every_n_epochs` and `--sample_every_n_steps` in each training script now display a warning and are ignored when a number less than or equal to `0` is specified. Thanks to S-Del for raising the issue.
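The random-strength options above can be pictured with a small sketch. This is only an illustration of "vary the value in the range of 0 to the specified value"; the helper name `effective_strength` and its parameters are hypothetical and are not taken from the training scripts.

```python
import random


def effective_strength(max_strength, random_strength, rng=None):
    """Hypothetical helper: pick the strength used for one training step.

    When random_strength is True, the value is drawn uniformly from
    [0, max_strength]; otherwise max_strength is used unchanged.
    """
    rng = rng or random.Random()
    if random_strength:
        return rng.uniform(0.0, max_strength)
    return max_strength


# Example: a noise offset of 0.1 with random strength enabled
print(effective_strength(0.1, True))
```

With the flag disabled the specified value is used as-is, which matches the behaviour before these options were added.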
 - Colab での動作時、ログ出力で停止してしまうようです。学習スクリプトに `--console_log_simple` オプションを指定し、rich のロギングを無効してお試しください。
 - データセット設定の `.toml` ファイルが UTF-8 encoding で読み込まれるようになりました。PR [#1167](https://github.com/kohya-ss/sd-scripts/pull/1167) Horizon1704 氏に感謝します。
 - データセット設定で、正則化画像のサブセットを複数指定した時、最後のサブセットの各種設定がすべてのサブセットの画像に適用される不具合が修正されました。それぞれのサブセットの設定が、それぞれの画像に正しく適用されます。PR [#1205](https://github.com/kohya-ss/sd-scripts/pull/1205) feffy380 氏に感謝します。
 - `train_network.py` および `sdxl_train_network.py` で、学習したモデルのメタデータに一部のデータセット設定が記録されるよう修正しました（`caption_prefix`、`caption_suffix`、`keep_tokens_separator`、`secondary_separator`、`enable_wildcard`）。
 - データセットのサブセット設定にいくつかの機能を追加しました。
-  - シャッフルの対象とならないタグ分割識別子の指定 `secondary_separator` を追加しました。`secondary_separator=";;;"` のように指定します。`secondary_separator` で区切ることで、その部分はシャッフル、drop 時にまとめて扱われます。詳しくは記述例をご覧ください。
+  - シャッフルの対象とならないタグ分割識別子の指定 `secondary_separator` を追加しました。`secondary_separator=";;;"` のように指定します。`secondary_separator` で区切ることで、その部分はシャッフル、drop 時にまとめて扱われます。
-  - `enable_wildcard` を追加しました。`true` にするとワイルドカード記法 `{aaa|bbb|ccc}` が使えます。詳しくは記述例をご覧ください。
+  - `enable_wildcard` を追加しました。`true` にするとワイルドカード記法 `{aaa|bbb|ccc}` が使えます。また複数行キャプションも有効になります。
  - `keep_tokens_separator` をキャプション内に 2 つ使えるようにしました。たとえば `keep_tokens_separator="|||"` と指定したとき、`1girl, hatsune miku, vocaloid ||| stage, mic ||| best quality, rating: general` とキャプションを指定すると、二番目の `|||` で分割された部分はシャッフル、drop されず末尾に残ります。
  - 既存の機能 `caption_prefix` と `caption_suffix` とあわせて使えます。`caption_prefix` と `caption_suffix` は一番最初に処理され、その後、ワイルドカード、`keep_tokens_separator`、シャッフルおよび drop、`secondary_separator` の順に処理されます。
  - 詳細は [データセット設定](./docs/config_README-ja.md) をご覧ください。
 - `tag_image_by_wd14_tagger.py` で v3 のリポジトリがサポートされました（`--onnx` 指定時のみ有効）。 PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) sdbds 氏に感謝します。
  - Onnx のバージョンアップが必要になるかもしれません。デフォルトでは Onnx はインストールされていませんので、`pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` 等でインストール、アップデートしてください。`requirements.txt` のコメントもあわせてご確認ください。
 - `tag_image_by_wd14_tagger.py` で、モデルを`--repo_id` のサブディレクトリに保存するようにしました。これにより複数のモデルファイルがキャッシュされます。`--model_dir` 直下の不要なファイルは削除願います。
 - 各学習スクリプトに、noise offset、ip noise gammaを、それぞれ 0~指定した値の範囲で変動させるオプション `--noise_offset_random_strength` および `--ip_noise_gamma_random_strength` が追加されました。 PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) KohakuBlueleaf 氏に感謝します。
 - データセット設定の[英語版ドキュメント](./docs/config_README-en.md) が追加されました。PR [#1175](https://github.com/kohya-ss/sd-scripts/pull/1175) darkstorm2150 氏に感謝します。
 - 各学習スクリプトに、学習終了時に state を保存する `--save_state_on_train_end` オプションが追加されました。 PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) gesen2egee 氏に感謝します。
-
+- 各学習スクリプトで `--sample_every_n_epochs` および `--sample_every_n_steps` オプションに `0` 以下の数値を指定した時、警告を表示するとともにそれらを無視するよう変更しました。問題提起していただいた S-Del 氏に感謝します。
-
+- データセット設定の[英語版ドキュメント](./docs/config_README-en.md) が追加されました。PR [#1175](https://github.com/kohya-ss/sd-scripts/pull/1175) darkstorm2150 氏に感謝します。
 #### Example of dataset settings / データセット設定の記述例:
 ```toml
 [general]
 flip_aug = true
 color_aug = false
 resolution = [1024, 1024]
 [[datasets]]
 batch_size = 6
 enable_bucket = true
 bucket_no_upscale = true
 caption_extension = ".txt"
 keep_tokens_separator= "|||"
 shuffle_caption = true
 caption_tag_dropout_rate = 0.1
 secondary_separator = ";;;" # subset 側に書くこともできます / can be written in the subset side
 enable_wildcard = true # 同上 / same as above
  [[datasets.subsets]]
  image_dir = "/path/to/image_dir"
  num_repeats = 1
  # ||| の前後はカンマは不要です（自動的に追加されます） / No comma is required before and after ||| (it is added automatically)
  caption_prefix = "1girl, hatsune miku, vocaloid |||" 
  # ||| の後はシャッフル、drop されず残ります / After |||, it is not shuffled or dropped and remains
  # 単純に文字列として連結されるので、カンマなどは自分で入れる必要があります / It is simply concatenated as a string, so you need to put commas yourself
  caption_suffix = ", anime screencap ||| masterpiece, rating: general"
 ```
 #### Example of caption, secondary_separator notation: `secondary_separator = ";;;"`
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, sky;;;cloud;;;day, outdoors
 ```
 The part `sky;;;cloud;;;day` is replaced with `sky,cloud,day` without shuffling or dropping. When shuffling and dropping are enabled, it is processed as a whole (as one tag). For example, it becomes `vocaloid, 1girl, upper body, sky,cloud,day, outdoors, hatsune miku` (shuffled) or `vocaloid, 1girl, outdoors, looking at viewer, upper body, hatsune miku` (dropped).
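The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by the training scripts: the function name `process_tags` and its parameters are hypothetical, and dropping is assumed to be an independent per-tag coin flip.

```python
import random


def process_tags(caption, secondary_separator=";;;", caption_separator=",",
                 shuffle=True, dropout_rate=0.0, rng=None):
    """Hypothetical helper: shuffle and drop tags, keeping ';;;' groups together."""
    rng = rng or random.Random()
    # Splitting on the caption separator leaves "sky;;;cloud;;;day" as ONE tag
    tags = [t.strip() for t in caption.split(caption_separator) if t.strip()]
    if shuffle:
        rng.shuffle(tags)
    # Drop each tag (or tag group) independently with probability dropout_rate
    tags = [t for t in tags if rng.random() >= dropout_rate]
    joined = (caption_separator + " ").join(tags)
    # Only now is the secondary separator replaced by the caption separator
    return joined.replace(secondary_separator, caption_separator)


caption = "1girl, hatsune miku, vocaloid, upper body, looking at viewer, sky;;;cloud;;;day, outdoors"
print(process_tags(caption, dropout_rate=0.1))
```

Because the group is still a single list element while shuffling and dropping happen, `sky,cloud,day` either appears intact or disappears as a whole.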
 #### Example of caption, enable_wildcard notation: `enable_wildcard = true`
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, {simple|white} background
 ```
 `simple` or `white` is randomly selected, and it becomes `simple background` or `white background`.
 ```txt
 1girl, hatsune miku, vocaloid, {{retro style}}
 ```
 If you want to include `{` or `}` in the tag string, double them like `{{` or `}}` (in this example, the actual caption used for training is `{retro style}`).
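A rough sketch of how such wildcard expansion could work is shown here. It is an assumption-laden illustration rather than the actual implementation: `expand_wildcards` is a hypothetical name, and the placeholder characters used to protect doubled braces are chosen only for this example.

```python
import random
import re


def expand_wildcards(caption, rng=None):
    """Hypothetical helper: expand {a|b|c} groups and unescape {{ and }}."""
    rng = rng or random.Random()
    # Protect doubled braces with placeholders that should not occur in captions
    caption = caption.replace("{{", "\u0000").replace("}}", "\u0001")
    # Replace every {a|b|c} group with one randomly chosen alternative
    caption = re.sub(
        r"\{([^{}]*)\}",
        lambda m: rng.choice(m.group(1).split("|")),
        caption,
    )
    # Restore the protected braces as literal single braces
    return caption.replace("\u0000", "{").replace("\u0001", "}")


print(expand_wildcards("1girl, hatsune miku, vocaloid, {simple|white} background"))
print(expand_wildcards("1girl, hatsune miku, vocaloid, {{retro style}}"))
```

The second call leaves the text inside the doubled braces untouched and returns it with single braces.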
 #### Example of caption, `keep_tokens_separator` notation: `keep_tokens_separator = "|||"`
 ```txt
 1girl, hatsune miku, vocaloid ||| stage, microphone, white shirt, smile ||| best quality, rating: general
 ```
 It becomes `1girl, hatsune miku, vocaloid, microphone, stage, white shirt, best quality, rating: general` or `1girl, hatsune miku, vocaloid, white shirt, smile, stage, microphone, best quality, rating: general` etc.
 #### キャプション記述例、secondary_separator 記法：`secondary_separator = ";;;"` の場合
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, sky;;;cloud;;;day, outdoors
 ```
 `sky;;;cloud;;;day` の部分はシャッフル、drop されず `sky,cloud,day` に置換されます。シャッフル、drop が有効な場合、まとめて（一つのタグとして）処理されます。つまり `vocaloid, 1girl, upper body, sky,cloud,day, outdoors, hatsune miku` （シャッフル）や `vocaloid, 1girl, outdoors, looking at viewer, upper body, hatsune miku` （drop されたケース）などになります。
 #### キャプション記述例、ワイルドカード記法： `enable_wildcard = true` の場合
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, {simple|white} background
 ```
 ランダムに `simple` または `white` が選ばれ、`simple background` または `white background` になります。
 ```txt
 1girl, hatsune miku, vocaloid, {{retro style}}
 ```
 タグ文字列に `{` や `}` そのものを含めたい場合は `{{` や `}}` のように二つ重ねてください（この例では実際に学習に用いられるキャプションは `{retro style}` になります）。
 #### キャプション記述例、`keep_tokens_separator` 記法： `keep_tokens_separator = "|||"` の場合
 ```txt
 1girl, hatsune miku, vocaloid ||| stage, microphone, white shirt, smile ||| best quality, rating: general
 ```
 `1girl, hatsune miku, vocaloid, microphone, stage, white shirt, best quality, rating: general` や `1girl, hatsune miku, vocaloid, white shirt, smile, stage, microphone, best quality, rating: general` などになります。
 ### Mar 15, 2024 / 2024/3/15: v0.8.5
--- a/docs/config_README-en.md
+++ b/docs/config_README-en.md
@@ -1,7 +1,10 @@
 Original Source by kohya-ss
 First version:
 A.I Translation by Model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO, editing by Darkstorm2150
 Some parts are manually added.
 # Config Readme
 This README is about the configuration files that can be passed with the `--dataset_config` option.
@@ -143,11 +146,23 @@ These options are related to subset configuration.
 | `shuffle_caption` | `true` | o | o | o |
 | `caption_prefix` | `"masterpiece, best quality, "` | o | o | o |
 | `caption_suffix` | `", from side"` | o | o | o |
 | `caption_separator` |  (not specified) | o | o | o |
 | `keep_tokens_separator` | `"|||"` | o | o | o |
 | `secondary_separator` | `";;;"` | o | o | o |
 | `enable_wildcard` | `true` | o | o | o |
 * `num_repeats`
    * Specifies the number of repeats for images in a subset. This is equivalent to `--dataset_repeats` in fine-tuning but can be specified for any training method.
 * `caption_prefix`, `caption_suffix`
    * Specifies the strings to be prepended (prefix) and appended (suffix) to the captions. Shuffling is performed with these strings included. Be cautious when using `keep_tokens`.
 * `caption_separator`
    * Specifies the string that separates the tags. The default is `,`. You usually do not need to set this option.
 * `keep_tokens_separator`
    * Specifies the string that separates the parts of the caption to be kept fixed. For example, if the caption is `aaa, bbb ||| ccc, ddd, eee, fff ||| ggg, hhh`, the parts `aaa, bbb` and `ggg, hhh` remain in place, and the rest is shuffled and dropped. No comma is needed next to the separator. As a result, the prompt will be `aaa, bbb, eee, ccc, fff, ggg, hhh` or `aaa, bbb, fff, ccc, eee, ggg, hhh`, etc.
 * `secondary_separator`
    * Specifies an additional separator. Tags joined by this separator are treated as a single tag during shuffling and dropping; the separator is then replaced by `caption_separator`. For example, `aaa;;;bbb;;;ccc` is either replaced by `aaa,bbb,ccc` or dropped as a whole.
 * `enable_wildcard`
    * Enables wildcard notation and multi-line captions. Both are explained later.
 ### DreamBooth-specific options
@@ -276,4 +291,90 @@ As a temporary measure, we will list common errors and their solutions. If you e
 * `voluptuous.error.MultipleInvalid: expected int for dictionary value @ ...`: This error occurs when a specified value has the wrong format. The `int` part changes depending on the target option. The example configurations in this README may be helpful.
 * `voluptuous.error.MultipleInvalid: extra keys not allowed @ ...`: This error occurs when there is an option name that is not supported. It is highly likely that you misspelled the option name or mistakenly included it.
 ## Miscellaneous
 ### Multi-line captions
 Setting `enable_wildcard = true` also enables multi-line captions. If the caption file consists of multiple lines, one line is randomly selected as the caption.
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, microphone, stage
 a girl with a microphone standing on a stage
 detailed digital art of a girl with a microphone on a stage
 ```
 It can be combined with wildcard notation.
 In metadata files, you can also specify multiple-line captions. In the `.json` metadata file, use `\n` to represent a line break. If the caption file consists of multiple lines, `merge_captions_to_metadata.py` will create a metadata file in this format.
 The tags in the metadata (`tags`) are added to each line of the caption.
 ```json
 {
    "/path/to/image.png": {
        "caption": "a cartoon of a frog with the word frog on it\ntest multiline caption1\ntest multiline caption2",
        "tags": "open mouth, simple background, standing, no humans, animal, black background, frog, animal costume, animal focus"
    },
    ...
 }
 ```
 In this case, the actual caption will be `a cartoon of a frog with the word frog on it, open mouth, simple background ...`, `test multiline caption1, open mouth, simple background ...`, `test multiline caption2, open mouth, simple background ...`, etc.
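The selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: `pick_caption` is a hypothetical helper, and joining the chosen line and the tags with `", "` is inferred from the example output rather than taken from the training code.

```python
import random


def pick_caption(entry, rng=None):
    """Hypothetical helper: choose one caption line and append the tags to it."""
    rng = rng or random.Random()
    # A multi-line caption is stored in the metadata with "\n" between lines
    lines = [ln.strip() for ln in entry.get("caption", "").split("\n") if ln.strip()]
    tags = entry.get("tags", "")
    # The tags are added to every candidate line
    candidates = [f"{ln}, {tags}" if tags else ln for ln in lines]
    return rng.choice(candidates) if candidates else tags


entry = {
    "caption": "a cartoon of a frog with the word frog on it\ntest multiline caption1\ntest multiline caption2",
    "tags": "open mouth, simple background, standing, no humans",
}
print(pick_caption(entry))
```

Each call may return a different line, so over many steps all of the lines in the caption are used.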
 ### Example of configuration file: `secondary_separator`, wildcard notation, `keep_tokens_separator`, etc.
 ```toml
 [general]
 flip_aug = true
 color_aug = false
 resolution = [1024, 1024]
 [[datasets]]
 batch_size = 6
 enable_bucket = true
 bucket_no_upscale = true
 caption_extension = ".txt"
 keep_tokens_separator= "|||"
 shuffle_caption = true
 caption_tag_dropout_rate = 0.1
 secondary_separator = ";;;" # can also be written on the subset side
 enable_wildcard = true # same as above
  [[datasets.subsets]]
  image_dir = "/path/to/image_dir"
  num_repeats = 1
  # No comma is required before or after ||| (it is added automatically)
  caption_prefix = "1girl, hatsune miku, vocaloid |||"
  # The part after ||| is not shuffled or dropped and remains at the end
  # The strings are simply concatenated, so you need to add commas yourself
  caption_suffix = ", anime screencap ||| masterpiece, rating: general"
 ```
 ### Example of caption, secondary_separator notation: `secondary_separator = ";;;"`
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, sky;;;cloud;;;day, outdoors
 ```
 The part `sky;;;cloud;;;day` is replaced with `sky,cloud,day` without shuffling or dropping. When shuffling and dropping are enabled, it is processed as a whole (as one tag). For example, it becomes `vocaloid, 1girl, upper body, sky,cloud,day, outdoors, hatsune miku` (shuffled) or `vocaloid, 1girl, outdoors, looking at viewer, upper body, hatsune miku` (dropped).
 ### Example of caption, enable_wildcard notation: `enable_wildcard = true`
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, {simple|white} background
 ```
 `simple` or `white` is randomly selected, and it becomes `simple background` or `white background`.
 ```txt
 1girl, hatsune miku, vocaloid, {{retro style}}
 ```
 If you want to include `{` or `}` in the tag string, double them like `{{` or `}}` (in this example, the actual caption used for training is `{retro style}`).
 ### Example of caption, `keep_tokens_separator` notation: `keep_tokens_separator = "|||"`
 ```txt
 1girl, hatsune miku, vocaloid ||| stage, microphone, white shirt, smile ||| best quality, rating: general
 ```
 It becomes `1girl, hatsune miku, vocaloid, microphone, stage, white shirt, best quality, rating: general` or `1girl, hatsune miku, vocaloid, white shirt, smile, stage, microphone, best quality, rating: general` etc.
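The following sketch shows one way the three-part behaviour could be expressed. It is an illustration only: `shuffle_with_fixed_parts` is a hypothetical name, tags are assumed to be comma-separated, and dropping is assumed to be an independent per-tag coin flip.

```python
import random


def split_tags(text):
    """Split a comma-separated string into stripped, non-empty tags."""
    return [t.strip() for t in text.split(",") if t.strip()]


def shuffle_with_fixed_parts(caption, keep_tokens_separator="|||",
                             dropout_rate=0.0, rng=None):
    """Hypothetical helper: keep the first and last parts, shuffle/drop the middle."""
    rng = rng or random.Random()
    # At most two separators are honoured: head ||| middle ||| tail
    parts = [p.strip() for p in caption.split(keep_tokens_separator, 2)]
    if len(parts) == 1:
        head, middle, tail = "", parts[0], ""  # no separator: everything is flexible
    else:
        head, middle, tail = (parts + [""])[:3]
    flexible = split_tags(middle)
    rng.shuffle(flexible)
    flexible = [t for t in flexible if rng.random() >= dropout_rate]
    # Commas around the separators are added here, so the caption does not need them
    return ", ".join(split_tags(head) + flexible + split_tags(tail))


caption = "1girl, hatsune miku, vocaloid ||| stage, microphone, white shirt, smile ||| best quality, rating: general"
print(shuffle_with_fixed_parts(caption, dropout_rate=0.1))
```

Only the tags between the two separators change position or disappear; the leading and trailing parts always come out in their original order.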
--- a/docs/config_README-ja.md
+++ b/docs/config_README-ja.md
@@ -1,5 +1,3 @@
 For non-Japanese speakers: this README is provided only in Japanese in the current state. Sorry for inconvenience. We will provide English version in the near future.
 `--dataset_config` で渡すことができる設定ファイルに関する説明です。
 ## 概要
@@ -140,12 +138,28 @@ DreamBooth の手法と fine tuning の手法の両方とも利用可能な学
 | `shuffle_caption` | `true` | o | o | o |
 | `caption_prefix` | `"masterpiece, best quality, "` | o | o | o |
 | `caption_suffix` | `", from side"` | o | o | o |
 | `caption_separator` | （通常は設定しません） | o | o | o |
 | `keep_tokens_separator` | `"|||"` | o | o | o |
 | `secondary_separator` | `";;;"` | o | o | o |
 | `enable_wildcard` | `true` | o | o | o |
 * `num_repeats`
    * サブセットの画像の繰り返し回数を指定します。fine tuning における `--dataset_repeats` に相当しますが、`num_repeats` はどの学習方法でも指定可能です。
 * `caption_prefix`, `caption_suffix`
    * キャプションの前、後に付与する文字列を指定します。シャッフルはこれらの文字列を含めた状態で行われます。`keep_tokens` を指定する場合には注意してください。
 * `caption_separator`
    * タグを区切る文字列を指定します。デフォルトは `,` です。このオプションは通常は設定する必要はありません。
 * `keep_tokens_separator`
    *  キャプションで固定したい部分を区切る文字列を指定します。たとえば `aaa, bbb ||| ccc, ddd, eee, fff ||| ggg, hhh` のように指定すると、`aaa, bbb` と `ggg, hhh` の部分はシャッフル、drop されず残ります。間のカンマは不要です。結果としてプロンプトは `aaa, bbb, eee, ccc, fff, ggg, hhh` や `aaa, bbb, fff, ccc, eee, ggg, hhh` などになります。
 * `secondary_separator`
    * 追加の区切り文字を指定します。この区切り文字で区切られた部分は一つのタグとして扱われ、シャッフル、drop されます。その後、`caption_separator` に置き換えられます。たとえば `aaa;;;bbb;;;ccc` のように指定すると、`aaa,bbb,ccc` に置き換えられるか、まとめて drop されます。
 * `enable_wildcard`
    * ワイルドカード記法および複数行キャプションを有効にします。ワイルドカード記法、複数行キャプションについては後述します。
 ### DreamBooth 方式専用のオプション
 DreamBooth 方式のオプションは、サブセット向けオプションのみ存在します。
@@ -280,4 +294,89 @@ resolution = 768
 * `voluptuous.error.MultipleInvalid: expected int for dictionary value @ ...`: 指定する値の形式が不正というエラーです。値の形式が間違っている可能性が高いです。`int` の部分は対象となるオプションによって変わります。この README に載っているオプションの「設定例」が役立つかもしれません。
 * `voluptuous.error.MultipleInvalid: extra keys not allowed @ ...`: 対応していないオプション名が存在している場合に発生するエラーです。オプション名を間違って記述しているか、誤って紛れ込んでいる可能性が高いです。
 ## その他
 ### 複数行キャプション
 `enable_wildcard = true` を設定することで、複数行キャプションも同時に有効になります。キャプションファイルが複数の行からなる場合、ランダムに一つの行が選ばれてキャプションとして利用されます。
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, microphone, stage
 a girl with a microphone standing on a stage
 detailed digital art of a girl with a microphone on a stage
 ```
 ワイルドカード記法と組み合わせることも可能です。
 メタデータファイルでも同様に複数行キャプションを指定することができます。メタデータの .json 内には、`\n` を使って改行を表現してください。キャプションファイルが複数行からなる場合、`merge_captions_to_metadata.py` を使うと、この形式でメタデータファイルが作成されます。
 メタデータのタグ (`tags`) は、キャプションの各行に追加されます。
 ```json
 {
    "/path/to/image.png": {
        "caption": "a cartoon of a frog with the word frog on it\ntest multiline caption1\ntest multiline caption2",
        "tags": "open mouth, simple background, standing, no humans, animal, black background, frog, animal costume, animal focus"
    },
    ...
 }
 ```
 この場合、実際のキャプションは `a cartoon of a frog with the word frog on it, open mouth, simple background ...` または `test multiline caption1, open mouth, simple background ...`、 `test multiline caption2, open mouth, simple background ...` 等になります。
 ### 設定ファイルの記述例：追加の区切り文字、ワイルドカード記法、`keep_tokens_separator` 等
 ```toml
 [general]
 flip_aug = true
 color_aug = false
 resolution = [1024, 1024]
 [[datasets]]
 batch_size = 6
 enable_bucket = true
 bucket_no_upscale = true
 caption_extension = ".txt"
 keep_tokens_separator= "|||"
 shuffle_caption = true
 caption_tag_dropout_rate = 0.1
 secondary_separator = ";;;" # subset 側に書くこともできます / can be written in the subset side
 enable_wildcard = true # 同上 / same as above
  [[datasets.subsets]]
  image_dir = "/path/to/image_dir"
  num_repeats = 1
  # ||| の前後はカンマは不要です（自動的に追加されます） / No comma is required before and after ||| (it is added automatically)
  caption_prefix = "1girl, hatsune miku, vocaloid |||" 
  # ||| の後はシャッフル、drop されず残ります / After |||, it is not shuffled or dropped and remains
  # 単純に文字列として連結されるので、カンマなどは自分で入れる必要があります / It is simply concatenated as a string, so you need to put commas yourself
  caption_suffix = ", anime screencap ||| masterpiece, rating: general"
 ```
 ### キャプション記述例、secondary_separator 記法：`secondary_separator = ";;;"` の場合
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, sky;;;cloud;;;day, outdoors
 ```
 `sky;;;cloud;;;day` の部分はシャッフル、drop されず `sky,cloud,day` に置換されます。シャッフル、drop が有効な場合、まとめて（一つのタグとして）処理されます。つまり `vocaloid, 1girl, upper body, sky,cloud,day, outdoors, hatsune miku` （シャッフル）や `vocaloid, 1girl, outdoors, looking at viewer, upper body, hatsune miku` （drop されたケース）などになります。
 ### キャプション記述例、ワイルドカード記法： `enable_wildcard = true` の場合
 ```txt
 1girl, hatsune miku, vocaloid, upper body, looking at viewer, {simple|white} background
 ```
 ランダムに `simple` または `white` が選ばれ、`simple background` または `white background` になります。
 ```txt
 1girl, hatsune miku, vocaloid, {{retro style}}
 ```
 タグ文字列に `{` や `}` そのものを含めたい場合は `{{` や `}}` のように二つ重ねてください（この例では実際に学習に用いられるキャプションは `{retro style}` になります）。
 ### キャプション記述例、`keep_tokens_separator` 記法： `keep_tokens_separator = "|||"` の場合
 ```txt
 1girl, hatsune miku, vocaloid ||| stage, microphone, white shirt, smile ||| best quality, rating: general
 ```
 `1girl, hatsune miku, vocaloid, microphone, stage, white shirt, best quality, rating: general` や `1girl, hatsune miku, vocaloid, white shirt, smile, stage, microphone, best quality, rating: general` などになります。
--- a/finetune/merge_captions_to_metadata.py
+++ b/finetune/merge_captions_to_metadata.py
@@ -6,75 +6,95 @@ from tqdm import tqdm
 import library.train_util as train_util
 import os
 from library.utils import setup_logging
 setup_logging()
 import logging
 logger = logging.getLogger(__name__)
 def main(args):
-  assert not args.recursive or (args.recursive and args.full_path), "recursive requires full_path / recursiveはfull_pathと同時に指定してください"
+    assert not args.recursive or (
        args.recursive and args.full_path
    ), "recursive requires full_path / recursiveはfull_pathと同時に指定してください"
-  train_data_dir_path = Path(args.train_data_dir)
+    train_data_dir_path = Path(args.train_data_dir)
-  image_paths: List[Path] = train_util.glob_images_pathlib(train_data_dir_path, args.recursive)
+    image_paths: List[Path] = train_util.glob_images_pathlib(train_data_dir_path, args.recursive)
-  logger.info(f"found {len(image_paths)} images.")
+    logger.info(f"found {len(image_paths)} images.")
-  if args.in_json is None and Path(args.out_json).is_file():
+    if args.in_json is None and Path(args.out_json).is_file():
-    args.in_json = args.out_json
+        args.in_json = args.out_json
-  if args.in_json is not None:
+    if args.in_json is not None:
-    logger.info(f"loading existing metadata: {args.in_json}")
+        logger.info(f"loading existing metadata: {args.in_json}")
-    metadata = json.loads(Path(args.in_json).read_text(encoding='utf-8'))
+        metadata = json.loads(Path(args.in_json).read_text(encoding="utf-8"))
-    logger.warning("captions for existing images will be overwritten / 既存の画像のキャプションは上書きされます")
+        logger.warning("captions for existing images will be overwritten / 既存の画像のキャプションは上書きされます")
-  else:
+    else:
-    logger.info("new metadata will be created / 新しいメタデータファイルが作成されます")
+        logger.info("new metadata will be created / 新しいメタデータファイルが作成されます")
-    metadata = {}
+        metadata = {}
-  logger.info("merge caption texts to metadata json.")
+    logger.info("merge caption texts to metadata json.")
-  for image_path in tqdm(image_paths):
+    for image_path in tqdm(image_paths):
-    caption_path = image_path.with_suffix(args.caption_extension)
+        caption_path = image_path.with_suffix(args.caption_extension)
-    caption = caption_path.read_text(encoding='utf-8').strip()
+        caption = caption_path.read_text(encoding="utf-8").strip()
-    if not os.path.exists(caption_path):
+        if not os.path.exists(caption_path):
-      caption_path = os.path.join(image_path, args.caption_extension)
+            caption_path = os.path.join(image_path, args.caption_extension)
-    image_key = str(image_path) if args.full_path else image_path.stem
+        image_key = str(image_path) if args.full_path else image_path.stem
-    if image_key not in metadata:
+        if image_key not in metadata:
-      metadata[image_key] = {}
+            metadata[image_key] = {}
-    metadata[image_key]['caption'] = caption
+        metadata[image_key]["caption"] = caption
-    if args.debug:
+        if args.debug:
-      logger.info(f"{image_key} {caption}")
+            logger.info(f"{image_key} {caption}")
-  # metadataを書き出して終わり
+    # metadataを書き出して終わり
-  logger.info(f"writing metadata: {args.out_json}")
+    logger.info(f"writing metadata: {args.out_json}")
-  Path(args.out_json).write_text(json.dumps(metadata, indent=2), encoding='utf-8')
+    Path(args.out_json).write_text(json.dumps(metadata, indent=2), encoding="utf-8")
-  logger.info("done!")
+    logger.info("done!")
 def setup_parser() -> argparse.ArgumentParser:
-  parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser()
-  parser.add_argument("train_data_dir", type=str, help="directory for train images / 学習画像データのディレクトリ")
+    parser.add_argument("train_data_dir", type=str, help="directory for train images / 学習画像データのディレクトリ")
-  parser.add_argument("out_json", type=str, help="metadata file to output / メタデータファイル書き出し先")
+    parser.add_argument("out_json", type=str, help="metadata file to output / メタデータファイル書き出し先")
-  parser.add_argument("--in_json", type=str,
+    parser.add_argument(
-                      help="metadata file to input (if omitted and out_json exists, existing out_json is read) / 読み込むメタデータファイル（省略時、out_jsonが存在すればそれを読み込む）")
+        "--in_json",
-  parser.add_argument("--caption_extention", type=str, default=None,
+        type=str,
-                      help="extension of caption file (for backward compatibility) / 読み込むキャプションファイルの拡張子（スペルミスしていたのを残してあります）")
+        help="metadata file to input (if omitted and out_json exists, existing out_json is read) / 読み込むメタデータファイル（省略時、out_jsonが存在すればそれを読み込む）",
-  parser.add_argument("--caption_extension", type=str, default=".caption", help="extension of caption file / 読み込むキャプションファイルの拡張子")
+    )
-  parser.add_argument("--full_path", action="store_true",
+    parser.add_argument(
-                      help="use full path as image-key in metadata (supports multiple directories) / メタデータで画像キーをフルパスにする（複数の学習画像ディレクトリに対応）")
+        "--caption_extention",
-  parser.add_argument("--recursive", action="store_true",
+        type=str,
-                      help="recursively look for training tags in all child folders of train_data_dir / train_data_dirのすべての子フォルダにある学習タグを再帰的に探す")
+        default=None,
-  parser.add_argument("--debug", action="store_true", help="debug mode")
+        help="extension of caption file (for backward compatibility) / 読み込むキャプションファイルの拡張子（スペルミスしていたのを残してあります）",
    )
    parser.add_argument(
        "--caption_extension", type=str, default=".caption", help="extension of caption file / 読み込むキャプションファイルの拡張子"
    )
    parser.add_argument(
        "--full_path",
        action="store_true",
        help="use full path as image-key in metadata (supports multiple directories) / メタデータで画像キーをフルパスにする（複数の学習画像ディレクトリに対応）",
    )
    parser.add_argument(
        "--recursive",
        action="store_true",
        help="recursively look for training tags in all child folders of train_data_dir / train_data_dirのすべての子フォルダにある学習タグを再帰的に探す",
    )
    parser.add_argument("--debug", action="store_true", help="debug mode")
-  return parser
+    return parser
-if __name__ == '__main__':
+if __name__ == "__main__":
-  parser = setup_parser()
+    parser = setup_parser()
-  args = parser.parse_args()
+    args = parser.parse_args()
-  # スペルミスしていたオプションを復元する
+    # スペルミスしていたオプションを復元する
-  if args.caption_extention is not None:
+    if args.caption_extention is not None:
-    args.caption_extension = args.caption_extention
+        args.caption_extension = args.caption_extention
-  main(args)
+    main(args)
--- a/finetune/merge_dd_tags_to_metadata.py
+++ b/finetune/merge_dd_tags_to_metadata.py
@@ -6,70 +6,88 @@ from tqdm import tqdm
 import library.train_util as train_util
 import os
 from library.utils import setup_logging
 setup_logging()
 import logging
 logger = logging.getLogger(__name__)
 def main(args):
-  assert not args.recursive or (args.recursive and args.full_path), "recursive requires full_path / recursiveはfull_pathと同時に指定してください"
+    assert not args.recursive or (
        args.recursive and args.full_path
    ), "recursive requires full_path / recursiveはfull_pathと同時に指定してください"
-  train_data_dir_path = Path(args.train_data_dir)
+    train_data_dir_path = Path(args.train_data_dir)
-  image_paths: List[Path] = train_util.glob_images_pathlib(train_data_dir_path, args.recursive)
+    image_paths: List[Path] = train_util.glob_images_pathlib(train_data_dir_path, args.recursive)
-  logger.info(f"found {len(image_paths)} images.")
+    logger.info(f"found {len(image_paths)} images.")
-  if args.in_json is None and Path(args.out_json).is_file():
+    if args.in_json is None and Path(args.out_json).is_file():
-    args.in_json = args.out_json
+        args.in_json = args.out_json
-  if args.in_json is not None:
+    if args.in_json is not None:
-    logger.info(f"loading existing metadata: {args.in_json}")
+        logger.info(f"loading existing metadata: {args.in_json}")
-    metadata = json.loads(Path(args.in_json).read_text(encoding='utf-8'))
+        metadata = json.loads(Path(args.in_json).read_text(encoding="utf-8"))
-    logger.warning("tags data for existing images will be overwritten / 既存の画像のタグは上書きされます")
+        logger.warning("tags data for existing images will be overwritten / 既存の画像のタグは上書きされます")
-  else:
+    else:
-    logger.info("new metadata will be created / 新しいメタデータファイルが作成されます")
+        logger.info("new metadata will be created / 新しいメタデータファイルが作成されます")
-    metadata = {}
+        metadata = {}
-  logger.info("merge tags to metadata json.")
+    logger.info("merge tags to metadata json.")
-  for image_path in tqdm(image_paths):
+    for image_path in tqdm(image_paths):
-    tags_path = image_path.with_suffix(args.caption_extension)
+        tags_path = image_path.with_suffix(args.caption_extension)
-    tags = tags_path.read_text(encoding='utf-8').strip()
+        tags = tags_path.read_text(encoding="utf-8").strip()
-    if not os.path.exists(tags_path):
+        if not os.path.exists(tags_path):
-      tags_path = os.path.join(image_path, args.caption_extension)
+            tags_path = os.path.join(image_path, args.caption_extension)
-    image_key = str(image_path) if args.full_path else image_path.stem
+        image_key = str(image_path) if args.full_path else image_path.stem
-    if image_key not in metadata:
+        if image_key not in metadata:
-      metadata[image_key] = {}
+            metadata[image_key] = {}
-    metadata[image_key]['tags'] = tags
+        metadata[image_key]["tags"] = tags
-    if args.debug:
+        if args.debug:
-      logger.info(f"{image_key} {tags}")
+            logger.info(f"{image_key} {tags}")
-  # write out the metadata and finish
+    # write out the metadata and finish
-  logger.info(f"writing metadata: {args.out_json}")
+    logger.info(f"writing metadata: {args.out_json}")
-  Path(args.out_json).write_text(json.dumps(metadata, indent=2), encoding='utf-8')
+    Path(args.out_json).write_text(json.dumps(metadata, indent=2), encoding="utf-8")
-  logger.info("done!")
+    logger.info("done!")
 def setup_parser() -> argparse.ArgumentParser:
-  parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser()
-  parser.add_argument("train_data_dir", type=str, help="directory for train images / 学習画像データのディレクトリ")
+    parser.add_argument("train_data_dir", type=str, help="directory for train images / 学習画像データのディレクトリ")
-  parser.add_argument("out_json", type=str, help="metadata file to output / メタデータファイル書き出し先")
+    parser.add_argument("out_json", type=str, help="metadata file to output / メタデータファイル書き出し先")
-  parser.add_argument("--in_json", type=str,
-                      help="metadata file to input (if omitted and out_json exists, existing out_json is read) / 読み込むメタデータファイル（省略時、out_jsonが存在すればそれを読み込む）")
-  parser.add_argument("--full_path", action="store_true",
-                      help="use full path as image-key in metadata (supports multiple directories) / メタデータで画像キーをフルパスにする（複数の学習画像ディレクトリに対応）")
-  parser.add_argument("--recursive", action="store_true",
-                      help="recursively look for training tags in all child folders of train_data_dir / train_data_dirのすべての子フォルダにある学習タグを再帰的に探す")
-  parser.add_argument("--caption_extension", type=str, default=".txt",
-                      help="extension of caption (tag) file / 読み込むキャプション（タグ）ファイルの拡張子")
-  parser.add_argument("--debug", action="store_true", help="debug mode, print tags")
+    parser.add_argument(
+        "--in_json",
+        type=str,
+        help="metadata file to input (if omitted and out_json exists, existing out_json is read) / 読み込むメタデータファイル（省略時、out_jsonが存在すればそれを読み込む）",
+    )
+    parser.add_argument(
+        "--full_path",
+        action="store_true",
+        help="use full path as image-key in metadata (supports multiple directories) / メタデータで画像キーをフルパスにする（複数の学習画像ディレクトリに対応）",
    )
    parser.add_argument(
        "--recursive",
        action="store_true",
        help="recursively look for training tags in all child folders of train_data_dir / train_data_dirのすべての子フォルダにある学習タグを再帰的に探す",
    )
    parser.add_argument(
        "--caption_extension",
        type=str,
        default=".txt",
        help="extension of caption (tag) file / 読み込むキャプション（タグ）ファイルの拡張子",
    )
    parser.add_argument("--debug", action="store_true", help="debug mode, print tags")
-  return parser
+    return parser
-if __name__ == '__main__':
+if __name__ == "__main__":
-  parser = setup_parser()
+    parser = setup_parser()
-  args = parser.parse_args()
+    args = parser.parse_args()
-  main(args)
+    main(args)
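Review note: the loop in this file reads the tag file before checking that it exists. As a minimal sketch only, assuming a missing tag file should be skipped rather than raise, the merge step could look like the following. `merge_tags` and its parameters are hypothetical names for illustration and are not part of the script.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def merge_tags(
    image_paths: Iterable[Path],
    caption_extension: str = ".txt",
    full_path: bool = False,
    metadata: Optional[Dict[str, dict]] = None,
) -> Dict[str, dict]:
    """Merge per-image tag files into a metadata dict (hypothetical helper)."""
    metadata = {} if metadata is None else metadata
    for image_path in image_paths:
        tags_path = image_path.with_suffix(caption_extension)
        # Assumption: images without a tag file are skipped, not treated as errors.
        if not tags_path.is_file():
            continue
        tags = tags_path.read_text(encoding="utf-8").strip()
        # Same key rule as the script: full path, or the file stem.
        image_key = str(image_path) if full_path else image_path.stem
        # Existing entries keep their other fields; only "tags" is overwritten.
        metadata.setdefault(image_key, {})["tags"] = tags
    return metadata
```

Existing keys such as `caption` survive the merge, which matches the warning in the script that only tags are overwritten.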
--- a/library/train_util.py
+++ b/library/train_util.py
@@ -694,6 +694,10 @@ class BaseDataset(torch.utils.data.Dataset):
        else:
            # process wildcards
            if subset.enable_wildcard:
                # if caption is multiline, random choice one line
                if "\n" in caption:
                    caption = random.choice(caption.split("\n"))
                # wildcard is like '{aaa|bbb|ccc...}'
                # escape the curly braces like {{ or }}
                replacer1 = "⦅"
@@ -712,6 +716,9 @@ class BaseDataset(torch.utils.data.Dataset):
                # unescape the curly braces
                caption = caption.replace(replacer1, "{").replace(replacer2, "}")
            else:
                # if caption is multiline, use the first line
                caption = caption.split("\n")[0]
            if subset.shuffle_caption or subset.token_warmup_step > 0 or subset.caption_tag_dropout_rate > 0:
                fixed_tokens = []
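Review note: the hunks above pick one random line of a multi-line caption when `enable_wildcard` is set and otherwise keep only the first line. A rough standalone sketch of that behaviour, including a simplified `{aaa|bbb|ccc}` expansion, is below. `expand_caption`, the regular expression, and the second placeholder character are assumptions made for illustration; the elided lines of the real implementation may differ.

```python
import random
import re


def expand_caption(caption: str, enable_wildcard: bool, rng: random.Random) -> str:
    """Pick a caption line and expand {a|b|c} wildcards (illustrative sketch)."""
    if not enable_wildcard:
        # Without wildcards, only the first line of a multi-line caption is used.
        return caption.split("\n")[0]

    if "\n" in caption:
        # With wildcards, one line is chosen at random.
        caption = rng.choice(caption.split("\n"))

    # Protect escaped braces ({{ and }}) with placeholder characters.
    # The right-hand placeholder is an assumption; only the left one is shown in the diff.
    left, right = "\u2985", "\u2986"
    caption = caption.replace("{{", left).replace("}}", right)

    # Replace each {a|b|c} group with one randomly chosen alternative.
    caption = re.sub(
        r"\{([^{}]*)\}",
        lambda m: rng.choice(m.group(1).split("|")),
        caption,
    )

    # Restore the escaped braces as literal single braces.
    return caption.replace(left, "{").replace(right, "}")
```

Passing a seeded `random.Random` instance keeps the choice reproducible in tests; the training code itself uses the module-level `random.choice`.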
@@ -1447,7 +1454,7 @@ class DreamBoothDataset(BaseDataset):
            self.bucket_reso_steps = None  # this information is not used
            self.bucket_no_upscale = False
-        def read_caption(img_path, caption_extension):
+        def read_caption(img_path, caption_extension, enable_wildcard):
            # build candidate caption file names
            base_name = os.path.splitext(img_path)[0]
            base_name_face_det = base_name
@@ -1466,7 +1473,10 @@ class DreamBoothDataset(BaseDataset):
                            logger.error(f"illegal char in file (not UTF-8) / ファイルにUTF-8以外の文字があります: {cap_path}")
                            raise e
                        assert len(lines) > 0, f"caption file is empty / キャプションファイルが空です: {cap_path}"
-                        caption = lines[0].strip()
+                        if enable_wildcard:
                            caption = "\n".join([line.strip() for line in lines if line.strip() != ""])  # drop empty lines, join with newlines
                        else:
                            caption = lines[0].strip()
                    break
            return caption
@@ -1482,7 +1492,7 @@ class DreamBoothDataset(BaseDataset):
            captions = []
            missing_captions = []
            for img_path in img_paths:
-                cap_for_img = read_caption(img_path, subset.caption_extension)
+                cap_for_img = read_caption(img_path, subset.caption_extension, subset.enable_wildcard)
                if cap_for_img is None and subset.class_tokens is None:
                    logger.warning(
                        f"neither caption file nor class tokens are found. use empty caption for {img_path} / キャプションファイルもclass tokenも見つかりませんでした。空のキャプションを使用します: {img_path}"
@@ -1516,7 +1526,7 @@ class DreamBoothDataset(BaseDataset):
        logger.info("prepare images.")
        num_train_images = 0
        num_reg_images = 0
-        reg_infos: List[ImageInfo] = []
+        reg_infos: List[Tuple[ImageInfo, DreamBoothSubset]] = []
        for subset in subsets:
            if subset.num_repeats < 1:
                logger.warning(
@@ -1545,7 +1555,7 @@ class DreamBoothDataset(BaseDataset):
            for img_path, caption in zip(img_paths, captions):
                info = ImageInfo(img_path, subset.num_repeats, caption, subset.is_reg, img_path)
                if subset.is_reg:
-                    reg_infos.append(info)
+                    reg_infos.append((info, subset))
                else:
                    self.register_image(info, subset)
@@ -1566,7 +1576,7 @@ class DreamBoothDataset(BaseDataset):
            n = 0
            first_loop = True
            while n < num_train_images:
-                for info in reg_infos:
+                for info, subset in reg_infos:
                    if first_loop:
                        self.register_image(info, subset)
                        n += info.num_repeats
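Review note: storing `(info, subset)` pairs addresses a leaked loop variable. Before this change, `subset` inside the registration loop still referred to whichever subset was iterated last. The toy example below reproduces the pattern with plain dicts; `register_buggy`, `register_fixed`, and the data layout are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple


def register_buggy(subsets: List[Dict]) -> List[Tuple[str, str]]:
    """Collect images first, then reuse the leaked loop variable (the old pattern)."""
    infos = []
    for subset in subsets:
        infos.extend(subset["images"])
    # `subset` still points at the last element of `subsets` here.
    return [(info, subset["name"]) for info in infos]


def register_fixed(subsets: List[Dict]) -> List[Tuple[str, str]]:
    """Keep each image paired with the subset it came from (the new pattern)."""
    pairs = []
    for subset in subsets:
        pairs.extend((info, subset) for info in subset["images"])
    return [(info, owner["name"]) for info, owner in pairs]


subsets = [
    {"name": "reg_a", "images": ["a1", "a2"]},
    {"name": "reg_b", "images": ["b1"]},
]
```

With these two subsets, the buggy variant attributes every image to `reg_b`, while the fixed variant keeps `a1` and `a2` with `reg_a`.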
@@ -1658,10 +1668,24 @@ class FineTuningDataset(BaseDataset):
                caption = img_md.get("caption")
                tags = img_md.get("tags")
                if caption is None:
-                    caption = tags
-                elif tags is not None and len(tags) > 0:
-                    caption = caption + ", " + tags
-                    tags_list.append(tags)
+                    caption = tags  # could be multiline
+                    tags = None
+
+                if subset.enable_wildcard:
                    # tags must be single line
                    if tags is not None:
                        tags = tags.replace("\n", subset.caption_separator)
                    # add tags to each line of caption
                    if caption is not None and tags is not None:
                        caption = "\n".join(
                            [f"{line}{subset.caption_separator}{tags}" for line in caption.split("\n") if line.strip() != ""]
                        )
                else:
                    # use as is
                    if tags is not None and len(tags) > 0:
                        caption = caption + subset.caption_separator + tags
                        tags_list.append(tags)
                if caption is None:
                    caption = ""
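Review note: the branch above appends the tags to every line of a multi-line caption when wildcards are enabled, and otherwise concatenates caption and tags once. A standalone sketch of that logic is below. `combine_caption_and_tags` and its `separator` parameter are hypothetical stand-ins for the inline code and `subset.caption_separator`, and the `tags_list` bookkeeping is omitted.

```python
from typing import Optional


def combine_caption_and_tags(
    caption: Optional[str],
    tags: Optional[str],
    enable_wildcard: bool,
    separator: str = ", ",
) -> str:
    """Merge a caption and tags the way the hunk above does (illustrative sketch)."""
    if caption is None:
        # Tags stand in for a missing caption; they may be multi-line.
        caption, tags = tags, None

    if enable_wildcard:
        # Tags must be a single line before they are appended.
        if tags is not None:
            tags = tags.replace("\n", separator)
        # Append the tags to each non-empty caption line.
        if caption is not None and tags is not None:
            caption = "\n".join(
                f"{line}{separator}{tags}"
                for line in caption.split("\n")
                if line.strip() != ""
            )
    elif tags is not None and len(tags) > 0:
        # Without wildcards, caption and tags are joined once, as is.
        caption = caption + separator + tags

    return "" if caption is None else caption
```

For example, a two-line caption with two-line tags yields two lines that each end with both tags, so whichever line is later chosen at random still carries the full tag set.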
@@ -3315,6 +3339,18 @@ def verify_training_args(args: argparse.Namespace):
            + " / zero_terminal_snrが有効ですが、v_parameterizationが有効ではありません。学習結果は想定外になる可能性があります"
        )
    if args.sample_every_n_epochs is not None and args.sample_every_n_epochs <= 0:
        logger.warning(
            "sample_every_n_epochs is less than or equal to 0, so it will be disabled / sample_every_n_epochsに0以下の値が指定されたため無効になります"
        )
        args.sample_every_n_epochs = None
    if args.sample_every_n_steps is not None and args.sample_every_n_steps <= 0:
        logger.warning(
            "sample_every_n_steps is less than or equal to 0, so it will be disabled / sample_every_n_stepsに0以下の値が指定されたため無効になります"
        )
        args.sample_every_n_steps = None
 def add_dataset_arguments(
    parser: argparse.ArgumentParser, support_dreambooth: bool, support_caption: bool, support_caption_dropout: bool