update wd14 tagger and doc

2026-04-08 22:35:09 +00:00 · 2024-03-30 21:48:22 +09:00
parent 6ba84288d9
commit cae5aa0a56
4 changed files with 277 additions and 25 deletions
--- a/README.md
+++ b/README.md
@@ -156,6 +156,14 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
 - The support for v3 repositories is added to `tag_images_by_wd14_tagger.py` (`--onnx` option only). PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) Thanks to sdbds!
  - Onnx may need to be updated. Onnx is not installed by default, so please install or update it with `pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` etc. Please also check the comments in `requirements.txt`.
 - `tag_images_by_wd14_tagger.py` now saves the model in a subdirectory of `--model_dir` named after `--repo_id`, so models from multiple repositories can be cached. Please delete unnecessary files directly under `--model_dir`.
 - Some options are added to `tag_images_by_wd14_tagger.py`.
  - Some are added in PR [#1216](https://github.com/kohya-ss/sd-scripts/pull/1216) Thanks to Disty0!
  - Output rating tags `--use_rating_tags` and `--use_rating_tags_as_last_tag`
  - Output character tags first `--character_tags_first`
  - Expand character tags and series `--character_tag_expand`
  - Specify tags to output first `--always_first_tags`
  - Replace tags `--tag_replacement`
  - See [Tagging documentation](./docs/wd14_tagger_README-en.md) for details.
 - Fixed an error when specifying `--beam_search` and a value of 2 or more for `--num_beams` in `make_captions.py`.
 - The options `--noise_offset_random_strength` and `--ip_noise_gamma_random_strength` are added to each training script. These options can be used to vary the noise offset and ip noise gamma in the range of 0 to the specified value. PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) Thanks to KohakuBlueleaf!
 - The option `--save_state_on_train_end` is added to each training script. PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) Thanks to gesen2egee!
@@ -181,6 +189,14 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
 - `tag_images_by_wd14_tagger.py` で v3 のリポジトリがサポートされました（`--onnx` 指定時のみ有効）。 PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) sdbds 氏に感謝します。
  - Onnx のバージョンアップが必要になるかもしれません。デフォルトでは Onnx はインストールされていませんので、`pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` 等でインストール、アップデートしてください。`requirements.txt` のコメントもあわせてご確認ください。
 - `tag_images_by_wd14_tagger.py` で、モデルを`--repo_id` のサブディレクトリに保存するようにしました。これにより複数のモデルファイルがキャッシュされます。`--model_dir` 直下の不要なファイルは削除願います。
 - `tag_images_by_wd14_tagger.py` にいくつかのオプションを追加しました。
  - 一部は PR [#1216](https://github.com/kohya-ss/sd-scripts/pull/1216) で追加されました。Disty0 氏に感謝します。
  - レーティングタグを出力する `--use_rating_tags` および `--use_rating_tags_as_last_tag`
  - キャラクタタグを最初に出力する `--character_tags_first`
  - キャラクタタグとシリーズを展開する `--character_tag_expand`
  - 常に最初に出力するタグを指定する `--always_first_tags`
  - タグを置換する `--tag_replacement`
  - 詳細は [タグ付けに関するドキュメント](./docs/wd14_tagger_README-ja.md) をご覧ください。
 - `make_captions.py` で `--beam_search` を指定し `--num_beams` に2以上の値を指定した時のエラーを修正しました。
 - 各学習スクリプトに、noise offset、ip noise gammaを、それぞれ 0~指定した値の範囲で変動させるオプション `--noise_offset_random_strength` および `--ip_noise_gamma_random_strength` が追加されました。 PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) KohakuBlueleaf 氏に感謝します。
 - 各学習スクリプトに、学習終了時に state を保存する `--save_state_on_train_end` オプションが追加されました。 PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) gesen2egee 氏に感謝します。
--- a/docs/wd14_tagger_README-en.md
+++ b/docs/wd14_tagger_README-en.md
@@ -0,0 +1,85 @@
 # Image Tagging using WD14Tagger
 This document is based on the information from this GitHub page (https://github.com/toriato/stable-diffusion-webui-wd14-tagger#mrsmilingwolfs-model-aka-waifu-diffusion-14-tagger).
 Using ONNX for inference is recommended. Please install ONNX with the following command:
 ```powershell
 pip install onnx==1.15.0 onnxruntime-gpu==1.17.1  
 ```
 The model weights will be automatically downloaded from Hugging Face.
 # Usage
 Run the script to perform tagging.
 ```powershell
 python finetune/tag_images_by_wd14_tagger.py --onnx --repo_id <model repo id> --batch_size <batch size> <training data folder>
 ```
 For example, if using the repository `SmilingWolf/wd-swinv2-tagger-v3` with a batch size of 4, and the training data is located in the parent folder `train_data`, it would be:
 ```powershell
 python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 --batch_size 4 ..\train_data
 ```
 On the first run, the model files will be automatically downloaded to the `wd14_tagger_model` folder (the folder can be changed with an option). 
 Tag files will be created in the same directory as the training data images, with the same filename and a `.txt` extension.
 ![Generated tag files](https://user-images.githubusercontent.com/52813779/208910534-ea514373-1185-4b7d-9ae3-61eb50bc294e.png)
 ![Tags and image](https://user-images.githubusercontent.com/52813779/208910599-29070c15-7639-474f-b3e4-06bd5a3df29e.png)
 ## Example
 To output in the Animagine XL 3.1 format, it would be as follows (enter on a single line in practice):
 ```
 python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 
    --batch_size 4  --remove_underscore --undesired_tags "PUT,YOUR,UNDESIRED,TAGS" --recursive 
    --use_rating_tags_as_last_tag --character_tags_first --character_tag_expand 
    --always_first_tags "1girl,1boy"  ..\train_data
 ```
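To make the effect of the ordering options in the command above more concrete, here is a minimal sketch of how the final tag order could be assembled. The function `order_tags` and its parameters are hypothetical names chosen for illustration; the script performs these steps inline, so treat this as an approximation rather than the actual implementation.

```python
def order_tags(
    general,
    character,
    rating=None,
    *,
    character_tags_first=False,
    rating_as_last=False,
    always_first_tags=(),
):
    """Illustrative ordering of tags; not part of the tagger script."""
    combined = list(general)

    # Character tags go to the front or the back of the general tags.
    for tag in character:
        if character_tags_first:
            combined.insert(0, tag)
        else:
            combined.append(tag)

    # The rating tag is placed first unless it is requested as the last tag.
    if rating is not None:
        if rating_as_last:
            combined.append(rating)
        else:
            combined.insert(0, rating)

    # Tags listed in always_first_tags are moved to the front when present.
    for tag in always_first_tags:
        if tag in combined:
            combined.remove(tag)
            combined.insert(0, tag)

    return combined


ordered = order_tags(
    ["1girl", "smile"],
    ["chara name"],
    rating="general",
    character_tags_first=True,
    rating_as_last=True,
    always_first_tags=["1girl", "1boy"],
)
print(", ".join(ordered))
```

In this sketch, `1boy` is simply skipped because it was not detected, and the rating tag stays at the end.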
 ## Available Repository IDs
 [SmilingWolf's V2 and V3 models](https://huggingface.co/SmilingWolf) are available for use. Specify them in the format like `SmilingWolf/wd-vit-tagger-v3`. The default when omitted is `SmilingWolf/wd-v1-4-convnext-tagger-v2`.
 # Options 
 ## General Options
 - `--onnx`: Use ONNX for inference. If not specified, TensorFlow will be used. If using TensorFlow, please install TensorFlow separately. 
 - `--batch_size`: Number of images to process at once. Default is 1. Adjust according to VRAM capacity.
 - `--caption_extension`: File extension for caption files. Default is `.txt`.
 - `--max_data_loader_n_workers`: Maximum number of workers for DataLoader. Specifying a value of 1 or more will use DataLoader to speed up image loading. If unspecified, DataLoader will not be used.
 - `--thresh`: Confidence threshold for outputting tags. Default is 0.35. Lowering the value will assign more tags but accuracy will decrease. 
 - `--general_threshold`: Confidence threshold for general tags. If omitted, same as `--thresh`.
 - `--character_threshold`: Confidence threshold for character tags. If omitted, same as `--thresh`.
 - `--recursive`: If specified, subfolders within the specified folder will also be processed recursively.
 - `--append_tags`: Append tags to existing tag files.
 - `--frequency_tags`: Output tag frequencies.  
 - `--debug`: Debug mode. Outputs debug information if specified.
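As a rough illustration of how the three threshold options interact, the sketch below falls back to `--thresh` when a category-specific threshold is omitted and keeps tags whose confidence is at or above the threshold. The function `select_tags` and the sample scores are hypothetical and are not taken from the script.

```python
from typing import Optional


def select_tags(
    general: dict[str, float],
    character: dict[str, float],
    thresh: float = 0.35,
    general_threshold: Optional[float] = None,
    character_threshold: Optional[float] = None,
) -> list[str]:
    """Keep tags whose confidence is at or above the threshold for their category."""
    # An omitted category threshold falls back to the common threshold.
    if general_threshold is None:
        general_threshold = thresh
    if character_threshold is None:
        character_threshold = thresh

    kept = [tag for tag, p in general.items() if p >= general_threshold]
    kept += [tag for tag, p in character.items() if p >= character_threshold]
    return kept


# Made-up confidences, for illustration only.
general_scores = {"1girl": 0.98, "smile": 0.40, "hat": 0.20}
character_scores = {"chara_name_(series)": 0.60}

default_tags = select_tags(general_scores, character_scores)
strict_tags = select_tags(general_scores, character_scores, character_threshold=0.85)
print(default_tags)
print(strict_tags)
```

Raising only `character_threshold` drops the character tag while leaving the general tags unchanged.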
 ## Model Download
 - `--model_dir`: Folder to save model files. Default is `wd14_tagger_model`.  
 - `--force_download`: Re-download model files if specified.
 ## Tag Editing
 - `--remove_underscore`: Remove underscores from output tags.
 - `--undesired_tags`: Specify tags not to output. Multiple tags can be specified, separated by commas. For example, `black eyes,black hair`.
 - `--use_rating_tags`: Output rating tags at the beginning of the tags.
 - `--use_rating_tags_as_last_tag`: Add rating tags at the end of the tags.
 - `--character_tags_first`: Output character tags first.
 - `--character_tag_expand`: Expand character tag series names. For example, split the tag `chara_name_(series)` into `chara_name, series`.  
 - `--always_first_tags`: Specify tags that are always moved to the beginning of the output when they are detected in an image. Multiple tags can be specified, separated by commas. For example, `1girl,1boy`.
 - `--caption_separator`: Separate tags with this string in the output file. Default is `, `.
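 - `--tag_replacement`: Perform tag replacement. Specify `source,target` pairs separated by `;`, for example `tag1,tag2;tag3,tag4` (replaces `tag1` with `tag2` and `tag3` with `tag4`). To include a literal `,` or `;` in a tag, escape it with `\`.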
 When specifying `remove_underscore`, specify `undesired_tags`, `always_first_tags`, and `tag_replacement` without including underscores.
 When specifying `caption_separator`, separate `undesired_tags` and `always_first_tags` with `caption_separator`. Always separate `tag_replacement` with `,`.
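The `--tag_replacement` format can be easier to follow with a small parsing example. The following is a minimal sketch modeled on the placeholder-based escaping used in the script; the function name `parse_tag_replacement` is hypothetical, and it raises `ValueError` instead of using an assertion.

```python
def parse_tag_replacement(spec: str) -> list[tuple[str, str]]:
    """Parse `source1,target1;source2,target2` into (source, target) pairs."""
    # Swap escaped separators for placeholders so they survive the splits below.
    escaped = spec.replace("\\,", "@@@@").replace("\\;", "####")
    pairs = []
    for item in escaped.split(";"):
        parts = item.split(",")
        if len(parts) != 2:
            raise ValueError(f"expected `source,target`, got: {item!r}")
        # Restore the literal separators inside each tag.
        source, target = (p.replace("@@@@", ",").replace("####", ";") for p in parts)
        pairs.append((source, target))
    return pairs


pairs = parse_tag_replacement("tag1,tag2;tag3,tag4")
escaped_pairs = parse_tag_replacement(r"a\,b,c")
print(pairs)
print(escaped_pairs)
```

The second call shows the escape: `a\,b` is treated as the single source tag `a,b`, which is replaced with `c`.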
--- a/docs/wd14_tagger_README-ja.md
+++ b/docs/wd14_tagger_README-ja.md
@@ -0,0 +1,85 @@
 # WD14Taggerによるタグ付け
 こちらのgithubページ（https://github.com/toriato/stable-diffusion-webui-wd14-tagger#mrsmilingwolfs-model-aka-waifu-diffusion-14-tagger ）の情報を参考にさせていただきました。
 onnx を用いた推論を推奨します。以下のコマンドで onnx をインストールしてください。
 ```powershell
 pip install onnx==1.15.0 onnxruntime-gpu==1.17.1
 ```
 モデルの重みはHugging Faceから自動的にダウンロードしてきます。
 # 使い方
 スクリプトを実行してタグ付けを行います。
 ```
 python finetune/tag_images_by_wd14_tagger.py --onnx --repo_id <モデルのrepo id> --batch_size <バッチサイズ> <教師データフォルダ>
 ```
 レポジトリに `SmilingWolf/wd-swinv2-tagger-v3` を使用し、バッチサイズを4にして、教師データを親フォルダの `train_data`に置いた場合、以下のようになります。
 ```
 python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 --batch_size 4 ..\train_data
 ```
 初回起動時にはモデルファイルが `wd14_tagger_model` フォルダに自動的にダウンロードされます（フォルダはオプションで変えられます）。
 タグファイルが教師データ画像と同じディレクトリに、同じファイル名、拡張子.txtで作成されます。
 ![生成されたタグファイル](https://user-images.githubusercontent.com/52813779/208910534-ea514373-1185-4b7d-9ae3-61eb50bc294e.png)
 ![タグと画像](https://user-images.githubusercontent.com/52813779/208910599-29070c15-7639-474f-b3e4-06bd5a3df29e.png)
 ## 記述例
 Animagine XL 3.1 方式で出力する場合、以下のようになります（実際には 1 行で入力してください）。
 ```
 python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 
    --batch_size 4  --remove_underscore --undesired_tags "PUT,YOUR,UNDESIRED,TAGS" --recursive 
    --use_rating_tags_as_last_tag --character_tags_first --character_tag_expand 
    --always_first_tags "1girl,1boy"  ..\train_data
 ```
 ## 使用可能なリポジトリID
 [SmilingWolf 氏の V2、V3 のモデル](https://huggingface.co/SmilingWolf)が使用可能です。`SmilingWolf/wd-vit-tagger-v3` のように指定してください。省略時のデフォルトは `SmilingWolf/wd-v1-4-convnext-tagger-v2` です。
 # オプション
 ## 一般オプション
 - `--onnx` : ONNX を使用して推論します。指定しない場合は TensorFlow を使用します。TensorFlow 使用時は別途 TensorFlow をインストールしてください。
 - `--batch_size` : 一度に処理する画像の数。デフォルトは1です。VRAMの容量に応じて増減してください。
 - `--caption_extension` : キャプションファイルの拡張子。デフォルトは `.txt` です。
 - `--max_data_loader_n_workers` : DataLoader の最大ワーカー数です。このオプションに 1 以上の数値を指定すると、DataLoader を用いて画像読み込みを高速化します。未指定時は DataLoader を用いません。
 - `--thresh` : 出力するタグの信頼度の閾値。デフォルトは0.35です。値を下げるとより多くのタグが付与されますが、精度は下がります。
 - `--general_threshold` : 一般タグの信頼度の閾値。省略時は `--thresh` と同じです。
 - `--character_threshold` : キャラクタータグの信頼度の閾値。省略時は `--thresh` と同じです。
 - `--recursive` : 指定すると、指定したフォルダ内のサブフォルダも再帰的に処理します。
 - `--append_tags` : 既存のタグファイルにタグを追加します。
 - `--frequency_tags` : タグの頻度を出力します。
 - `--debug` : デバッグモード。指定するとデバッグ情報を出力します。
 ## モデルのダウンロード
 - `--model_dir` : モデルファイルの保存先フォルダ。デフォルトは `wd14_tagger_model` です。
 - `--force_download` : 指定するとモデルファイルを再ダウンロードします。
 ## タグ編集関連
 - `--remove_underscore` : 出力するタグからアンダースコアを削除します。
 - `--undesired_tags` : 出力しないタグを指定します。カンマ区切りで複数指定できます。たとえば `black eyes,black hair` のように指定します。
 - `--use_rating_tags` : タグの最初にレーティングタグを出力します。
 - `--use_rating_tags_as_last_tag` : タグの最後にレーティングタグを追加します。
 - `--character_tags_first` : キャラクタータグを最初に出力します。
 - `--character_tag_expand` : キャラクタータグのシリーズ名を展開します。たとえば `chara_name_(series)` のタグを `chara_name, series` に分割します。
 - `--always_first_tags` : あるタグが画像に出力されたとき、そのタグを最初に出力するタグを指定します。カンマ区切りで複数指定できます。たとえば `1girl,1boy` のように指定します。
 - `--caption_separator` : 出力するファイルでタグをこの文字列で区切ります。デフォルトは `, ` です。
 - `--tag_replacement` : タグの置換を行います。`tag1,tag2;tag3,tag4` のように指定します。
 `remove_underscore` 指定時は、`undesired_tags`、`always_first_tags`、`tag_replacement` はアンダースコアを含めずに指定してください。
 `caption_separator` 指定時は、`undesired_tags`、`always_first_tags` は `caption_separator`  で区切ってください。`tag_replacement` は必ず `,` で区切ってください。
--- a/finetune/tag_images_by_wd14_tagger.py
+++ b/finetune/tag_images_by_wd14_tagger.py
@@ -62,12 +62,12 @@ class ImageLoadingPrepDataset(torch.utils.data.Dataset):
        try:
            image = Image.open(img_path).convert("RGB")
            image = preprocess_image(image)
-            tensor = torch.tensor(image)
+            # tensor = torch.tensor(image)  # no need to convert to a Tensor here
        except Exception as e:
            logger.error(f"Could not load image path / 画像を読み込めません: {img_path}, error: {e}")
            return None
-        return (tensor, img_path)
+        return (image, img_path)
 def collate_fn_remove_corrupted(batch):
@@ -110,7 +110,7 @@ def main(args):
    else:
        logger.info("using existing wd14 tagger model")
-    # 画像を読み込む
+    # load the model
    if args.onnx:
        import torch
        import onnx
@@ -178,8 +178,43 @@ def main(args):
    general_tags = [row[1] for row in rows[0:] if row[2] == "0"]
    character_tags = [row[1] for row in rows[0:] if row[2] == "4"]
-    # 画像を読み込む
+    # preprocess tags in advance
    if args.character_tag_expand:
        for i, tag in enumerate(character_tags):
            if tag.endswith(")"):
                # chara_name_(series) -> chara_name, series
                # chara_name_(costume)_(series) -> chara_name_(costume), series
                tags = tag.split("(")
                character_tag = "(".join(tags[:-1])
                if character_tag.endswith("_"):
                    character_tag = character_tag[:-1]
                series_tag = tags[-1].replace(")", "")
                character_tags[i] = character_tag + args.caption_separator + series_tag
    if args.remove_underscore:
        rating_tags = [tag.replace("_", " ") if len(tag) > 3 else tag for tag in rating_tags]
        general_tags = [tag.replace("_", " ") if len(tag) > 3 else tag for tag in general_tags]
        character_tags = [tag.replace("_", " ") if len(tag) > 3 else tag for tag in character_tags]
    if args.tag_replacement is not None:
        # escape , and ; in tag_replacement: wd14 tag names may contain , and ;
        escaped_tag_replacements = args.tag_replacement.replace("\\,", "@@@@").replace("\\;", "####")
        tag_replacements = escaped_tag_replacements.split(";")
        for tag_replacement in tag_replacements:
            tags = tag_replacement.split(",")  # source, target
            assert len(tags) == 2, f"tag replacement must be in the format of `source,target` / タグの置換は `置換元,置換先` の形式で指定してください: {args.tag_replacement}"
            source, target = [tag.replace("@@@@", ",").replace("####", ";") for tag in tags]
            logger.info(f"replacing tag: {source} -> {target}")
            if source in general_tags:
                general_tags[general_tags.index(source)] = target
            elif source in character_tags:
                character_tags[character_tags.index(source)] = target
            elif source in rating_tags:
                rating_tags[rating_tags.index(source)] = target
    # load images
    train_data_dir_path = Path(args.train_data_dir)
    image_paths = train_util.glob_images_pathlib(train_data_dir_path, args.recursive)
    logger.info(f"found {len(image_paths)} images.")
@@ -188,7 +223,12 @@ def main(args):
    caption_separator = args.caption_separator
    stripped_caption_separator = caption_separator.strip()
-    undesired_tags = set(args.undesired_tags.split(stripped_caption_separator))
+    undesired_tags = args.undesired_tags.split(stripped_caption_separator)
    undesired_tags = set([tag.strip() for tag in undesired_tags if tag.strip() != ""])
    always_first_tags = None
    if args.always_first_tags is not None:
        always_first_tags = [tag for tag in args.always_first_tags.split(stripped_caption_separator) if tag.strip() != ""]
    def run_batch(path_imgs):
        imgs = np.array([im for _, im in path_imgs])
@@ -208,13 +248,11 @@ def main(args):
            character_tag_text = ""
            general_tag_text = ""
-            # それ以降はタグなのでconfidenceがthresholdより高いものを追加する
+            # 最初の4つ以降はタグなのでconfidenceがthreshold以上のものを追加する
-            # Everything else is tags: pick any where prediction confidence > threshold
+            # First 4 labels are ratings, the rest are tags: pick any where prediction confidence >= threshold
            for i, p in enumerate(prob[4:]):
                if i < len(general_tags) and p >= args.general_threshold:
                    tag_name = general_tags[i]
-                    if args.remove_underscore and len(tag_name) > 3:  # ignore emoji tags like >_< and ^_^
-                        tag_name = tag_name.replace("_", " ")
                    if tag_name not in undesired_tags:
                        tag_freq[tag_name] = tag_freq.get(tag_name, 0) + 1
@@ -222,30 +260,37 @@ def main(args):
                        combined_tags.append(tag_name)
                elif i >= len(general_tags) and p >= args.character_threshold:
                    tag_name = character_tags[i - len(general_tags)]
-                    if args.remove_underscore and len(tag_name) > 3:
-                        tag_name = tag_name.replace("_", " ")
                    if tag_name not in undesired_tags:
                        tag_freq[tag_name] = tag_freq.get(tag_name, 0) + 1
                        character_tag_text += caption_separator + tag_name
                        if args.character_tags_first: # insert to the beginning
-                            combined_tags.insert(0,tag_name)
+                            combined_tags.insert(0, tag_name)
                        else:
                            combined_tags.append(tag_name)
-            #最初の4つはratingなので無視する
+            # 最初の4つはratingなのでargmaxで選ぶ
            # First 4 labels are actually ratings: pick one with argmax
-            if args.use_rating_tags:
+            if args.use_rating_tags or args.use_rating_tags_as_last_tag:
-                ratings_names = prob[:4]
+                ratings_probs = prob[:4]
-                rating_index = ratings_names.argmax()
+                rating_index = ratings_probs.argmax()
                found_rating = rating_tags[rating_index]
-                if args.remove_underscore and len(found_rating) > 3:
-                    found_rating = found_rating.replace("_", " ")
                if found_rating not in undesired_tags:
                    tag_freq[found_rating] = tag_freq.get(found_rating, 0) + 1
                    rating_tag_text = found_rating
-                    combined_tags.insert(0,found_rating) # insert to the beginning
+                    if args.use_rating_tags:
                        combined_tags.insert(0, found_rating) # insert to the beginning
                    else:
                        combined_tags.append(found_rating)
            # 一番最初に置くタグを指定する
            # Always put some tags at the beginning
            if always_first_tags is not None:
                for tag in always_first_tags:
                    if tag in combined_tags:
                        combined_tags.remove(tag)
                        combined_tags.insert(0, tag)
            # remove the leading comma
            if len(general_tag_text) > 0:
@@ -303,9 +348,7 @@ def main(args):
                continue
            image, image_path = data
-            if image is not None:
+            if image is None:
-                image = image.detach().numpy()
-            else:
                try:
                    image = Image.open(image_path)
                    if image.mode != "RGB":
@@ -407,7 +450,7 @@ def setup_parser() -> argparse.ArgumentParser:
        help="comma-separated list of undesired tags to remove from the output / 出力から除外したいタグのカンマ区切りのリスト",
    )
    parser.add_argument(
-        "--frequency_tags", action="store_true", help="Show frequency of tags for images / 画像ごとのタグの出現頻度を表示する"
+        "--frequency_tags", action="store_true", help="Show frequency of tags for images / タグの出現頻度を表示する"
    )
    parser.add_argument(
        "--onnx", action="store_true", help="use onnx model for inference / onnxモデルを推論に使用する"
@@ -416,10 +459,20 @@ def setup_parser() -> argparse.ArgumentParser:
        "--append_tags", action="store_true", help="Append captions instead of overwriting / 上書きではなくキャプションを追記する"
    )
    parser.add_argument(
-        "--use_rating_tags", action="store_true", help="Adds rating tags as the first tag",
+        "--use_rating_tags", action="store_true", help="Adds rating tags as the first tag / レーティングタグを最初のタグとして追加する",
    )
    parser.add_argument(
-        "--character_tags_first", action="store_true", help="Always inserts character tags before the general tags",
+        "--use_rating_tags_as_last_tag", action="store_true", help="Adds rating tags as the last tag / レーティングタグを最後のタグとして追加する",
    )
    parser.add_argument(
        "--character_tags_first", action="store_true", help="Always inserts character tags before the general tags / characterタグを常にgeneralタグの前に出力する",
    )
    parser.add_argument(
        "--always_first_tags",
        type=str,
        default=None,
        help="comma-separated list of tags to always put at the beginning, e.g. `1girl,1boy`"
        + " / 必ず先頭に置くタグのカンマ区切りリスト、例 : `1girl,1boy`",
    )
    parser.add_argument(
        "--caption_separator",
@@ -427,6 +480,19 @@ def setup_parser() -> argparse.ArgumentParser:
        default=", ",
        help="Separator for captions, include space if needed / キャプションの区切り文字、必要ならスペースを含めてください",
    )
    parser.add_argument(
        "--tag_replacement",
        type=str,
        default=None,
        help="tag replacement in the format of `source1,target1;source2,target2; ...`. Escape `,` and `;` with `\`. e.g. `tag1,tag2;tag3,tag4`"
        + " / タグの置換を `置換元1,置換先1;置換元2,置換先2; ...`で指定する。`\` で `,` と `;` をエスケープできる。例: `tag1,tag2;tag3,tag4`",
    )
    parser.add_argument(
        "--character_tag_expand",
        action="store_true",
        help="expand tag tail parenthesis to another tag for character tags. `chara_name_(series)` becomes `chara_name, series`"
        + " / キャラクタタグの末尾の括弧を別のタグに展開する。`chara_name_(series)` は `chara_name, series` になる",
    )
    return parser
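
For readers who want to check the `--character_tag_expand` handling in isolation, the splitting logic in this change can be approximated by the sketch below. The helper name `expand_character_tag` is hypothetical; the script itself rewrites the `character_tags` list in place rather than calling a function.

```python
def expand_character_tag(tag: str, separator: str = ", ") -> str:
    """Split a trailing `(series)` off a character tag into its own tag."""
    if not tag.endswith(")"):
        return tag
    parts = tag.split("(")
    # Everything before the last "(" is the character name (may contain a costume).
    character = "(".join(parts[:-1])
    if character.endswith("_"):
        character = character[:-1]
    series = parts[-1].replace(")", "")
    return character + separator + series


simple = expand_character_tag("chara_name_(series)")
nested = expand_character_tag("chara_name_(costume)_(series)")
plain = expand_character_tag("chara_name")
print(simple)
print(nested)
print(plain)
```

Only the last parenthesized part is split off, so a costume suffix stays attached to the character name.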