Merge pull request #177 from kohya-ss/dev

Dev
2026-04-17 09:18:00 +00:00 · 2023-02-11 15:11:07 +09:00
parent 5777be5208 5367daa210
commit 43a41c6c43
3 changed files with 100 additions and 96 deletions
--- a/README.md
+++ b/README.md
@@ -1,86 +1,7 @@
 This repository contains training, generation and utility scripts for Stable Diffusion.

-## Updates
-
-__Stable Diffusion web UI now seems to support LoRA trained by ``sd-scripts``.__ Thank you for great work!!! 
-
-Note: The LoRA models for SD 2.x is not supported too in Web UI.
-
- 11 Feb. 2023, 2023/2/11:
-  - ``lora_interrogator.py`` is added in ``networks`` folder. See ``python networks\lora_interrogator.py -h`` for usage.
-    - For LoRAs where the activation word is unknown, this script compares the output of Text Encoder after applying LoRA to that of unapplied to find out which token is affected by LoRA. Hopefully you can figure out the activation word. LoRA trained with captions does not seem to be able to interrogate.
-    - Batch size can be large (like 64 or 128).
-  - ``lora_interrogator.py`` を ``network``フォルダに追加しました。使用法は ``python networks\lora_interrogator.py -h`` でご確認ください。
-    - このスクリプトは、起動promptがわからないLoRAについて、LoRA適用前後のText Encoderの出力を比較することで、どのtokenの出力が変化しているかを調べます。運が良ければ起動用の単語が分かります。キャプション付きで学習されたLoRAは影響が広範囲に及ぶため、調査は難しいようです。
-    - バッチサイズはわりと大きくできます（64や128など）。
- 10 Feb. 2023, 2023/2/10:
-  - Updated ``requirements.txt`` to prevent upgrading with pip taking a long time or failure to upgrade.
-  - ``resize_lora.py`` keeps the metadata of the model. ``dimension is resized from ...`` is added to the top of  ``ss_training_comment``.
-  - ``merge_lora.py`` supports models with different ``alpha``s. If there is a problem, old version is ``merge_lora_old.py``.
-  - ``svd_merge_lora.py`` is added. This script merges LoRA models with any rank (dim) and alpha, and approximate a new LoRA with svd for a specified rank (dim).
-  - Note: merging scripts erase the metadata currently.
-  - ``resize_images_to_resolution.py`` supports multibyte characters in filenames.
-  - pipでの更新が長時間掛かったり、更新に失敗したりするのを防ぐため、``requirements.txt``を更新しました。
-  - ``resize_lora.py``がメタデータを保持するようになりました。 ``dimension is resized from ...`` という文字列が ``ss_training_comment`` の先頭に追加されます。
-  - ``merge_lora.py``がalphaが異なるモデルをサポートしました。 何か問題がありましたら旧バージョン ``merge_lora_old.py`` をお使いください。
-  - ``svd_merge_lora.py`` を追加しました。 複数の任意のdim (rank)、alphaのLoRAモデルをマージし、svdで任意dim(rank)のLoRAで近似します。 
-  - 注：マージ系のスクリプトは現時点ではメタデータを消去しますのでご注意ください。
-  - ``resize_images_to_resolution.py``が日本語ファイル名をサポートしました。
-  
- 9 Feb. 2023, 2023/2/9:
-  - Caption dropout is supported in ``train_db.py``, ``fine_tune.py`` and ``train_network.py``. Thanks to forestsource!
-    - ``--caption_dropout_rate`` option specifies the dropout rate for captions (0~1.0, 0.1 means 10% chance for dropout). If dropout occurs, the image is trained with the empty caption. Default is 0 (no dropout).
-    - ``--caption_dropout_every_n_epochs`` option specifies how many epochs to drop captions. If ``3`` is specified, in epoch 3, 6, 9 ..., images are trained with all captions empty. Default is None (no dropout).
-    - ``--caption_tag_dropout_rate`` option specified the dropout rate for tags (comma separated tokens) (0~1.0, 0.1 means 10% chance for dropout). If dropout occurs, the tag is removed from the caption. If ``--keep_tokens`` option is set, these tokens (tags) are not dropped. Default is 0 (no droupout).
-    - The bulk image downsampling script is added. Documentation is [here](https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E7%94%BB%E5%83%8F%E3%83%AA%E3%82%B5%E3%82%A4%E3%82%BA%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88) (in Jpanaese). Thanks to bmaltais!
-    - Typo check is added. Thanks to shirayu!
-  - キャプションのドロップアウトを``train_db.py``、``fine_tune.py``、``train_network.py``の各スクリプトに追加しました。forestsource氏に感謝します。
-    - ``--caption_dropout_rate``オプションでキャプションのドロップアウト率を指定します（0~1.0、 0.1を指定すると10%の確率でドロップアウト）。ドロップアウトされた場合、画像は空のキャプションで学習されます。デフォルトは 0 （ドロップアウトなし）です。
-    - ``--caption_dropout_every_n_epochs`` オプションで何エポックごとにキャプションを完全にドロップアウトするか指定します。たとえば``3``を指定すると、エポック3、6、9……で、すべての画像がキャプションなしで学習されます。デフォルトは None （ドロップアウトなし）です。
-    - ``--caption_tag_dropout_rate`` オプションで各タグ（カンマ区切りの各部分）のドロップアウト率を指定します（0~1.0、 0.1を指定すると10%の確率でドロップアウト）。ドロップアウトが起きるとそのタグはそのときだけキャプションから取り除かれて学習されます。``--keep_tokens`` オプションを指定していると、シャッフルされない部分のタグはドロップアウトされません。デフォルトは 0 （ドロップアウトなし）です。
-    - 画像の一括縮小スクリプトを追加しました。ドキュメントは [こちら](https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E7%94%BB%E5%83%8F%E3%83%AA%E3%82%B5%E3%82%A4%E3%82%BA%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88) です。bmaltais氏に感謝します。
-    - 誤字チェッカが追加されました。shirayu氏に感謝します。
-    
- 6 Feb. 2023, 2023/2/6：
-  - ``--bucket_reso_steps`` and ``--bucket_no_upscale`` options are added to training scripts (fine tuning, DreamBooth, LoRA and Textual Inversion) and ``prepare_buckets_latents.py``.
-  - ``--bucket_reso_steps`` takes the steps for buckets in aspect ratio bucketing. Default is 64, same as before.
-    - Any value greater than or equal to 1 can be specified; 64 is highly recommended and a value divisible by 8 is recommended.
-    - If less than 64 is specified, padding will occur within U-Net. The result is unknown.
-    - If you specify a value that is not divisible by 8, it will be truncated to divisible by 8 inside VAE, because the size of the latent is 1/8 of the image size.
-  - If ``--bucket_no_upscale`` option is specified, images smaller than the bucket size will be processed without upscaling.
-    - Internally, a bucket smaller than the image size is created (for example, if the image is 300x300 and ``bucket_reso_steps=64``, the bucket is 256x256). The image will be trimmed.
-    - Implementation of [#130](https://github.com/kohya-ss/sd-scripts/issues/130).
-    - Images with an area larger than the maximum size specified by ``--resolution`` are downsampled to the max bucket size.
-  - Now the number of data in each batch is limited to the number of actual images (not duplicated). Because a certain bucket may contain smaller number of actual images, so the batch may contain same (duplicated) images.
-  - ``--random_crop`` now also works with buckets enabled.
-    - Instead of always cropping the center of the image, the image is shifted left, right, up, and down to be used as the training data. This is expected to train to the edges of the image.
-    - Implementation of discussion [#34](https://github.com/kohya-ss/sd-scripts/discussions/34).
-
-  - ``--bucket_reso_steps``および``--bucket_no_upscale``オプションを、学習スクリプトおよび``prepare_buckets_latents.py``に追加しました。
-  - ``--bucket_reso_steps``オプションでは、bucketの解像度の単位を指定できます。デフォルトは64で、今までと同じ動作です。
-    - 1以上の任意の値を指定できます。基本的には64を推奨します。64以外の値では、8で割り切れる値を推奨します。
-    - 64未満を指定するとU-Netの内部でpaddingが発生します。どのような結果になるかは未知数です。
-    - 8で割り切れない値を指定すると余りはVAE内部で切り捨てられます。
-  - ``--bucket_no_upscale``オプションを指定すると、bucketサイズよりも小さい画像は拡大せずそのまま処理します。
-    - 内部的には画像サイズ以下のサイズのbucketを作成します（たとえば画像が300x300で``bucket_reso_steps=64``の場合、256x256のbucket）。余りは都度trimmingされます。
-    - [#130](https://github.com/kohya-ss/sd-scripts/issues/130) を実装したものです。
-    - ``--resolution``で指定した最大サイズよりも面積が大きい画像は、最大サイズと同じ面積になるようアスペクト比を維持したまま縮小され、そのサイズを元にbucketが作られます。
-  - これらのオプションによりbucketが細分化され、ひとつのバッチ内に同一画像が重複して存在することが増えたため、バッチサイズを``そのbucketの画像種類数``までに制限する機能を追加しました。
-    - たとえば繰り返し回数10で、あるbucketに1枚しか画像がなく、バッチサイズが10以上のとき、今まではepoch内で、同一画像を10枚含むバッチが1回だけ使用されていました。
-    - 機能追加後はepoch内にサイズ1のバッチが10回、使用されます。
-  - ``--random_crop``がbucketを有効にした場合にも機能するようになりました。
-    - 常に画像の中央を切り取るのではなく、左右、上下にずらして教師データにします。これにより画像端まで学習されることが期待されます。
-    - discussionの[#34](https://github.com/kohya-ss/sd-scripts/discussions/34)を実装したものです。
-    
-
-Stable Diffusion web UI本体で当リポジトリで学習したLoRAモデルによる画像生成がサポートされたようです。
-
-注：SD2.x用のLoRAモデルはサポートされないようです。
-
-Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
-最近の更新情報は [Release](https://github.com/kohya-ss/sd-scripts/releases) をご覧ください。
-
-##
+[__Change History__](#change-history) is moved to the bottom of the page.
+更新履歴は[ページ末尾](#change-history)に移しました。

 [日本語版README](./README-ja.md)

@@ -89,10 +10,13 @@ For easier use (GUI and PowerShell scripts etc...), please visit [the repository
 This repository contains the scripts for:

 * DreamBooth training, including U-Net and Text Encoder
-* fine-tuning (native training), including U-Net and Text Encoder
+* Fine-tuning (native training), including U-Net and Text Encoder
 * LoRA training
-* image generation
-* model conversion (supports 1.x and 2.x, Stable Diffision ckpt/safetensors and Diffusers)
+* Texutl Inversion training
+* Image generation
+* Model conversion (supports 1.x and 2.x, Stable Diffision ckpt/safetensors and Diffusers)
+
+__Stable Diffusion web UI now seems to support LoRA trained by ``sd-scripts``.__ (SD 1.x based only) Thank you for great work!!! 

 ## About requirements.txt

@@ -197,3 +121,77 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
 [bitsandbytes](https://github.com/TimDettmers/bitsandbytes): MIT

 [BLIP](https://github.com/salesforce/BLIP): BSD-3-Clause
+
+## Change History
+
+- 11 Feb. 2023, 2023/2/11:
+  - ``lora_interrogator.py`` is added in ``networks`` folder. See ``python networks\lora_interrogator.py -h`` for usage.
+    - For LoRAs where the activation word is unknown, this script compares the output of Text Encoder after applying LoRA to that of unapplied to find out which token is affected by LoRA. Hopefully you can figure out the activation word. LoRA trained with captions does not seem to be able to interrogate.
+    - Batch size can be large (like 64 or 128).
+  - ``train_textual_inversion.py`` now supports multiple init words.
+  - ``lora_interrogator.py`` を ``network``フォルダに追加しました。使用法は ``python networks\lora_interrogator.py -h`` でご確認ください。
+    - このスクリプトは、起動promptがわからないLoRAについて、LoRA適用前後のText Encoderの出力を比較することで、どのtokenの出力が変化しているかを調べます。運が良ければ起動用の単語が分かります。キャプション付きで学習されたLoRAは影響が広範囲に及ぶため、調査は難しいようです。
+    - バッチサイズはわりと大きくできます（64や128など）。
+  - ``train_textual_inversion.py`` で複数のinit_word指定が可能になりました。
+- 10 Feb. 2023, 2023/2/10:
+  - Updated ``requirements.txt`` to prevent upgrading with pip taking a long time or failure to upgrade.
+  - ``resize_lora.py`` keeps the metadata of the model. ``dimension is resized from ...`` is added to the top of  ``ss_training_comment``.
+  - ``merge_lora.py`` supports models with different ``alpha``s. If there is a problem, old version is ``merge_lora_old.py``.
+  - ``svd_merge_lora.py`` is added. This script merges LoRA models with any rank (dim) and alpha, and approximate a new LoRA with svd for a specified rank (dim).
+  - Note: merging scripts erase the metadata currently.
+  - ``resize_images_to_resolution.py`` supports multibyte characters in filenames.
+  - pipでの更新が長時間掛かったり、更新に失敗したりするのを防ぐため、``requirements.txt``を更新しました。
+  - ``resize_lora.py``がメタデータを保持するようになりました。 ``dimension is resized from ...`` という文字列が ``ss_training_comment`` の先頭に追加されます。
+  - ``merge_lora.py``がalphaが異なるモデルをサポートしました。 何か問題がありましたら旧バージョン ``merge_lora_old.py`` をお使いください。
+  - ``svd_merge_lora.py`` を追加しました。 複数の任意のdim (rank)、alphaのLoRAモデルをマージし、svdで任意dim(rank)のLoRAで近似します。 
+  - 注：マージ系のスクリプトは現時点ではメタデータを消去しますのでご注意ください。
+  - ``resize_images_to_resolution.py``が日本語ファイル名をサポートしました。
+  
+- 9 Feb. 2023, 2023/2/9:
+  - Caption dropout is supported in ``train_db.py``, ``fine_tune.py`` and ``train_network.py``. Thanks to forestsource!
+    - ``--caption_dropout_rate`` option specifies the dropout rate for captions (0~1.0, 0.1 means 10% chance for dropout). If dropout occurs, the image is trained with the empty caption. Default is 0 (no dropout).
+    - ``--caption_dropout_every_n_epochs`` option specifies how many epochs to drop captions. If ``3`` is specified, in epoch 3, 6, 9 ..., images are trained with all captions empty. Default is None (no dropout).
+    - ``--caption_tag_dropout_rate`` option specified the dropout rate for tags (comma separated tokens) (0~1.0, 0.1 means 10% chance for dropout). If dropout occurs, the tag is removed from the caption. If ``--keep_tokens`` option is set, these tokens (tags) are not dropped. Default is 0 (no droupout).
+    - The bulk image downsampling script is added. Documentation is [here](https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E7%94%BB%E5%83%8F%E3%83%AA%E3%82%B5%E3%82%A4%E3%82%BA%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88) (in Jpanaese). Thanks to bmaltais!
+    - Typo check is added. Thanks to shirayu!
+  - キャプションのドロップアウトを``train_db.py``、``fine_tune.py``、``train_network.py``の各スクリプトに追加しました。forestsource氏に感謝します。
+    - ``--caption_dropout_rate``オプションでキャプションのドロップアウト率を指定します（0~1.0、 0.1を指定すると10%の確率でドロップアウト）。ドロップアウトされた場合、画像は空のキャプションで学習されます。デフォルトは 0 （ドロップアウトなし）です。
+    - ``--caption_dropout_every_n_epochs`` オプションで何エポックごとにキャプションを完全にドロップアウトするか指定します。たとえば``3``を指定すると、エポック3、6、9……で、すべての画像がキャプションなしで学習されます。デフォルトは None （ドロップアウトなし）です。
+    - ``--caption_tag_dropout_rate`` オプションで各タグ（カンマ区切りの各部分）のドロップアウト率を指定します（0~1.0、 0.1を指定すると10%の確率でドロップアウト）。ドロップアウトが起きるとそのタグはそのときだけキャプションから取り除かれて学習されます。``--keep_tokens`` オプションを指定していると、シャッフルされない部分のタグはドロップアウトされません。デフォルトは 0 （ドロップアウトなし）です。
+    - 画像の一括縮小スクリプトを追加しました。ドキュメントは [こちら](https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E7%94%BB%E5%83%8F%E3%83%AA%E3%82%B5%E3%82%A4%E3%82%BA%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88) です。bmaltais氏に感謝します。
+    - 誤字チェッカが追加されました。shirayu氏に感謝します。
+    
+- 6 Feb. 2023, 2023/2/6：
+  - ``--bucket_reso_steps`` and ``--bucket_no_upscale`` options are added to training scripts (fine tuning, DreamBooth, LoRA and Textual Inversion) and ``prepare_buckets_latents.py``.
+  - ``--bucket_reso_steps`` takes the steps for buckets in aspect ratio bucketing. Default is 64, same as before.
+    - Any value greater than or equal to 1 can be specified; 64 is highly recommended and a value divisible by 8 is recommended.
+    - If less than 64 is specified, padding will occur within U-Net. The result is unknown.
+    - If you specify a value that is not divisible by 8, it will be truncated to divisible by 8 inside VAE, because the size of the latent is 1/8 of the image size.
+  - If ``--bucket_no_upscale`` option is specified, images smaller than the bucket size will be processed without upscaling.
+    - Internally, a bucket smaller than the image size is created (for example, if the image is 300x300 and ``bucket_reso_steps=64``, the bucket is 256x256). The image will be trimmed.
+    - Implementation of [#130](https://github.com/kohya-ss/sd-scripts/issues/130).
+    - Images with an area larger than the maximum size specified by ``--resolution`` are downsampled to the max bucket size.
+  - Now the number of data in each batch is limited to the number of actual images (not duplicated). Because a certain bucket may contain smaller number of actual images, so the batch may contain same (duplicated) images.
+  - ``--random_crop`` now also works with buckets enabled.
+    - Instead of always cropping the center of the image, the image is shifted left, right, up, and down to be used as the training data. This is expected to train to the edges of the image.
+    - Implementation of discussion [#34](https://github.com/kohya-ss/sd-scripts/discussions/34).
+
+  - ``--bucket_reso_steps``および``--bucket_no_upscale``オプションを、学習スクリプトおよび``prepare_buckets_latents.py``に追加しました。
+  - ``--bucket_reso_steps``オプションでは、bucketの解像度の単位を指定できます。デフォルトは64で、今までと同じ動作です。
+    - 1以上の任意の値を指定できます。基本的には64を推奨します。64以外の値では、8で割り切れる値を推奨します。
+    - 64未満を指定するとU-Netの内部でpaddingが発生します。どのような結果になるかは未知数です。
+    - 8で割り切れない値を指定すると余りはVAE内部で切り捨てられます。
+  - ``--bucket_no_upscale``オプションを指定すると、bucketサイズよりも小さい画像は拡大せずそのまま処理します。
+    - 内部的には画像サイズ以下のサイズのbucketを作成します（たとえば画像が300x300で``bucket_reso_steps=64``の場合、256x256のbucket）。余りは都度trimmingされます。
+    - [#130](https://github.com/kohya-ss/sd-scripts/issues/130) を実装したものです。
+    - ``--resolution``で指定した最大サイズよりも面積が大きい画像は、最大サイズと同じ面積になるようアスペクト比を維持したまま縮小され、そのサイズを元にbucketが作られます。
+  - これらのオプションによりbucketが細分化され、ひとつのバッチ内に同一画像が重複して存在することが増えたため、バッチサイズを``そのbucketの画像種類数``までに制限する機能を追加しました。
+    - たとえば繰り返し回数10で、あるbucketに1枚しか画像がなく、バッチサイズが10以上のとき、今まではepoch内で、同一画像を10枚含むバッチが1回だけ使用されていました。
+    - 機能追加後はepoch内にサイズ1のバッチが10回、使用されます。
+  - ``--random_crop``がbucketを有効にした場合にも機能するようになりました。
+    - 常に画像の中央を切り取るのではなく、左右、上下にずらして教師データにします。これにより画像端まで学習されることが期待されます。
+    - discussionの[#34](https://github.com/kohya-ss/sd-scripts/discussions/34)を実装したものです。
+    
+
+Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
+最近の更新情報は [Release](https://github.com/kohya-ss/sd-scripts/releases) をご覧ください。
--- a/library/train_util.py
+++ b/library/train_util.py
@@ -226,6 +226,7 @@ class BaseDataset(torch.utils.data.Dataset):
    self.current_epoch: int = 0            # インスタンスがepochごとに新しく作られるようなので外側から渡さないとダメ
    self.dropout_rate: float = 0
    self.dropout_every_n_epochs: int = None
+    self.tag_dropout_rate: float = 0

    # augmentation
    flip_p = 0.5 if flip_aug else 0.0
@@ -284,7 +285,7 @@ class BaseDataset(torch.utils.data.Dataset):
    if is_drop_out:
      caption = ""
    else:
-      if self.shuffle_caption:
+      if self.shuffle_caption or self.tag_dropout_rate > 0:
        def dropout_tags(tokens):
          if self.tag_dropout_rate <= 0:
            return tokens
@@ -296,13 +297,18 @@ class BaseDataset(torch.utils.data.Dataset):

        tokens = [t.strip() for t in caption.strip().split(",")]
        if self.shuffle_keep_tokens is None:
-          random.shuffle(tokens)
+          if self.shuffle_caption:
+            random.shuffle(tokens)
+          
          tokens = dropout_tags(tokens)
        else:
          if len(tokens) > self.shuffle_keep_tokens:
            keep_tokens = tokens[:self.shuffle_keep_tokens]
            tokens = tokens[self.shuffle_keep_tokens:]
-            random.shuffle(tokens)
+
+            if self.shuffle_caption:
+              random.shuffle(tokens)
+            
            tokens = dropout_tags(tokens)

            tokens = keep_tokens + tokens
--- a/train_textual_inversion.py
+++ b/train_textual_inversion.py
@@ -98,12 +98,12 @@ def train(args):

  # Convert the init_word to token_id
  if args.init_word is not None:
-    init_token_id = tokenizer.encode(args.init_word, add_special_tokens=False)
-    assert len(
-        init_token_id) == 1, f"init word {args.init_word} is not converted to single token / 初期化単語が二つ以上のトークンに変換されます。別の単語を使ってください"
-    init_token_id = init_token_id[0]
+    init_token_ids = tokenizer.encode(args.init_word, add_special_tokens=False)
+    if len(init_token_ids) > 1 and len(init_token_ids) != args.num_vectors_per_token:
+      print(
+          f"token length for init words is not same to num_vectors_per_token, init words is repeated or truncated / 初期化単語のトークン長がnum_vectors_per_tokenと合わないため、繰り返しまたは切り捨てが発生します: length {len(init_token_ids)}")
  else:
-    init_token_id = None
+    init_token_ids = None

  # add new word to tokenizer, count is num_vectors_per_token
  token_strings = [args.token_string] + [f"{args.token_string}{i+1}" for i in range(args.num_vectors_per_token - 1)]
@@ -120,9 +120,9 @@ def train(args):

  # Initialise the newly added placeholder token with the embeddings of the initializer token
  token_embeds = text_encoder.get_input_embeddings().weight.data
-  if init_token_id is not None:
-    for token_id in token_ids:
-      token_embeds[token_id] = token_embeds[init_token_id]
+  if init_token_ids is not None:
+    for i, token_id in enumerate(token_ids):
+      token_embeds[token_id] = token_embeds[init_token_ids[i % len(init_token_ids)]]
      # print(token_id, token_embeds[token_id].mean(), token_embeds[token_id].min())

  # load weights
@@ -492,7 +492,7 @@ if __name__ == '__main__':
  parser.add_argument("--token_string", type=str, default=None,
                      help="token string used in training, must not exist in tokenizer / 学習時に使用されるトークン文字列、tokenizerに存在しない文字であること")
  parser.add_argument("--init_word", type=str, default=None,
-                      help="word to initialize vector / ベクトルを初期化に使用する単語、tokenizerで一語になること")
+                      help="words to initialize vector / ベクトルを初期化に使用する単語、複数可")
  parser.add_argument("--use_object_template", action='store_true',
                      help="ignore caption and use default templates for object / キャプションは使わずデフォルトの物体用テンプレートで学習する")
  parser.add_argument("--use_style_template", action='store_true',