
## This is a ChatGPT-4 English adaptation of the original document by kohya-ss ([train_ti_README-ja.md](https://github.com/kohya-ss/sd-scripts/blob/main/docs/train_ti_README-ja.md))

This document explains how to train with Textual Inversion (https://textual-inversion.github.io/).

Please also refer to the [common documentation on learning](./train_README-en.md).

The implementation was greatly inspired by https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion.

The learned model can be used directly in the Web UI.

# Learning procedure

Please refer to this repository's README beforehand and set up the environment.

## Data preparation

Refer to [Preparing Training Data](./train_README-en.md) for more information.

## Executing the training

Use `train_textual_inversion.py`. The following is an example command line (DreamBooth method). The trailing backslashes are shell line continuations; on shells that do not support them, enter the command on a single line.

```
accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py \
    --pretrained_model_name_or_path=<.ckpt, .safetensors, or Diffusers model> \
    --dataset_config=<.toml file created during data preparation> \
    --output_dir=<output folder for the trained model> \
    --output_name=<file name for the trained model output without extension> \
    --save_model_as=safetensors \
    --prior_loss_weight=1.0 \
    --max_train_steps=1600 \
    --learning_rate=1e-6 \
    --optimizer_type="AdamW8bit" \
    --xformers \
    --mixed_precision="fp16" \
    --cache_latents \
    --gradient_checkpointing \
    --token_string=mychar4 --init_word=cute --num_vectors_per_token=4
```

Specify the token string to train with `--token_string`. __Make sure your training prompt includes this string (e.g., if the token_string is mychar4, use "mychar4 1girl")__. That part of the prompt is replaced with the new Textual Inversion token and trained. For DreamBooth, class+identifier-style datasets, it is easiest and most reliable to use the `token_string` as the identifier in the prompt.

You can check whether the token string is included in the prompt by using `--debug_dataset`. The token ids after replacement are displayed, so check that they include ids of `49408` or greater, as shown below.

```
input ids: tensor([[49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
         49407, 49407, 49407, 49407, 49407, 49407, 49407]])
```
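The check described above can also be done programmatically. The following is a minimal sketch, not part of the scripts: the helper name `find_new_token_ids` is hypothetical, and it assumes that ids of `49408` or greater are the newly added tokens, as stated above.

```python
# Ids at or above this value are assumed to be newly added tokens,
# matching the 49408 threshold mentioned in the text.
FIRST_NEW_TOKEN_ID = 49408


def find_new_token_ids(input_ids):
    """Return the ids in input_ids that belong to newly added tokens."""
    return [token_id for token_id in input_ids if token_id >= FIRST_NEW_TOKEN_ID]


# Start of the example output above, cut off after the first padding ids.
example_ids = [49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407, 49407]

new_ids = find_new_token_ids(example_ids)
print(len(new_ids), new_ids)
```

An empty result would suggest that the token string did not appear in the prompt.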

You cannot use a word that the tokenizer already has (a common word) as the token string.

Specify the source token used to initialize the embeddings with `--init_word`. It is best to choose a word close to the concept you want to learn. The word must map to a single token; a string that splits into two or more tokens cannot be used.

Specify how many tokens to use in this training with `--num_vectors_per_token`. More tokens make the embedding more expressive, but they also take up more of the prompt. For example, with `num_vectors_per_token=8`, the token string consumes 8 tokens of the prompt's 77-token limit.
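The trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `remaining_prompt_tokens` is hypothetical, and it assumes that the 77-token limit includes the start and end markers, leaving 75 positions for the prompt itself.

```python
def remaining_prompt_tokens(num_vectors_per_token, max_length=77):
    """Tokens left for the rest of the prompt once the token string is expanded.

    Assumes max_length includes the start and end markers, so two positions
    are reserved before subtracting the vectors used by the token string.
    """
    usable = max_length - 2  # start-of-text and end-of-text markers
    remaining = usable - num_vectors_per_token
    if num_vectors_per_token < 1 or remaining < 0:
        raise ValueError("num_vectors_per_token must be between 1 and %d" % usable)
    return remaining


print(remaining_prompt_tokens(4))  # 71
print(remaining_prompt_tokens(8))  # 67
```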

These are the main options for Textual Inversion. The rest is similar to other training scripts.

Usually, it is better to specify `1` for `num_cpu_threads_per_process`.

Specify the base model to train on with `pretrained_model_name_or_path`. You can specify a Stable Diffusion checkpoint file (.ckpt or .safetensors), a Diffusers model directory on your local disk, or a Diffusers model ID (e.g., "stabilityai/stable-diffusion-2").

Specify the folder in which to save the trained model with `output_dir`. Specify the model's filename without the extension in `output_name`. Set `save_model_as` to `safetensors` to save the model in safetensors format.

Specify the `.toml` file in `dataset_config`. Set the batch size in the file to `1` initially to keep memory consumption low.

Set the number of training steps with `max_train_steps` and the learning rate with `learning_rate`. The example command above uses 1600 steps and a learning rate of 1e-6.

To save memory, specify `mixed_precision="fp16"`. On RTX 30 series and later GPUs, you can also specify `bf16`; match the setting you chose in accelerate when setting up the environment. Also specify `gradient_checkpointing`.

To use a low-memory consumption 8bit AdamW optimizer, specify `optimizer_type="AdamW8bit"`.

Specify the `xformers` option to use xformers' CrossAttention. If you have not installed xformers or if it causes errors (depending on the environment, such as when `mixed_precision="no"`), you can alternatively specify the `mem_eff_attn` option to use the memory-efficient CrossAttention (although it will be slower).

If you have enough memory, edit the `.toml` file to increase the batch size to, for example, `8` (this may speed up training and potentially improve accuracy).

### Commonly used options

Please refer to the documentation on options in the following cases:

- Training a Stable Diffusion 2.x or derived model
- Training a model with a clip skip of 2 or more
- Training with captions exceeding 75 tokens

### Batch size for Textual Inversion

Compared to DreamBooth and fine-tuning, which train the entire model, Textual Inversion uses less memory, so you can set a larger batch size.

# Other main options for Textual Inversion

Please refer to another document for all options.

* `--weights`
  * Load pre-trained embeddings before training and continue training from them.
* `--use_object_template`
  * Train with a default object template string (e.g., "a photo of a {}") instead of captions. This matches the official implementation. Captions are ignored.
* `--use_style_template`
  * Train with a default style template string (e.g., "a painting in the style of {}") instead of captions. This matches the official implementation. Captions are ignored.
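To illustrate how a template replaces a caption, here is a minimal sketch. It uses only the two example templates quoted above (the templates actually used by the script are not reproduced here), and `build_prompt` is a hypothetical helper, not a function from this repository.

```python
# The two example templates quoted in the option descriptions above.
OBJECT_TEMPLATE = "a photo of a {}"
STYLE_TEMPLATE = "a painting in the style of {}"


def build_prompt(template, token_string):
    """Fill the template's {} placeholder with the token string."""
    return template.format(token_string)


print(build_prompt(OBJECT_TEMPLATE, "mychar4"))  # a photo of a mychar4
print(build_prompt(STYLE_TEMPLATE, "mychar4"))   # a painting in the style of mychar4
```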

## Generating images with the script in this repository

Specify the learned embeddings file with the `--textual_inversion_embeddings` option in `gen_img_diffusers.py` (multiple files allowed). Use the filename (without the extension) of the embeddings file in the prompt, and the embeddings will be applied.
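As a small illustration of the naming rule above, the word to put in the prompt can be derived from the embeddings file path. This is a sketch only: `prompt_keyword` is a hypothetical helper and the path is a made-up example.

```python
from pathlib import Path


def prompt_keyword(embeddings_path):
    """Return the filename without its extension, to be used in the prompt."""
    return Path(embeddings_path).stem


# Hypothetical embeddings file saved with --output_name=mychar4.
keyword = prompt_keyword("embeddings/mychar4.safetensors")
print(keyword)  # mychar4
print(f"{keyword} 1girl")
```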