Fix fp16 mixed precision, model is in bf16 without full_bf16

Kohya S
2024-06-29 17:21:25 +09:00
parent 66cf435479
commit 19086465e8
4 changed files with 61 additions and 15 deletions


@@ -4,21 +4,28 @@ This repository contains training, generation and utility scripts for Stable Dif
SD3 training is done with `sd3_train.py`.
__Jun 29, 2024__: Fixed an issue where mixed precision training with fp16 was not working. Also fixed an issue where the model was kept in bf16 dtype even without the `--full_bf16` option (this could worsen training results).
`fp16` and `bf16` are available for mixed precision training. We are not sure which is better.
`optimizer_type = "adafactor"` is recommended for 24GB VRAM GPUs. `cache_text_encoder_outputs_to_disk` and `cache_latents_to_disk` are currently required.
`clip_l`, `clip_g` and `t5xxl` can be specified if the checkpoint does not include them.
t5xxl doesn't seem to work with `fp16`, so use `bf16` or `fp32`.
t5xxl doesn't seem to work with `fp16`, so either 1) use `bf16` for mixed precision, or 2) use `bf16` or `float32` for `t5xxl_dtype`.
There are `t5xxl_device` and `t5xxl_dtype` options for `t5xxl` device and dtype.
`text_encoder_batch_size` has been added experimentally for faster caching of text encoder outputs.
```toml
learning_rate = 1e-5 # seems to be too high
learning_rate = 1e-6 # seems to depend on the batch size
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
vae_batch_size = 1
text_encoder_batch_size = 4
cache_latents = true
cache_latents_to_disk = true
```
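The example above does not show the mixed precision setting, the `t5xxl` options, or the paths to `clip_l`/`clip_g`/`t5xxl` for checkpoints that do not include them. Below is a minimal sketch of those keys, assuming each command-line option can also be written as a TOML key of the same name in the config file; the paths are placeholders and the exact accepted values should be checked against `sd3_train.py --help`.
```toml
# sketch only: keys are assumed to mirror the CLI option names, paths are placeholders
mixed_precision = "bf16"                  # or "fp16", which now works after this fix
clip_l = "/path/to/clip_l.safetensors"    # only needed if the checkpoint lacks CLIP-L
clip_g = "/path/to/clip_g.safetensors"    # only needed if the checkpoint lacks CLIP-G
t5xxl = "/path/to/t5xxl.safetensors"      # only needed if the checkpoint lacks T5-XXL
t5xxl_device = "cpu"                      # assumed device string; keeps t5xxl off the GPU
t5xxl_dtype = "bf16"                      # t5xxl doesn't seem to work with fp16
```
If `sd3_train.py` follows the other training scripts in this repository, the file can then be passed with `--config_file`, e.g. `accelerate launch sd3_train.py --config_file config.toml`.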