mirror of
https://github.com/kohya-ss/sd-scripts.git
synced 2026-04-09 06:45:09 +00:00
Fix fp16 mixed precision, model is in bf16 without full_bf16
This commit is contained in:
11
README.md
@@ -4,21 +4,28 @@ This repository contains training, generation and utility scripts for Stable Dif
SD3 training is done with `sd3_train.py`.
__Jun 29, 2024__: Fixed an issue where mixed precision training with fp16 did not work. Fixed an issue where the model was kept in bf16 dtype even without the `--full_bf16` option (this could worsen training results).
`fp16` and `bf16` are available for mixed precision training. We are not sure which is better.
`optimizer_type = "adafactor"` is recommended for 24GB VRAM GPUs. `cache_text_encoder_outputs_to_disk` and `cache_latents_to_disk` are currently required.
`clip_l`, `clip_g` and `t5xxl` can be specified if the checkpoint does not include them.
t5xxl doesn't seem to work with `fp16`, so use `bf16` or `fp32`.
t5xxl doesn't seem to work with `fp16`, so 1) use `bf16` for mixed precision, or 2) use `bf16` or `float32` for `t5xxl_dtype`.
There are `t5xxl_device` and `t5xxl_dtype` options to set the device and dtype for `t5xxl`.
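As a sketch, the encoder-related options above might be passed on the command line like this (paths are placeholders, and the exact flag spellings are assumptions based on the option names mentioned in this README, not confirmed against the script's argument parser):

```shell
# Hypothetical sd3_train.py invocation; all paths are placeholders.
# --t5xxl_device / --t5xxl_dtype follow the option names described above.
accelerate launch sd3_train.py \
  --pretrained_model_name_or_path sd3_medium.safetensors \
  --clip_l clip_l.safetensors \
  --clip_g clip_g.safetensors \
  --t5xxl t5xxl_fp16.safetensors \
  --t5xxl_device cpu \
  --t5xxl_dtype bf16 \
  --mixed_precision bf16
```

Keeping `t5xxl` on `cpu` with `bf16` here is only an illustration of the two options, not a recommended setting.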
`text_encoder_batch_size` has been added experimentally to speed up caching.
```toml
learning_rate = 1e-5 # seems to be too high
learning_rate = 1e-6 # seems to depend on the batch size
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
vae_batch_size = 1
text_encoder_batch_size = 4
cache_latents = true
cache_latents_to_disk = true
```
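If the settings above are saved to a file (e.g. `sd3_config.toml` — a hypothetical filename), they could be passed via `--config_file`, assuming `sd3_train.py` reads toml configs the same way the other training scripts in this repository do:

```shell
# Assumed usage: --config_file is how other sd-scripts trainers accept a toml config.
accelerate launch sd3_train.py --config_file sd3_config.toml
```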