mirror of
https://github.com/kohya-ss/sd-scripts.git
synced 2026-04-09 06:45:09 +00:00
Fix fp16 mixed precision, model is in bf16 without full_bf16
This commit is contained in:
11
README.md
@@ -4,21 +4,28 @@ This repository contains training, generation and utility scripts for Stable Dif
SD3 training is done with `sd3_train.py`.
__Jun 29, 2024__: Fixed an issue where mixed precision training with fp16 did not work. Fixed an issue where the model was kept in bf16 dtype even without the `--full_bf16` option (this could worsen training results).
`fp16` and `bf16` are available for mixed precision training. We are not sure which is better.
`optimizer_type = "adafactor"` is recommended for 24GB VRAM GPUs. `cache_text_encoder_outputs_to_disk` and `cache_latents_to_disk` are currently required.
`clip_l`, `clip_g` and `t5xxl` can be specified if the checkpoint does not include them.
t5xxl doesn't seem to work with `fp16`, so use `bf16` or `fp32`.
t5xxl doesn't seem to work with `fp16`, so 1) use `bf16` for mixed precision, or 2) use `bf16` or `float32` for `t5xxl_dtype`.
There are `t5xxl_device` and `t5xxl_dtype` options to set the device and dtype for `t5xxl`.
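As a sketch, the encoder-related options above might be passed on the command line like this (paths are placeholders, and the exact flag spellings are assumptions based on the option names mentioned in this README, not confirmed against the script's argument parser):

```shell
# Hypothetical sd3_train.py invocation; all paths are placeholders.
# --t5xxl_device / --t5xxl_dtype follow the option names described above.
accelerate launch sd3_train.py \
  --pretrained_model_name_or_path sd3_medium.safetensors \
  --clip_l clip_l.safetensors \
  --clip_g clip_g.safetensors \
  --t5xxl t5xxl_fp16.safetensors \
  --t5xxl_device cpu \
  --t5xxl_dtype bf16 \
  --mixed_precision bf16
```

Keeping `t5xxl` on `cpu` with `bf16` here is only an illustration of the two options, not a recommended setting.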
`text_encoder_batch_size` has been added experimentally to speed up caching.
```toml
learning_rate = 1e-5 # seems to be too high
learning_rate = 1e-6 # seems to depend on the batch size
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
vae_batch_size = 1
text_encoder_batch_size = 4
cache_latents = true
cache_latents_to_disk = true
```
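If the settings above are saved to a file (e.g. `sd3_config.toml` — a hypothetical filename), they could be passed via `--config_file`, assuming `sd3_train.py` reads toml configs the same way the other training scripts in this repository do:

```shell
# Assumed usage: --config_file is how other sd-scripts trainers accept a toml config.
accelerate launch sd3_train.py --config_file sd3_config.toml
```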