Refactor Preference Optimization
Refactor preference dataset
Add iterator support for ImageInfo and ImageSetInfo
- Support iterating over either ImageInfo or ImageSetInfo to
clean up the preference dataset implementation and handle 2 or more
images without duplicating code
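
A minimal sketch of the iteration idea, not the actual classes from this change. The field names (`image_key`, `images`) and the helper `collect_keys` are hypothetical and used only for illustration. The point is that a single image and a set of images both expose `__iter__`, so calling code needs only one loop:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class ImageInfo:
    # Hypothetical field; the real class carries more metadata.
    image_key: str

    def __iter__(self) -> Iterator["ImageInfo"]:
        # A single image iterates as a one-element sequence.
        yield self


@dataclass
class ImageSetInfo:
    # Hypothetical container for 2 or more related images
    # (for example a preferred and a rejected sample).
    images: List[ImageInfo] = field(default_factory=list)

    def __iter__(self) -> Iterator[ImageInfo]:
        return iter(self.images)


def collect_keys(item: Union[ImageInfo, ImageSetInfo]) -> List[str]:
    # The same loop handles both types, so no per-type branch is needed.
    return [info.image_key for info in item]
```

With this shape, dataset code that previously branched on "one image or a pair" can loop over the item unconditionally.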
Add tests for all PO functions
Add metrics for process_batch
Add losses for gradient manipulation of loss parts
Add gradient normalization to stabilize gradients
Args added:
mapo_beta = 0.05
cpo_beta = 0.1
bpo_beta = 0.1
bpo_lambda = 0.2
sdpo_beta = 0.02
simpo_gamma_beta_ratio = 0.25
simpo_beta = 2.0
simpo_smoothing = 0.0
simpo_loss_type = "sigmoid"
ddo_alpha = 4.0
ddo_beta = 0.05
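
As a rough illustration of how the defaults listed above could be registered, here is a sketch using `argparse`. It assumes each argument is exposed as a `--name` flag, which the list does not state; the function name `build_po_parser` is hypothetical, and the real parser may add help text, choices, or validation.

```python
import argparse


def build_po_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: flag names and defaults are copied from the list above;
    # everything else (types, lack of choices) is an assumption.
    parser = argparse.ArgumentParser()
    float_defaults = {
        "mapo_beta": 0.05,
        "cpo_beta": 0.1,
        "bpo_beta": 0.1,
        "bpo_lambda": 0.2,
        "sdpo_beta": 0.02,
        "simpo_gamma_beta_ratio": 0.25,
        "simpo_beta": 2.0,
        "simpo_smoothing": 0.0,
        "ddo_alpha": 4.0,
        "ddo_beta": 0.05,
    }
    for name, default in float_defaults.items():
        parser.add_argument(f"--{name}", type=float, default=default)
    parser.add_argument("--simpo_loss_type", type=str, default="sigmoid")
    return parser
```

Parsing an empty argument list then yields the defaults above, and any single value can be overridden, for example with `--ddo_alpha 2.5`.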
* Add alpha_mask parameter and apply masked loss
* Fix type hint in trim_and_resize_if_required function
* Refactor code to use keyword arguments in train_util.py
* Fix alpha mask flipping logic
* Fix alpha mask initialization
* Fix alpha_mask transformation
* Cache alpha_mask
* Update alpha_masks to be on CPU
* Set flipped_alpha_masks to None if option disabled
* Check if alpha_mask is None
* Set alpha_mask to None if option disabled
* Add description of alpha_mask option to docs
* Add get_my_logger()
* Use logger instead of print
* Fix log level
* Remove line breaks for readability
* Use setup_logging()
* Add rich to requirements.txt
* Make simple
* Use logger instead of print
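
The alpha_mask items in the list above can be sketched as follows. This is a simplified illustration on nested Python lists rather than tensors, and the function names are hypothetical. It assumes the mask scales the per-pixel loss, that a disabled option is represented by `None`, and that the mask is flipped together with the image when horizontal flip augmentation is applied.

```python
from typing import List, Optional

Grid = List[List[float]]


def flip_horizontal(mask: Grid) -> Grid:
    # Mirror each row so the mask stays aligned with a horizontally flipped image.
    return [row[::-1] for row in mask]


def apply_masked_loss(loss: Grid, alpha_mask: Optional[Grid]) -> Grid:
    # With the option disabled the mask is None and the loss is returned unchanged.
    if alpha_mask is None:
        return loss
    # Otherwise weight each per-pixel loss by the matching alpha value.
    return [
        [l * m for l, m in zip(loss_row, mask_row)]
        for loss_row, mask_row in zip(loss, alpha_mask)
    ]
```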
---------
Co-authored-by: Kohya S <52813779+kohya-ss@users.noreply.github.com>
This fixes min-snr for vpred+zsnr by dividing directly by SNR+1.
The old implementation computed the weight in two steps, (min-snr/snr) * (snr/(snr+1)), which divides by zero when SNR is 0, as happens at the terminal timestep with --zero_terminal_snr.
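
The difference between the two formulations can be shown with scalar arithmetic. This is a sketch only: the function names are hypothetical, `gamma` stands for the min-SNR clamp value, and its default of 5.0 is an assumption. Plain Python floats raise `ZeroDivisionError` at SNR = 0, whereas tensor code would typically produce NaN from 0/0 instead.

```python
def min_snr_weight_vpred(snr: float, gamma: float = 5.0) -> float:
    # Single-step form: clamp SNR at gamma, then divide directly by SNR + 1.
    # The denominator is at least 1 for snr >= 0, so snr == 0 is safe.
    return min(snr, gamma) / (snr + 1.0)


def min_snr_weight_vpred_two_step(snr: float, gamma: float = 5.0) -> float:
    # Two-step form described above: (min-snr / snr) * (snr / (snr + 1)).
    # Algebraically identical for snr > 0, but evaluates 0 / 0 when snr == 0.
    return (min(snr, gamma) / snr) * (snr / (snr + 1.0))
```

For any positive SNR the two functions agree up to floating-point rounding; only the single-step form is defined at SNR = 0, where it returns a weight of 0.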
* Instantiate max_norm
* minor
* Move to end of step
* argparse
* metadata
* phrasing
* Sqrt ratio and logging
* fix logging
* Dropout test
* Dropout Args
* Dropout changed to affect LoRA only
---------
Co-authored-by: Kohya S <52813779+kohya-ss@users.noreply.github.com>