Commit Graph

982 Commits

Author SHA1 Message Date
rockerBOO
7a08c52aa4 Add error with CDC if cache_latents or cache_latents_to_disk is not set 2025-11-03 21:47:15 -05:00
rockerBOO
377299851a Fix cdc cache file validation 2025-11-02 23:22:10 -05:00
rockerBOO
b4e5d09871 Fix multi-resolution support in cached files 2025-10-30 23:27:13 -04:00
rockerBOO
0dfafb4fff Remove deprecated cdc cache path 2025-10-18 17:59:12 -04:00
rockerBOO
83c17de61f Remove faiss, save per image cdc file 2025-10-18 14:07:55 -04:00
rockerBOO
8089cb6925 Improve dimension mismatch warning for CDC Flow Matching
- Add explicit warning and tracking for multiple unique latent shapes
- Simplify test imports by removing unused modules
- Minor formatting improvements in print statements
- Ensure log messages provide clear context about dimension mismatches
2025-10-11 17:17:09 -04:00
rockerBOO
aa3a216106 Slight cleanup 2025-10-11 16:15:35 -04:00
rockerBOO
8458a5696e Add graceful fallback when FAISS is not installed
- Make FAISS import optional with try/except
- CDCPreprocessor raises helpful ImportError if FAISS unavailable
- train_util.py catches ImportError and returns None
- train_network.py checks for None and warns user
- Training continues without CDC-FM if FAISS not installed
- Remove benchmark file (not needed in repo)

This allows users to run training without FAISS dependency.
CDC-FM will be automatically disabled with a warning if FAISS is missing.
2025-10-09 23:50:07 -04:00
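The fallback chain described in commit 8458a5696e (optional import, helpful ImportError, caller returns None and warns) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `CDCPreprocessorSketch` and `build_cdc_preprocessor` are hypothetical names, and only the try/except import pattern is taken from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

# Optional dependency: the import is allowed to fail.
try:
    import faiss  # noqa: F401
    FAISS_AVAILABLE = True
except ImportError:
    faiss = None
    FAISS_AVAILABLE = False


class CDCPreprocessorSketch:
    """Hypothetical stand-in for a preprocessor that needs FAISS."""

    def __init__(self, k_neighbors: int = 8):
        if not FAISS_AVAILABLE:
            # Fail with an actionable message instead of a NameError later on.
            raise ImportError("FAISS is required for CDC-FM preprocessing but is not installed")
        self.k_neighbors = k_neighbors


def build_cdc_preprocessor(k_neighbors: int = 8):
    """Return a preprocessor, or None (with a warning) if FAISS is missing."""
    try:
        return CDCPreprocessorSketch(k_neighbors)
    except ImportError as exc:
        logger.warning("CDC-FM disabled: %s", exc)
        return None
```

A caller would check the return value for None and continue training without CDC-FM in that case.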
rockerBOO
7ca799ca26 Add adaptive k_neighbors support for CDC-FM
- Add --cdc_adaptive_k flag to enable adaptive k based on bucket size
- Add --cdc_min_bucket_size to set minimum bucket threshold (default: 16)
- Fixed mode (default): Skip buckets with < k_neighbors samples
- Adaptive mode: Use k=min(k_neighbors, bucket_size-1) for buckets >= min_bucket_size
- Update CDCPreprocessor to support adaptive k per bucket
- Add metadata tracking for adaptive_k and min_bucket_size
- Add comprehensive pytest tests for adaptive k behavior

This allows CDC-FM to work effectively with multi-resolution bucketing where
bucket sizes may vary widely. Users can choose between strict paper methodology
(fixed k) or pragmatic approach (adaptive k).
2025-10-09 23:16:44 -04:00
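The per-bucket choice of k described in commit 7ca799ca26 can be sketched as a small helper. `choose_k` is a hypothetical name, and skipping buckets below `min_bucket_size` in adaptive mode is an assumption; the commit message only states the rule for buckets at or above that threshold.

```python
from typing import Optional


def choose_k(
    bucket_size: int,
    k_neighbors: int,
    adaptive: bool = False,
    min_bucket_size: int = 16,
) -> Optional[int]:
    """Return the k to use for a bucket, or None if the bucket is skipped."""
    if adaptive:
        if bucket_size < min_bucket_size:
            # Assumption: buckets under the threshold are skipped.
            return None
        # A sample cannot be its own neighbor, hence bucket_size - 1.
        return min(k_neighbors, bucket_size - 1)
    # Fixed mode: skip buckets with fewer than k_neighbors samples.
    if bucket_size < k_neighbors:
        return None
    return k_neighbors
```

For example, with `k_neighbors=256` a bucket of 20 samples is skipped in fixed mode but gets k=19 in adaptive mode.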
rockerBOO
f128f5a645 Formatting cleanup 2025-10-09 18:28:51 -04:00
rockerBOO
c8a4e99074 Add --cdc_debug flag and tqdm progress for CDC preprocessing
- Add --cdc_debug flag to enable verbose bucket-by-bucket output
- When debug=False (default): Show tqdm progress bar, concise logging
- When debug=True: Show detailed bucket information, no progress bar
- Improves user experience during CDC cache generation
2025-10-09 18:28:51 -04:00
rockerBOO
7a7110cdc6 Use logger instead of print for CDC loading messages 2025-10-09 18:28:51 -04:00
rockerBOO
1d4c4d4cb2 Fix: Replace CDC integer index lookup with image_key strings
Fixes shape mismatch bug in multi-subset training where CDC preprocessing
and training used different index calculations, causing wrong CDC data to
be loaded for samples.

Changes:
- CDC cache now stores/loads data using image_key strings instead of integer indices
- Training passes image_key list instead of computed integer indices
- All CDC lookups use stable image_key identifiers
- Improved device compatibility check (handles "cuda" vs "cuda:0")
- Updated all 30 CDC tests to use image_key-based access

Root cause: Preprocessing used cumulative dataset indices while training
used sorted keys, resulting in mismatched lookups during shuffled multi-subset
training.
2025-10-09 18:28:51 -04:00
rockerBOO
4bea582601 Fix: Prevent false device mismatch warnings for cuda vs cuda:0
- Treat cuda and cuda:0 as compatible devices
- Only warn on actual device mismatches (cuda vs cpu)
- Eliminates warning spam during multi-subset training
2025-10-09 18:28:51 -04:00
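One way to treat `cuda` and `cuda:0` as compatible, as commit 4bea582601 describes, is to compare device types first and compare indices only when both are given. The sketch below works on device strings (or anything whose `str()` looks like `cuda:0`); `devices_compatible` is a hypothetical helper, and the repository's actual check may differ.

```python
from typing import Optional, Tuple


def _split_device(device) -> Tuple[str, Optional[int]]:
    """Split 'cuda:0' into ('cuda', 0); a bare 'cuda' yields ('cuda', None)."""
    kind, _, index = str(device).partition(":")
    return kind, int(index) if index else None


def devices_compatible(a, b) -> bool:
    """True unless the device types differ or two explicit indices differ."""
    kind_a, index_a = _split_device(a)
    kind_b, index_b = _split_device(b)
    if kind_a != kind_b:
        return False  # a real mismatch, e.g. cuda vs cpu
    if index_a is None or index_b is None:
        return True   # 'cuda' vs 'cuda:0': no explicit conflict
    return index_a == index_b
```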
rockerBOO
ee8ceee178 Add device consistency validation for CDC transformation
- Check that noise and CDC matrices are on same device
- Automatically transfer noise if device mismatch detected
- Warn user when device transfer occurs
- Add tests to verify device handling
2025-10-09 18:28:51 -04:00
rockerBOO
ce17007e1a Add warning throttling for CDC shape mismatches
- Track warned samples in global set to prevent log spam
- Each sample only warned once per training session
- Prevents thousands of duplicate warnings during training
- Add tests to verify throttling behavior
2025-10-09 18:28:50 -04:00
rockerBOO
88af20881d Fix: Enable gradient flow through CDC noise transformation
- Remove @torch.no_grad() decorator from compute_sigma_t_x()
- Gradients now properly flow through CDC transformation during training
- Add comprehensive gradient flow tests for fast/slow paths and fallback
- All 25 CDC tests passing
2025-10-09 18:28:50 -04:00
rockerBOO
0d822b2f74 Refactor: Extract CDC noise transformation to separate function
- Create apply_cdc_noise_transformation() for better modularity
- Implement fast path for batch processing when all shapes match
- Implement slow path for per-sample processing on shape mismatch
- Clone noise tensors in fallback path for gradient consistency
2025-10-09 18:28:50 -04:00
rockerBOO
e03200bdba Optimize: Cache CDC shapes in memory to eliminate I/O bottleneck
- Cache all shapes during GammaBDataset initialization
- Eliminates file I/O on every training step (9.5M accesses/sec)
- Reduces get_shape() from file operation to dict lookup
- Memory overhead: ~126 bytes/sample (~12.6 MB per 100k images)
2025-10-09 18:28:50 -04:00
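The optimization in commit e03200bdba amounts to reading every shape once at initialization and serving later requests from a dict. A minimal sketch under assumed names follows: `ShapeCacheSketch` and the `read_shape_from_file` callable are hypothetical and stand in for whatever `GammaBDataset` actually does.

```python
from typing import Callable, Dict, Iterable, Tuple

Shape = Tuple[int, ...]


class ShapeCacheSketch:
    """Reads all shapes once up front, then answers from memory."""

    def __init__(
        self,
        image_keys: Iterable[str],
        read_shape_from_file: Callable[[str], Shape],
    ):
        # One file read per sample, paid at construction time only.
        self._shapes: Dict[str, Shape] = {
            key: tuple(read_shape_from_file(key)) for key in image_keys
        }

    def get_shape(self, image_key: str) -> Shape:
        # Plain dict lookup: no file I/O on the training hot path.
        return self._shapes[image_key]
```

The trade-off is a small, fixed memory cost per sample in exchange for removing a file operation from every training step.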
rockerBOO
f552f9a3bd Add CDC-FM (Carré du Champ Flow Matching) support
Implements geometry-aware noise generation for FLUX training based on
arXiv:2510.05930v1.
2025-10-09 18:28:47 -04:00
Kohya S
5462a6bb24 Merge branch 'dev' into sd3 2025-09-29 21:02:02 +09:00
Kohya S
63711390a0 Merge branch 'main' into dev 2025-09-29 20:56:07 +09:00
Kohya S
60bfa97b19 fix: disable_mmap_safetensors not defined in SDXL TI training 2025-09-29 20:52:48 +09:00
Kohya S.
e7b89826c5 Update library/custom_offloading_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-21 13:29:58 +09:00
Kohya S
806d535ef1 fix: block-wise scaling is overwritten by per-tensor scaling 2025-09-21 13:10:41 +09:00
Kohya S
3876343fad fix: remove print statement for guidance rescale in AdaptiveProjectedGuidance 2025-09-21 13:09:38 +09:00
Kohya S
040d976597 feat: add guidance rescale options for Adaptive Projected Guidance in inference 2025-09-21 13:03:14 +09:00
Kohya S
9621d9d637 feat: add Adaptive Projected Guidance parameters and noise rescaling 2025-09-21 12:34:40 +09:00
Kohya S
f41e9e2b58 feat: add vae_chunk_size argument for memory-efficient VAE decoding and processing 2025-09-21 11:09:37 +09:00
Kohya S
b090d15f7d feat: add multi backend attention and related update for HI2.1 models and scripts 2025-09-20 19:45:33 +09:00
Kohya S
f834b2e0d4 fix: --fp8_vl to work 2025-09-18 23:46:18 +09:00
Kohya S
f6b4bdc83f feat: block-wise fp8 quantization 2025-09-18 21:20:54 +09:00
Kohya S
f5b004009e fix: correct tensor indexing in HunyuanVAE2D class for blending and encoding functions 2025-09-17 21:54:25 +09:00
Kohya S
4e2a80a6ca refactor: update imports to use safetensors_utils for memory-efficient operations 2025-09-13 21:07:11 +09:00
Kohya S
d831c88832 fix: sample generation doesn't work with block swap 2025-09-13 21:06:04 +09:00
Kohya S
bae7fa74eb Merge branch 'sd3' into feat-hunyuan-image-2.1-inference 2025-09-13 20:13:58 +09:00
Kohya S.
e1c666e97f Update library/safetensors_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-13 20:03:55 +09:00
Kohya S
8783f8aed3 feat: faster safetensors load and split safetensor utils 2025-09-13 19:51:38 +09:00
Kohya S
209c02dbb6 feat: HunyuanImage LoRA training 2025-09-12 21:40:42 +09:00
Kohya S
a0f0afbb46 fix: revert constructor signature update 2025-09-11 22:27:00 +09:00
Kohya S
7f983c558d feat: block swap for inference and initial impl for HunyuanImage LoRA (not working) 2025-09-11 22:15:22 +09:00
Kohya S
5149be5a87 feat: initial commit for HunyuanImage-2.1 inference 2025-09-11 12:54:12 +09:00
Kohya S
e836b7f66d fix: chroma LoRA training without Text Encode caching 2025-08-30 09:30:24 +09:00
Kohya S
6edbe00547 feat: update libraries, remove warnings 2025-08-16 20:07:03 +09:00
Kohya S
351bed965c fix model type handling in analyze_state_dict_state function for SD3 2025-08-13 21:38:51 +09:00
rockerBOO
9bb50c26c4 Set sai_model_spec to must 2025-08-03 00:43:09 -04:00
rockerBOO
10bfcb9ac5 Remove text model spec 2025-08-03 00:40:10 -04:00
rockerBOO
d24d733892 Update model spec to 1.0.1. Refactor model spec 2025-08-02 21:14:27 -04:00
Kohya S
96feb61c0a feat: implement modulation vector extraction for Chroma and update related methods 2025-07-30 21:34:49 +09:00
Kohya S
6c8973c2da doc: add reference link for input vector gradient requirement in Chroma class 2025-07-28 22:08:02 +09:00