Commit Graph

7 Commits

Author SHA1 Message Date
rockerBOO
f128f5a645 Formatting cleanup 2025-10-09 18:28:51 -04:00
rockerBOO
c8a4e99074 Add --cdc_debug flag and tqdm progress for CDC preprocessing
- Add --cdc_debug flag to enable verbose bucket-by-bucket output
- When debug=False (default): Show tqdm progress bar, concise logging
- When debug=True: Show detailed bucket information, no progress bar
- Improves user experience during CDC cache generation
2025-10-09 18:28:51 -04:00
rockerBOO
7a7110cdc6 Use logger instead of print for CDC loading messages 2025-10-09 18:28:51 -04:00
rockerBOO
1d4c4d4cb2 Fix: Replace CDC integer index lookup with image_key strings
Fixes shape mismatch bug in multi-subset training where CDC preprocessing
and training used different index calculations, causing wrong CDC data to
be loaded for samples.

Changes:
- CDC cache now stores/loads data using image_key strings instead of integer indices
- Training passes image_key list instead of computed integer indices
- All CDC lookups use stable image_key identifiers
- Improved device compatibility check (handles "cuda" vs "cuda:0")
- Updated all 30 CDC tests to use image_key-based access

Root cause: Preprocessing used cumulative dataset indices while training
used sorted keys, resulting in mismatched lookups during shuffled multi-subset
training.
2025-10-09 18:28:51 -04:00
rockerBOO
88af20881d Fix: Enable gradient flow through CDC noise transformation
- Remove @torch.no_grad() decorator from compute_sigma_t_x()
- Gradients now properly flow through CDC transformation during training
- Add comprehensive gradient flow tests for fast/slow paths and fallback
- All 25 CDC tests passing
2025-10-09 18:28:50 -04:00
rockerBOO
e03200bdba Optimize: Cache CDC shapes in memory to eliminate I/O bottleneck
- Cache all shapes during GammaBDataset initialization
- Eliminates file I/O on every training step (9.5M accesses/sec)
- Reduces get_shape() from file operation to dict lookup
- Memory overhead: ~126 bytes/sample (~12.6 MB per 100k images)
2025-10-09 18:28:50 -04:00
rockerBOO
f552f9a3bd Add CDC-FM (Carré du Champ Flow Matching) support
Implements geometry-aware noise generation for FLUX training based on
arXiv:2510.05930v1.
2025-10-09 18:28:47 -04:00