- Make FAISS import optional with try/except
- CDCPreprocessor raises helpful ImportError if FAISS unavailable
- train_util.py catches ImportError and returns None
- train_network.py checks for None and warns user
- Training continues without CDC-FM if FAISS not installed
- Remove benchmark file (not needed in repo)
This allows users to run training without FAISS dependency.
CDC-FM will be automatically disabled with a warning if FAISS is missing.
- Add --cdc_adaptive_k flag to enable adaptive k based on bucket size
- Add --cdc_min_bucket_size to set minimum bucket threshold (default: 16)
- Fixed mode (default): Skip buckets with < k_neighbors samples
- Adaptive mode: Use k=min(k_neighbors, bucket_size-1) for buckets >= min_bucket_size
- Update CDCPreprocessor to support adaptive k per bucket
- Add metadata tracking for adaptive_k and min_bucket_size
- Add comprehensive pytest tests for adaptive k behavior
This allows CDC-FM to work effectively with multi-resolution bucketing where
bucket sizes may vary widely. Users can choose between strict paper methodology
(fixed k) or pragmatic approach (adaptive k).
- Add --cdc_debug flag to enable verbose bucket-by-bucket output
- When debug=False (default): Show tqdm progress bar, concise logging
- When debug=True: Show detailed bucket information, no progress bar
- Improves user experience during CDC cache generation
Fixes shape mismatch bug in multi-subset training where CDC preprocessing
and training used different index calculations, causing wrong CDC data to
be loaded for samples.
Changes:
- CDC cache now stores/loads data using image_key strings instead of integer indices
- Training passes image_key list instead of computed integer indices
- All CDC lookups use stable image_key identifiers
- Improved device compatibility check (handles "cuda" vs "cuda:0")
- Updated all 30 CDC tests to use image_key-based access
Root cause: Preprocessing used cumulative dataset indices while training
used sorted keys, resulting in mismatched lookups during shuffled multi-subset
training.
2、Deprecate and remove the guidance_scale parameter because it used in inference not train
3、Add inference command-line arguments --ct for cfg_trunc_ratio and --rc for renorm_cfg to control CFG truncation and renormalization during inference.