Published as a conference paper at ICLR 2026

Diffusion Models as Dataset Distillation Priors

DAP treats pretrained diffusion models as a source of diversity, generalization, and representativeness priors, enabling training-free guidance for higher-quality distilled datasets.

Duo Su1, Huyu Wu2, Huanran Chen1, Yiming Shi3, Yuzhu Wang4, Xi Ye1, Jun Zhu1
1Tsinghua University 2Institute of Computing Technology, CAS 3University of Electronic Science and Technology of China 4South China University of Technology

Figure 1 from the paper. DAP highlights how diffusion priors jointly improve diversity, representativeness, and downstream performance for dataset distillation.

Overview

Dataset distillation aims to compress a large dataset into a much smaller synthetic one that still supports strong downstream training. Recent generative methods use diffusion models as powerful foundations, but mostly rely on their sampling quality alone. DAP asks a simple question: can diffusion models provide better priors for dataset distillation than we currently use?

Our answer is yes. DAP interprets pretrained diffusion models as carrying three useful priors: diversity, generalization, and an often overlooked representativeness prior. The key contribution is a training-free guidance term, defined through feature-space similarity with a Mercer kernel, that nudges reverse diffusion toward distilled samples that better match the original data distribution.

Key Idea

Motivation From Section 3.1

Distilled Dataset s.t. Diversity + Generalization + Representativeness

The paper argues that an ideal distilled dataset should simultaneously preserve coverage of the original data manifold, avoid overfitting to a single evaluation architecture, and retain the most critical information from the raw dataset. DAP is designed around this trifecta rather than optimizing sample realism alone.

Formally, DAP starts from the original diffusion score and injects representativeness as an additional conditional term:

∇x log p(x|R) = ∇x log p(x) + ∇x log p(R|x)

The first term contributes diversity and generalization, while the second term brings in representativeness guidance during sampling.
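The decomposition above can be sketched as a single guided score evaluation. This is a minimal illustration, not the paper's implementation: `score_fn` stands in for the pretrained score network (∇x log p(x)) and `repr_grad_fn` for the representativeness guidance term (∇x log p(R|x)); both names, and the `guidance_scale` knob, are hypothetical.

```python
import numpy as np

def guided_score(x, t, score_fn, repr_grad_fn, guidance_scale=1.0):
    """One guided score evaluation during reverse diffusion.

    Mirrors the decomposition
        grad_x log p(x|R) = grad_x log p(x) + grad_x log p(R|x):
    the base score carries the diversity and generalization priors,
    and the added gradient carries representativeness guidance.
    """
    base = score_fn(x, t)          # grad_x log p(x): pretrained diffusion score
    guidance = repr_grad_fn(x, t)  # grad_x log p(R|x): representativeness term
    return base + guidance_scale * guidance
```

Because the guidance enters only as an additive term at sampling time, the pretrained generator itself never needs retraining, which is what makes the approach training-free.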

1. Diversity Prior

Pretrained diffusion models naturally cover multiple modes of the data distribution, helping distilled datasets avoid collapse.

2. Generalization Prior

Diffusion-based distillation is less tied to a single surrogate classifier, which improves transfer across architectures.

3. Representativeness Prior

DAP formalizes representativeness in feature space and injects it as guidance during reverse diffusion, without retraining the generator.

In short, DAP turns diffusion models from generic generators into task-aware priors for dataset distillation. This leads to distilled samples that are not only diverse and realistic, but also more aligned with the original dataset and more robust across evaluation architectures.
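The representativeness prior can be made concrete with a small feature-space sketch. The snippet below uses an RBF kernel, which is one valid Mercer kernel; the paper's actual kernel choice and feature extractor are not reproduced here, and `phi_x` (candidate features), `real_feats` (a bank of real-data features), and `bandwidth` are all illustrative assumptions.

```python
import numpy as np

def rbf_similarity(phi_x, real_feats, bandwidth=1.0):
    """Mean RBF-kernel similarity between a candidate feature vector
    and a bank of real-data features (a proxy for p(R|x))."""
    d2 = ((real_feats - phi_x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean()

def repr_guidance(phi_x, real_feats, bandwidth=1.0):
    """Gradient of log similarity w.r.t. phi_x: the direction that
    pulls a sample toward the real feature distribution."""
    diff = real_feats - phi_x                                        # (N, D)
    k = np.exp(-(diff ** 2).sum(axis=1) / (2.0 * bandwidth ** 2))    # (N,)
    grad_sim = (k[:, None] * diff).mean(axis=0) / bandwidth ** 2     # d sim / d phi
    return grad_sim / (k.mean() + 1e-12)                             # d log sim / d phi
```

For a candidate whose features sit off the real-data cluster, `repr_guidance` points back toward that cluster, which is exactly the nudge injected into reverse diffusion.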

Main Results

49.1% Top-1 on ImageNet-1K at IPC10

62.7% Top-1 on ImageNet-1K at IPC50

68.1% cross-architecture Top-1 on ResNet-101 at IPC50

On ImageNet-1K, DAP achieves state-of-the-art distilled data performance with 49.1% Top-1 accuracy at IPC10 and 62.7% at IPC50. These gains come without adding extra training to the diffusion model itself.

DAP also remains strong when the evaluation backbone changes. On ImageNet-1K cross-architecture transfer, it consistently outperforms prior methods on ResNet-101, MobileNet-V2, EfficientNet-B0, and Swin Transformer, supporting the claim that the distilled data is architecture-agnostic rather than overfit to a single classifier.

Table 3: Results on ImageNette and ImageWoof

Table 3 from the paper. Results are evaluated with the hard-label protocol on ImageNette and ImageWoof.

Visual Evidence

The figure below corresponds to Figure 5 in the paper and compares real versus synthetic feature distributions under IPC50.

The t-SNE plots show that DAP aligns synthetic samples with the training distribution while maintaining generalization to test data, supporting the paper's claims about diversity and representativeness.

Figure 5 from the paper: t-SNE visualizations of four panels, ImageNette-Training, ImageNette-Test, ImageWoof-Training, and ImageWoof-Test.

Resources

Venue: ICLR 2026

Project page: https://suduo94.github.io/Diffusion-As-Priors

Benchmarks highlighted: ImageNet-1K, ImageNette, ImageWoof, ImageIDC

Backbones discussed: Stable Diffusion and DiT

BibTeX

@inproceedings{su2026dap,
  title     = {Diffusion Models as Dataset Distillation Priors},
  author    = {Su, Duo and Wu, Huyu and Chen, Huanran and Shi, Yiming and Wang, Yuzhu and Ye, Xi and Zhu, Jun},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}