Back to Basics: Let Denoising Generative Models Denoise

Li, Tianhong; He, Kaiming

arXiv:2511.13720 (cs)

[Submitted on 17 Nov 2025]

Abstract: Today's denoising diffusion models do not "denoise" in the classical sense, i.e., they do not directly predict clean images. Rather, the neural networks predict noise or a noised quantity. In this paper, we suggest that predicting clean data and predicting noised quantities are fundamentally different. According to the manifold assumption, natural data should lie on a low-dimensional manifold, whereas noised quantities do not. With this assumption, we advocate for models that directly predict clean data, which allows apparently under-capacity networks to operate effectively in very high-dimensional spaces. We show that simple, large-patch Transformers on pixels can be strong generative models: using no tokenizer, no pre-training, and no extra loss. Our approach is conceptually nothing more than "$\textbf{Just image Transformers}$", or $\textbf{JiT}$, as we call it. We report competitive results using JiT with large patch sizes of 16 and 32 on ImageNet at resolutions of 256 and 512, where predicting high-dimensional noised quantities can fail catastrophically. With our networks mapping back to the basics of the manifold, our research goes back to basics and pursues a self-contained paradigm for Transformer-based diffusion on raw natural data.
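To give a sense of what "very high-dimensional" means for the patch sizes and resolutions named in the abstract, the short sketch below counts tokens and per-token pixel dimensions. It is only illustrative arithmetic, not the authors' code: the helper name `patch_token_stats` is hypothetical, and it assumes square, non-overlapping patches on 3-channel (RGB) images.

```python
def patch_token_stats(resolution: int, patch_size: int, channels: int = 3):
    """Return (num_tokens, dims_per_token) for square, non-overlapping patches.

    Hypothetical helper for illustration; assumes a square image with
    `channels` color channels (3 for RGB).
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    per_side = resolution // patch_size          # patches along one image side
    num_tokens = per_side * per_side             # sequence length seen by the Transformer
    dims_per_token = patch_size * patch_size * channels  # raw pixel values per patch
    return num_tokens, dims_per_token


# Enumerate the resolution / patch-size combinations mentioned in the abstract.
for resolution in (256, 512):
    for patch_size in (16, 32):
        tokens, dims = patch_token_stats(resolution, patch_size)
        print(f"{resolution}px, patch {patch_size}: {tokens} tokens x {dims} dims")
```

Under these assumptions, a patch of size 16 carries 768 raw pixel values and a patch of size 32 carries 3072, which is the per-token dimensionality a pixel-space network would have to predict.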

Comments:	Tech report. Code at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2511.13720 [cs.CV]
	(or arXiv:2511.13720v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2511.13720

Submission history

From: Tianhong Li
[v1] Mon, 17 Nov 2025 18:59:57 UTC (42,916 KB)
