Commit bf4a9f1

update week 15
1 parent 3e012a0 commit bf4a9f1

File tree

8 files changed: +1005 -146 lines changed

doc/pub/week15/html/week15-bs.html

Lines changed: 166 additions & 0 deletions
@@ -42,13 +42,23 @@
  'plans-for-the-week-of-may-5-9-2025'),
  ('Readings', 2, None, 'readings'),
  ('Diffusion models, basics', 2, None, 'diffusion-models-basics'),
+ ('Why diffusion models?', 2, None, 'why-diffusion-models'),
+ ('What are diffusion models?',
+  2,
+  None,
+  'what-are-diffusion-models'),
  ('Problems with probabilistic models',
   2,
   None,
   'problems-with-probabilistic-models'),
  ('Diffusion models', 2, None, 'diffusion-models'),
  ('Original idea', 2, None, 'original-idea'),
  ('Diffusion learning', 2, None, 'diffusion-learning'),
+ ('How diffusion models work',
+  2,
+  None,
+  'how-diffusion-models-work'),
+ ('Data preprocessing', 2, None, 'data-preprocessing'),
  ('Mathematics of diffusion models',
   2,
   None,
@@ -101,6 +111,21 @@
   None,
   'diffusion-models-part-2-from-url-https-arxiv-org-abs-2208-11970'),
  ('Optimization cost', 2, None, 'optimization-cost'),
+ ('Image quality', 2, None, 'image-quality'),
+ ('Training stability', 2, None, 'training-stability'),
+ ('Input types', 2, None, 'input-types'),
+ ('Denoising diffusion probabilistic models (DDPMs)',
+  2,
+  None,
+  'denoising-diffusion-probabilistic-models-ddpms'),
+ ('Techniques for speeding up diffusion models',
+  2,
+  None,
+  'techniques-for-speeding-up-diffusion-models'),
+ ('Applications of diffusion models',
+  2,
+  None,
+  'applications-of-diffusion-models'),
  ('PyTorch implementation of a Denoising Diffusion Probabilistic '
   'Model (DDPM) trained on the MNIST dataset',
   2,
@@ -165,10 +190,14 @@
 <!-- navigation toc: --> <li><a href="#plans-for-the-week-of-may-5-9-2025" style="font-size: 80%;">Plans for the week of May 5-9, 2025</a></li>
 <!-- navigation toc: --> <li><a href="#readings" style="font-size: 80%;">Readings</a></li>
 <!-- navigation toc: --> <li><a href="#diffusion-models-basics" style="font-size: 80%;">Diffusion models, basics</a></li>
+<!-- navigation toc: --> <li><a href="#why-diffusion-models" style="font-size: 80%;">Why diffusion models?</a></li>
+<!-- navigation toc: --> <li><a href="#what-are-diffusion-models" style="font-size: 80%;">What are diffusion models?</a></li>
 <!-- navigation toc: --> <li><a href="#problems-with-probabilistic-models" style="font-size: 80%;">Problems with probabilistic models</a></li>
 <!-- navigation toc: --> <li><a href="#diffusion-models" style="font-size: 80%;">Diffusion models</a></li>
 <!-- navigation toc: --> <li><a href="#original-idea" style="font-size: 80%;">Original idea</a></li>
 <!-- navigation toc: --> <li><a href="#diffusion-learning" style="font-size: 80%;">Diffusion learning</a></li>
+<!-- navigation toc: --> <li><a href="#how-diffusion-models-work" style="font-size: 80%;">How diffusion models work</a></li>
+<!-- navigation toc: --> <li><a href="#data-preprocessing" style="font-size: 80%;">Data preprocessing</a></li>
 <!-- navigation toc: --> <li><a href="#mathematics-of-diffusion-models" style="font-size: 80%;">Mathematics of diffusion models</a></li>
 <!-- navigation toc: --> <li><a href="#chains-of-vaes" style="font-size: 80%;">Chains of VAEs</a></li>
 <!-- navigation toc: --> <li><a href="#mathematical-representation" style="font-size: 80%;">Mathematical representation</a></li>
@@ -189,6 +218,12 @@
 <!-- navigation toc: --> <li><a href="#the-last-term" style="font-size: 80%;">The last term</a></li>
 <!-- navigation toc: --> <li><a href="#diffusion-models-part-2-from-url-https-arxiv-org-abs-2208-11970" style="font-size: 80%;">Diffusion models, part 2, from URL:"https://arxiv.org/abs/2208.11970"</a></li>
 <!-- navigation toc: --> <li><a href="#optimization-cost" style="font-size: 80%;">Optimization cost</a></li>
+<!-- navigation toc: --> <li><a href="#image-quality" style="font-size: 80%;">Image quality</a></li>
+<!-- navigation toc: --> <li><a href="#training-stability" style="font-size: 80%;">Training stability</a></li>
+<!-- navigation toc: --> <li><a href="#input-types" style="font-size: 80%;">Input types</a></li>
+<!-- navigation toc: --> <li><a href="#denoising-diffusion-probabilistic-models-ddpms" style="font-size: 80%;">Denoising diffusion probabilistic models (DDPMs)</a></li>
+<!-- navigation toc: --> <li><a href="#techniques-for-speeding-up-diffusion-models" style="font-size: 80%;">Techniques for speeding up diffusion models</a></li>
+<!-- navigation toc: --> <li><a href="#applications-of-diffusion-models" style="font-size: 80%;">Applications of diffusion models</a></li>
 <!-- navigation toc: --> <li><a href="#pytorch-implementation-of-a-denoising-diffusion-probabilistic-model-ddpm-trained-on-the-mnist-dataset" style="font-size: 80%;">PyTorch implementation of a Denoising Diffusion Probabilistic Model (DDPM) trained on the MNIST dataset</a></li>
 <!-- navigation toc: --> <li><a href="#problem-with-diffusion-models" style="font-size: 80%;">Problem with diffusion models</a></li>
 <!-- navigation toc: --> <li><a href="#imports-and-utilities" style="font-size: 80%;">Imports and Utilities</a></li>
@@ -255,6 +290,7 @@ <h2 id="readings" class="anchor">Readings </h2>
 <li> A central paper is the one by Sohl-Dickstein et al., Deep Unsupervised Learning using Nonequilibrium Thermodynamics, <a href="https://arxiv.org/abs/1503.03585" target="_self"><tt>https://arxiv.org/abs/1503.03585</tt></a></li>
 <li> Calvin Luo at <a href="https://arxiv.org/abs/2208.11970" target="_self"><tt>https://arxiv.org/abs/2208.11970</tt></a></li>
 <li> See also Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho, Variational Diffusion Models, <a href="https://arxiv.org/abs/2107.00630" target="_self"><tt>https://arxiv.org/abs/2107.00630</tt></a></li>
+<li> See also David Foster, <em>Generative Deep Learning</em>, chapter 8 on diffusion models.</li>
 </ol>
 </div>
 </div>
@@ -271,6 +307,40 @@ <h2 id="diffusion-models-basics" class="anchor">Diffusion models, basics </h2>
 variable has high dimensionality (same as the original data).
 </p>
 
+<!-- !split -->
+<h2 id="why-diffusion-models" class="anchor">Why diffusion models? </h2>
+
+<p>Diffusion models are prominent in generating high-quality images,
+video, sound, and more. They are named for their similarity to the
+natural diffusion process in physics, which describes how molecules
+move from high-concentration to low-concentration areas. In the
+context of machine learning, diffusion models generate new data by
+reversing a diffusion process, that is, by undoing the information
+loss caused by added noise. The main idea is to add random noise to
+data and then learn to undo that process, recovering the original
+data distribution from the noisy data.
+</p>
+
+<p>DALL-E 2, Midjourney, and the open-source Stable Diffusion, which
+create realistic images from a user's text input, are all examples of
+diffusion models.
+</p>
+
+<!-- !split -->
+<h2 id="what-are-diffusion-models" class="anchor">What are diffusion models? </h2>
+
+<p>Diffusion models are machine learning algorithms that generate
+high-quality data by progressively adding noise to a dataset and then
+learning to reverse this process. This approach enables them to
+create accurate and detailed outputs, from lifelike images to
+coherent text sequences. Central to their operation is the idea of
+gradually degrading data quality, only to reconstruct it to its
+original form or transform it into something new. This technique
+enhances the fidelity of generated data and opens new possibilities
+in areas like medical imaging, autonomous vehicles, and personalized
+AI assistants.
+</p>
 
 <!-- !split -->
 <h2 id="problems-with-probabilistic-models" class="anchor">Problems with probabilistic models </h2>

@@ -326,6 +396,32 @@ <h2 id="diffusion-learning" class="anchor">Diffusion learning </h2>
 of arbitrary form.
 </p>
 
+<!-- !split -->
+<h2 id="how-diffusion-models-work" class="anchor">How diffusion models work </h2>
+
+<p>Diffusion models work in a dual-phase mechanism: noise is first
+added to the data step by step (the forward diffusion process), and a
+neural network is then trained to methodically reverse this process.
+</p>
+
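<p>The forward phase has a convenient closed form: with a variance
schedule \( \beta_t \) and \( \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s) \),
a noisy sample at any step \( t \) can be drawn in one shot from the
clean data. A minimal PyTorch sketch of this forward-noising step is
given below; the linear schedule, tensor shapes, and names are
illustrative assumptions, not the notes' own implementation.
</p>

<pre><code>import torch

# Illustrative linear variance schedule (an assumption for this sketch)
T = 1000
beta = torch.linspace(1e-4, 0.02, T)          # beta_t for t = 0, ..., T-1
alpha_bar = torch.cumprod(1.0 - beta, dim=0)  # cumulative product of (1 - beta_t)

def forward_noise(x0, t):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    eps = torch.randn_like(x0)                         # Gaussian noise
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)          # broadcast over image dims
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * eps, eps

# Example: corrupt a batch of 28x28 grayscale images at random time steps
x0 = torch.randn(8, 1, 28, 28)        # stand-in for real, rescaled data
t = torch.randint(0, T, (8,))
xt, eps = forward_noise(x0, t)
</code></pre>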
+<!-- !split -->
+<h2 id="data-preprocessing" class="anchor">Data preprocessing </h2>
+
+<p>Before the diffusion process begins, the data need to be
+appropriately formatted for model training. This involves cleaning to
+remove outliers, normalization to scale features consistently, and
+augmentation to increase dataset diversity, especially for image
+data. Standardization is also applied to obtain a normal data
+distribution, which is important for handling noisy image data.
+Different data types, such as text or images, may require specific
+preprocessing steps, like addressing class-imbalance issues.
+Well-executed preprocessing ensures high-quality training data and
+contributes to the model's ability to learn meaningful patterns and
+generate high-quality images (or other data types) during inference.
+</p>
+
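<p>As a minimal sketch of such preprocessing for image data in PyTorch
(the rescaling of pixel values to \( [-1,1] \) is a common convention
for diffusion models on MNIST, assumed here rather than taken from the
notes):
</p>

<pre><code>from torchvision import transforms
from torchvision.datasets import MNIST

# Rescale pixels to [-1, 1] so the data match the scale of the added noise
transform = transforms.Compose([
    transforms.ToTensor(),                   # uint8 [0, 255] -> float [0.0, 1.0]
    transforms.Normalize((0.5,), (0.5,)),    # [0.0, 1.0] -> [-1.0, 1.0]
])
train_data = MNIST(root="data", train=True, download=True, transform=transform)
</code></pre>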
 <!-- !split -->
 <h2 id="mathematics-of-diffusion-models" class="anchor">Mathematics of diffusion models </h2>

@@ -606,6 +702,76 @@ <h2 id="optimization-cost" class="anchor">Optimization cost </h2>
 value may have high variance for large \( T \) values.
 </p>
 
+<!-- !split -->
+<h2 id="image-quality" class="anchor">Image quality </h2>
+
+<p>An advantage of diffusion models over, for example, VAEs (and also
+GANs, to be discussed next time) is the ease of training with simple
+and efficient loss functions, together with their ability to generate
+highly realistic images. They excel at closely matching the
+distribution of real images, outperforming GANs in this respect. This
+proficiency is due to the distinct mechanisms in diffusion models,
+which allow a more precise replication of real-world imagery.
+</p>
+
+<!-- !split -->
+<h2 id="training-stability" class="anchor">Training stability </h2>
+
+<p>Regarding training stability, generative diffusion models have an
+edge over GANs. GANs often struggle with <em>mode collapse</em>, a
+limitation where they produce only a limited variety of outputs.
+Diffusion models effectively avoid this issue through their gradual
+data-smoothing process, leading to a more diverse range of generated
+images.
+</p>
+
+<!-- !split -->
+<h2 id="input-types" class="anchor">Input types </h2>
+
+<p>It is also important to mention that diffusion models handle various
+input types. They perform diverse generative tasks like text-to-image
+synthesis, layout-to-image generation, inpainting, and
+super-resolution.
+</p>
+
+<!-- !split -->
+<h2 id="denoising-diffusion-probabilistic-models-ddpms" class="anchor">Denoising diffusion probabilistic models (DDPMs) </h2>
+
+<p>Denoising diffusion probabilistic models (DDPMs) are a specific type
+of diffusion model that focuses on probabilistically removing noise
+from data. During training, they learn how noise is added to data
+over time and how to reverse this process to recover the original
+data. This involves using probabilities to make educated guesses
+about what the data looked like before the noise was added. This
+approach is essential for the model's ability to accurately
+reconstruct data, ensuring the outputs are not just noise-free but
+also closely resemble the original data.
+</p>
+
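<p>The standard DDPM training objective of Ho et al. (2020) reduces to
a mean-squared error between the true and the predicted noise. A
minimal sketch of one training step is shown below; <tt>model</tt>
stands for any network mapping \( (x_t, t) \) to a noise estimate, the
schedule mirrors the forward-noising sketch above, and all names are
illustrative assumptions.
</p>

<pre><code>import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alpha_bar):
    """Simple DDPM objective: predict the added noise, penalize with MSE."""
    alpha_bar = alpha_bar.to(x0.device)  # keep the schedule on the data's device
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)  # random steps
    eps = torch.randn_like(x0)                                 # true noise
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    xt = a * x0 + s * eps                 # one-shot forward noising
    eps_hat = model(xt, t)                # network's noise estimate
    return F.mse_loss(eps_hat, eps)       # simple, stable loss
</code></pre>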
+<!-- !split -->
+<h2 id="techniques-for-speeding-up-diffusion-models" class="anchor">Techniques for speeding up diffusion models </h2>
+
+<p>Generating a sample from a DDPM with the reverse diffusion process
+is quite slow because it involves many steps, possibly up to a
+thousand. For instance, according to Song et al. (2020), it takes
+about 20 hours to generate 50,000 small images with a DDPM, while a
+GAN can create the same number in less than a minute on an Nvidia
+2080 Ti GPU.
+</p>
+
+<p>An alternative method, the Denoising Diffusion Implicit Model
+(DDIM), stands out for its efficiency and quality. Unlike traditional
+diffusion models, DDIM needs far fewer steps to create clear images
+from noisy data, as sketched below.
+</p>
+
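<p>The efficiency gain of DDIM comes from a deterministic update (the
\( \eta = 0 \) case) that jumps directly between noise levels, so only
a small subset of the \( T \) steps needs to be visited. A sketch of
one such update is given below; <tt>eps_hat</tt> is the output of the
same trained noise-prediction network, and the names are assumptions
for illustration.
</p>

<pre><code>import torch

def ddim_step(xt, eps_hat, abar_t, abar_prev):
    """One deterministic DDIM update from noise level abar_t to abar_prev."""
    # Estimate the clean image implied by the current noise prediction
    x0_hat = (xt - (1.0 - abar_t).sqrt() * eps_hat) / abar_t.sqrt()
    # Jump directly to the earlier noise level; no fresh noise when eta = 0
    return abar_prev.sqrt() * x0_hat + (1.0 - abar_prev).sqrt() * eps_hat
</code></pre>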
+<!-- !split -->
+<h2 id="applications-of-diffusion-models" class="anchor">Applications of diffusion models </h2>
+
+<p>There are very diverse applications of diffusion models, one of the
+most exciting being digital art creation. The document at
+<a href="https://www.superannotate.com/blog/diffusion-models#:~:text=A%20primary%20advantage%20of%20diffusion,to%20generate%20highly%20realistic%20images" target="_self"><tt>https://www.superannotate.com/blog/diffusion-models#:~:text=A%20primary%20advantage%20of%20diffusion,to%20generate%20highly%20realistic%20images</tt></a>
+gives many nice examples of applications.
+</p>
 
 <!-- !split -->
 <h2 id="pytorch-implementation-of-a-denoising-diffusion-probabilistic-model-ddpm-trained-on-the-mnist-dataset" class="anchor">PyTorch implementation of a Denoising Diffusion Probabilistic Model (DDPM) trained on the MNIST dataset </h2>
