For the deep learning frameworks:

- Make it clearer that these should be used for any serious deep learning; NumPy isn't the right library beyond learning.
- Remove Swift for TensorFlow (Google abandoned it), add MXNet, and order the frameworks in the order we'd want to recommend them.

For the ethics section: this is of fairly fundamental importance, and at least some pointers should be provided in introductory ML/AI resources, so that people new to the field are at least aware the topic exists and can explore it.
content/tutorial-deep-learning-on-mnist.md (+36 / -30 lines)
@@ -15,7 +15,7 @@ kernelspec:

This tutorial demonstrates how to build a simple [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) (with one hidden layer) and train it from scratch with NumPy to recognize handwritten digit images.

Your deep learning model — one of the most basic artificial neural networks that resembles the original [multi-layer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron) — will learn to classify digits from 0 to 9 from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. The dataset contains 60,000 training and 10,000 test images with corresponding labels. Each training and test image is of size 784 (or 28x28 pixels) — this will be your input for the neural network.

Based on the image inputs and their labels ([supervised learning](https://en.wikipedia.org/wiki/Supervised_learning)), your neural network will be trained to learn their features using forward propagation and backpropagation ([reverse-mode](https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation) differentiation). The final output of the network is a vector of 10 scores — one for each handwritten digit class. You will also evaluate how good your model is at classifying the images on the test set.
@@ -25,13 +25,13 @@ This tutorial was adapted from the work by [Andrew Trask](https://github.com/iamtrask)...

## Prerequisites

You should have some knowledge of Python, NumPy array manipulation, and linear algebra. In addition, you should be familiar with the main concepts of [deep learning](https://en.wikipedia.org/wiki/Deep_learning).

To refresh your memory, you can take the [Python](https://docs.python.org/dev/tutorial/index.html) and [Linear algebra on n-dimensional arrays](https://numpy.org/doc/stable/user/tutorial-svd.html) tutorials.

You are advised to read the [Deep learning](http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf) paper published in 2015 by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, who are regarded as some of the pioneers of the field. You should also consider reading Andrew Trask's [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning), which teaches deep learning with NumPy.

In addition to NumPy, you will be utilizing the following Python standard modules for data loading and processing:

- [`urllib`](https://docs.python.org/3/library/urllib.html) for URL handling
- [`urllib.request`](https://docs.python.org/3/library/urllib.request.html) for opening URLs
- [`gzip`](https://docs.python.org/3/library/gzip.html) for gzip file decompression
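As a rough illustration of how those standard modules fit together, here is a hedged sketch of downloading and decompressing one MNIST archive; the URL and filename are assumptions (MNIST mirrors change over time), not values from the tutorial:

```python
import gzip
import os
from urllib import request

# Hypothetical mirror and filename for one MNIST archive.
url = "https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz"
filename = "train-images-idx3-ubyte.gz"

# Download the archive once, then decompress it in memory with gzip.
if not os.path.exists(filename):
    request.urlretrieve(url, filename)

with gzip.open(filename, "rb") as f:
    raw_bytes = f.read()

print(f"Decompressed {len(raw_bytes)} bytes")
```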
@@ -167,7 +167,7 @@ for sample, ax in zip(rng.choice(x_train, size=num_examples, replace=False), axes):

> **Note:** You can also visualize a sample image as an array by printing `x_train[59999]`. Here, `59999` is your 60,000th training image sample (`0` would be your first). Your output will be quite long and should contain an array of 8-bit integers:
>
> ```
> ...
> 0, 0, 38, 48, 48, 22, 0, 0, 0, 0, 0, 0, 0,
> ```
@@ -194,7 +194,7 @@ In practice, you can use different types of floating-point precision depending on...

### Convert the image data to the floating-point format

The image data contain 8-bit integers with color values in the [0, 255] interval.

You will normalize them into floating-point arrays in the [0, 1] interval by dividing them by 255.
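As a quick illustration of that normalization step, the following minimal sketch uses a stand-in array (`images_uint8` is an assumed name, not from the tutorial) to show the dtype promotion and value range:

```python
import numpy as np

# Stand-in for a small batch of 8-bit MNIST images (2 images of 784 pixels).
rng = np.random.default_rng(0)
images_uint8 = rng.integers(0, 256, size=(2, 784), dtype=np.uint8)

# Dividing by 255 promotes the array to float64 and maps values into [0, 1].
images_float = images_uint8 / 255.0
print(images_float.dtype, images_float.min(), images_float.max())
```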
@@ -227,7 +227,7 @@ print('The data type of test images: {}'.format(test_images.dtype))

> **Note:** You can also check that normalization was successful by printing `training_images[0]` in a notebook cell. Your long output should contain an array of floating-point numbers:
>
> ```
> ...
> 0. , 0. , 0.01176471, 0.07058824, 0.07058824,
> ```
@@ -240,7 +240,7 @@ print('The data type of test images: {}'.format(test_images.dtype))

You will use one-hot encoding to embed each digit label as an all-zero vector with `np.zeros()` and place a `1` at the index of the label. As a result, your label data will be arrays with `1.0` (or `1.`) in the position of each image label.

Since there are 10 labels (from 0 to 9) in total, your arrays will look similar to this:

    array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
@@ -257,7 +257,7 @@ print('The data type of test labels: {}'.format(y_test.dtype))

```{code-cell} ipython3
def one_hot_encoding(labels, dimension=10):
    # Define a one-hot variable for an all-zero vector
```
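The hunk above shows only the start of the tutorial's `one_hot_encoding` function. For orientation, here is a minimal sketch of how such an encoding can be built with `np.zeros()`; it is an illustration under assumed details, not necessarily the tutorial's exact implementation:

```python
import numpy as np

def one_hot_encoding(labels, dimension=10):
    # Start from an all-zero matrix with one row per label.
    encoded = np.zeros((len(labels), dimension))
    # Place a 1 at each label's index.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(one_hot_encoding(np.array([5])))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
```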
@@ -307,20 +307,20 @@ Afterwards, you will construct the building blocks of a simple deep learning model...

- _Layers_: These building blocks work as data filters — they process data and learn representations from inputs to better predict the target outputs.

You will use 1 hidden layer in your model to pass the inputs forward (_forward propagation_) and propagate the gradients/error derivatives of a loss function backward (_backpropagation_). In total, the network has three layers: input, hidden, and output.

In the hidden (middle) and output (last) layers, the neural network model will compute the weighted sum of inputs. For this computation, you will use NumPy's matrix multiplication function (the "dot multiply" or `np.dot(layer, weights)`).

> **Note:** For simplicity, the bias term is omitted in this example (there is no `np.dot(layer, weights) + bias`).

- _Weights_: These are important adjustable parameters that the neural network fine-tunes by forward and backward propagating the data. They are optimized through a process called [gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). Before the model training starts, the weights are randomly initialized with NumPy's [`Generator.random()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.random.html).

The optimal weights should produce the highest prediction accuracy and the lowest error on the training and test sets.

- _Activation function_: Deep learning models are capable of determining non-linear relationships between inputs and outputs, and these [non-linear functions](https://en.wikipedia.org/wiki/Activation_function) are usually applied to the output of each layer.

You will apply a [rectified linear unit (ReLU)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) to the hidden layer's output (for example, `relu(np.dot(layer, weights))`).

- _Regularization_: This [technique](https://en.wikipedia.org/wiki/Regularization_(mathematics)) helps prevent the neural network model from [overfitting](https://en.wikipedia.org/wiki/Overfitting).

In this example, you will use a method called dropout — [dilution](https://en.wikipedia.org/wiki/Dilution_(neural_networks)) — that randomly sets a number of features in a layer to 0s. You will define it with NumPy's [`Generator.integers()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html) method and apply it to the hidden layer of the network.
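To make those operations concrete, here is a small, self-contained sketch of the hidden-layer computation with ReLU and a dropout mask drawn with `Generator.integers()`. The layer sizes, seed, and variable names are illustrative assumptions rather than the tutorial's exact code:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # illustrative seed

def relu(x):
    # Keep positive values, zero out the rest.
    return (x >= 0) * x

# Assumed sizes: one 784-pixel image, a 100-unit hidden layer.
layer_0 = rng.random(784)
weights_1 = 0.2 * rng.random((784, 100)) - 0.1

# Weighted sum of inputs followed by the non-linearity.
layer_1 = relu(np.dot(layer_0, weights_1))

# Dropout: a 0/1 mask that randomly silences about half of the hidden units;
# multiplying by 2 keeps the expected magnitude of the layer unchanged.
dropout_mask = rng.integers(low=0, high=2, size=layer_1.shape)
layer_1 = layer_1 * dropout_mask * 2
print(layer_1.shape)
```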
@@ -336,34 +336,34 @@ Here is a summary of the neural network model architecture and the training process:



- _The input layer_:

It is the input for the network — the previously preprocessed data that is loaded from `training_images` into `layer_0`.

- _The hidden (middle) layer_:

`layer_1` takes the output from the previous layer and matrix-multiplies the input by the weights (`weights_1`) with NumPy's `np.dot()`.

Then, this output is passed through the ReLU activation function for non-linearity, and dropout is applied to help prevent overfitting.

- _The output (last) layer_:

`layer_2` ingests the output from `layer_1` and repeats the same "dot multiply" process with `weights_2`.

The final output returns 10 scores, one for each of the 0-9 digit labels. The network model ends with a size 10 layer — a 10-dimensional vector.

- _Forward propagation, backpropagation, training loop_:

At the beginning of model training, your network randomly initializes the weights and feeds the input data forward through the hidden and output layers. This process is the forward pass or forward propagation.

Then, the network propagates the "signal" from the loss function back through the hidden layer and adjusts the weight values with the help of the learning rate parameter (more on that later).

> **Note:** In more technical terms, you:
>
> 1. Measure the error by comparing the real label of an image (the truth) with the prediction of the model.
> 2. Differentiate the loss function.
> 3. Ingest the [gradients](https://en.wikipedia.org/wiki/Gradient) with respect to the output, and backpropagate them with respect to the inputs through the layer(s).
>
> Since the network contains tensor operations and weight matrices, backpropagation uses the [chain rule](https://en.wikipedia.org/wiki/Chain_rule).
>
> With each iteration (epoch) of the neural network training, this forward and backward propagation cycle adjusts the weights, which is reflected in the accuracy and error metrics. As you train the model, your goal is to minimize the error and maximize the accuracy on the training data, which the model learns from, as well as on the test data, where you evaluate the model.
    # Set up a derivative of the ReLU function that returns 1 for a positive input
    # and 0 otherwise.
    def relu2deriv(output):
        return output >= 0
@@ -450,8 +450,8 @@ for j in range(epochs):

        # Initialize the training image data as inputs.
        layer_0 = training_images[i]
        # 2. The hidden layer:
        # Take in the training image data into the middle layer by
        # matrix-multiplying it by randomly initialized weights.
        layer_1 = np.dot(layer_0, weights_1)
        # 3. Pass the hidden layer's output through the ReLU activation function.
        layer_1 = relu(layer_1)
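The hunk above ends with the hidden layer's forward pass. As a rough, self-contained sketch of how the forward pass can continue into the error measurement and weight updates described earlier (reusing the `relu2deriv` helper shown above), consider the following; the layer sizes, learning rate, and variable names are assumptions, dropout is omitted for brevity, and this is not presented as the tutorial's exact training code:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def relu(x):
    return (x >= 0) * x

def relu2deriv(output):
    return output >= 0

# Illustrative single training sample: a 784-pixel image and a one-hot label.
layer_0 = rng.random(784)
label = np.zeros(10)
label[5] = 1.0

# Randomly initialized weights for a 784 -> 100 -> 10 network.
weights_1 = 0.2 * rng.random((784, 100)) - 0.1
weights_2 = 0.2 * rng.random((100, 10)) - 0.1
learning_rate = 0.005  # assumed value

# Forward pass.
layer_1 = relu(np.dot(layer_0, weights_1))
layer_2 = np.dot(layer_1, weights_2)

# 1. Measure the error against the one-hot label.
# 2. Differentiate the (squared-error) loss.
layer_2_delta = layer_2 - label
# 3. Backpropagate the gradients through the hidden layer via the chain rule.
layer_1_delta = np.dot(layer_2_delta, weights_2.T) * relu2deriv(layer_1)

# Gradient-descent weight updates.
weights_2 -= learning_rate * np.outer(layer_1, layer_2_delta)
weights_1 -= learning_rate * np.outer(layer_0, layer_1_delta)
```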
@@ -552,7 +552,7 @@ axes[1].set_xlabel("Epochs")

    plt.show()

The accuracy rates that your model reaches during training and testing may look reasonable, but you may also find the error rates to be quite high.

To reduce the error during training and testing, you can consider changing the simple loss function to, for example, categorical [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy). Other possible solutions are discussed below.
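Since the text above suggests categorical cross-entropy as an alternative loss, here is a brief, hedged sketch of what that loss can look like in NumPy (a softmax over the 10 output scores, then cross-entropy against the one-hot label); the function names and example values are illustrative:

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability before exponentiating.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / np.sum(exp_scores)

def categorical_cross_entropy(one_hot_label, scores):
    # Cross-entropy between the true one-hot label and the predicted probabilities.
    probs = softmax(scores)
    return -np.sum(one_hot_label * np.log(probs + 1e-12))

label = np.array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
scores = np.array([0.1, -0.2, 0.3, 0.0, 0.05, 2.0, -1.0, 0.4, 0.2, -0.3])
print(categorical_cross_entropy(label, scores))
```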
@@ -571,6 +571,12 @@ To further enhance and optimize your neural network model, you can consider one...

- Apply [batch normalization](https://en.wikipedia.org/wiki/Batch_normalization) for faster and more stable training.
- Tune other parameters, such as the learning rate and hidden layer size.

    -Finally, you can go beyond NumPy with specialized frameworks and APIs — such as [TensorFlow](https://www.tensorflow.org/guide/tf_numpy?hl=el), [PyTorch](https://pytorch.org/docs/stable/generated/torch.from_numpy.html), Swift for TensorFlow (with [Python interoperability](https://www.tensorflow.org/swift/tutorials/python_interoperability)), and [JAX](https://github.com/google/jax) — that support NumPy, have built-in [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation), and are designed for high-performance numerical computing and machine learning.
    +Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as [PyTorch](https://pytorch.org/), [JAX](https://github.com/google/jax), [TensorFlow](https://www.tensorflow.org/guide/tf_numpy) or [MXNet](https://mxnet.apache.org) — that provide NumPy-like APIs, have built-in [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) and GPU support, and are designed for high-performance numerical computing and machine learning.
    +
    +Finally, when developing a machine learning model, you should think about potential ethical issues and apply practices to avoid or mitigate them:
    +- Document a trained model with a Model Card - see the [Model Cards for Model Reporting paper](https://doi.org/10.1145/3287560.3287596) by Margaret Mitchell et al.
    +- Document a dataset with a Datasheet - see the [Datasheets for Datasets paper](https://arxiv.org/abs/1803.09010) by Timnit Gebru et al.
    +- Consider the impact of your model - who is affected by it, and who does it benefit - see [the article](https://www.nature.com/articles/d41586-020-02003-2) and [talk](https://slideslive.com/38923453/the-values-of-machine-learning) by Pratyusha Kalluri.
    +- For more resources, see [this blog post by Rachel Thomas](https://www.fast.ai/2018/09/24/ai-ethics-resources/) and the [Radical AI podcast](https://www.radicalai.org/).

(Credit to [hsjeong5](https://github.com/hsjeong5/MNIST-for-Numpy) for demonstrating how to download MNIST without the use of external libraries.)