
Commit 63deb86

Update MNIST tutorial - deep learning frameworks & ethics resources (#84)
For the deep learning frameworks:
- Make it more clear that these should be used for any serious deep learning; NumPy isn't the right library beyond learning.
- Remove Swift for TensorFlow (Google abandoned it), add MXNet, and order frameworks in the order we'd want to recommend them.

For the ethics section: this is of pretty fundamental importance, and at least some pointers should be provided in introductory ML/AI resources so people new to the field are at least aware it's a thing and can explore the topic.
1 parent d4ab0c2 commit 63deb86


3 files changed: +66 -59 lines changed


content/tutorial-deep-learning-on-mnist.md

+36 -30
@@ -15,7 +15,7 @@ kernelspec:

This tutorial demonstrates how to build a simple [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) (with one hidden layer) and train it from scratch with NumPy to recognize handwritten digit images.

-Your deep learning model — one of the most basic artificial neural networks that resembles the original [multi-layer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron) — will learn to classify digits from 0 to 9 from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. The dataset contains 60,000 training and 10,000 test images and corresponding labels. Each training and test image is of size 784 (or 28x28 pixels) — this will be your input for the neural network.
+Your deep learning model — one of the most basic artificial neural networks that resembles the original [multi-layer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron) — will learn to classify digits from 0 to 9 from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. The dataset contains 60,000 training and 10,000 test images and corresponding labels. Each training and test image is of size 784 (or 28x28 pixels) — this will be your input for the neural network.

Based on the image inputs and their labels ([supervised learning](https://en.wikipedia.org/wiki/Supervised_learning)), your neural network will be trained to learn their features using forward propagation and backpropagation ([reverse-mode](https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation) differentiation). The final output of the network is a vector of 10 scores — one for each handwritten digit image. You will also evaluate how good your model is at classifying the images on the test set.
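To make the 784-pixel input shape concrete, here is a minimal sketch (not part of the commit) of flattening 28x28 images into 784-dimensional vectors; the random array stands in for the real MNIST data, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for the MNIST training images: 60,000 samples of 28x28 8-bit pixels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-dimensional vector, the input size used by the network.
training_images = x_train.reshape(len(x_train), 28 * 28)
print(training_images.shape)  # (60000, 784)
```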

@@ -25,13 +25,13 @@ This tutorial was adapted from the work by [Andrew Trask](https://github.com/iam

## Prerequisites

-The reader should have some knowledge of Python, NumPy array manipulation, and linear algebra. In addition, you should be familiar with the main concepts of [deep learning](https://en.wikipedia.org/wiki/Deep_learning).
+The reader should have some knowledge of Python, NumPy array manipulation, and linear algebra. In addition, you should be familiar with the main concepts of [deep learning](https://en.wikipedia.org/wiki/Deep_learning).

-To refresh your memory, you can take the [Python](https://docs.python.org/dev/tutorial/index.html) and [Linear algebra on n-dimensional arrays](https://numpy.org/doc/stable/user/tutorial-svd.html) tutorials.
+To refresh your memory, you can take the [Python](https://docs.python.org/dev/tutorial/index.html) and [Linear algebra on n-dimensional arrays](https://numpy.org/doc/stable/user/tutorial-svd.html) tutorials.

You are advised to read the [Deep learning](http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf) paper published in 2015 by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, who are regarded as some of the pioneers of the field. You should also consider reading Andrew Trask's [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning), which teaches deep learning with NumPy.

-In addition to NumPy, you will be utilizing the following Python standard modules for data loading and processing:
+In addition to NumPy, you will be utilizing the following Python standard modules for data loading and processing:
- [`urllib`](https://docs.python.org/3/library/urllib.html) for URL handling
- [`urllib.request`](https://docs.python.org/3/library/urllib.request.html) for URL opening
- [`gzip`](https://docs.python.org/3/library/gzip.html) for gzip file decompression
@@ -167,7 +167,7 @@ for sample, ax in zip(rng.choice(x_train, size=num_examples, replace=False), axe

> **Note:** You can also visualize a sample image as an array by printing `x_train[59999]`. Here, `59999` is your 60,000th training image sample (`0` would be your first). Your output will be quite long and should contain an array of 8-bit integers:
>
->
+>
> ```
> ...
> 0, 0, 38, 48, 48, 22, 0, 0, 0, 0, 0, 0, 0,
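An illustrative snippet (not part of the commit) showing the kind of inspection described in the note above; the random array, its 28x28 shape, and the matplotlib display are stand-ins and assumptions, not code taken from the tutorial.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the MNIST training images loaded earlier in the tutorial.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Print the 60,000th sample as a raw array of 8-bit integers...
print(x_train[59999])
# ...or display it as an image.
plt.imshow(x_train[59999], cmap="gray")
plt.show()
```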
@@ -194,7 +194,7 @@ In practice, you can use different types of floating-point precision depending o

### Convert the image data to the floating-point format

-The image data contains 8-bit integers, with color values in the [0, 255] interval.
+The image data contains 8-bit integers, with color values in the [0, 255] interval.

You will normalize them into floating-point arrays in the [0, 1] interval by dividing them by 255.
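A minimal sketch (not part of the commit) of that normalization step, using a synthetic stand-in for the flattened image data:

```python
import numpy as np

# Synthetic stand-in for flattened 8-bit image data with values in [0, 255].
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(4, 784), dtype=np.uint8)

# Normalize to floating-point arrays in the [0, 1] interval.
training_images = x_train / 255.0
print(training_images.dtype)                          # float64
print(training_images.min(), training_images.max())   # values now lie in [0, 1]
```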

@@ -227,7 +227,7 @@ print('The data type of test images: {}'.format(test_images.dtype))
```

> **Note:** You can also check that normalization was successful by printing `training_images[0]` in a notebook cell. Your long output should contain an array of floating-point numbers:
->
+>
> ```
> ...
> 0. , 0. , 0.01176471, 0.07058824, 0.07058824,
@@ -240,7 +240,7 @@ print('The data type of test images: {}'.format(test_images.dtype))

You will use one-hot encoding to embed each digit label as an all-zero vector with `np.zeros()` and place `1` for a label index. As a result, your label data will be arrays with `1.0` (or `1.`) in the position of each image label.

-Since there are 10 labels (from 0 to 9) in total, your arrays will look similar to this:
+Since there are 10 labels (from 0 to 9) in total, your arrays will look similar to this:

```
array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
@@ -257,7 +257,7 @@ print('The data type of test labels: {}'.format(y_test.dtype))

```{code-cell} ipython3
def one_hot_encoding(labels, dimension=10):
-    # Define a one-hot variable for an all-zero vector
+    # Define a one-hot variable for an all-zero vector
    # with 10 dimensions (number labels from 0 to 9).
    one_hot_labels = (labels[..., None] == np.arange(dimension)[None])
    # Return one-hot encoded labels.
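For illustration (not part of the commit), here is how the broadcasting comparison in `one_hot_encoding` behaves on a few labels; the final cast to floating point is an assumption, since the hunk cuts off before the function's return statement.

```python
import numpy as np

def one_hot_encoding(labels, dimension=10):
    # Compare each label against the values 0..9; the boolean result is the one-hot pattern.
    one_hot_labels = (labels[..., None] == np.arange(dimension)[None])
    # Cast to float so the labels match the network's floating-point outputs (assumed).
    return one_hot_labels.astype(np.float64)

labels = np.array([5, 0, 9])
print(one_hot_encoding(labels))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
```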
@@ -307,20 +307,20 @@ Afterwards, you will construct the building blocks of a simple deep learning mod
- _Layers_: These building blocks work as data filters — they process data and learn representations from inputs to better predict the target outputs.

You will use 1 hidden layer in your model to pass the inputs forward (_forward propagation_) and propagate the gradients/error derivatives of a loss function backward (_backpropagation_). These are the input, hidden, and output layers.
-
+
In the hidden (middle) and output (last) layers, the neural network model will compute the weighted sum of inputs. To compute this process, you will use NumPy's matrix multiplication function (the "dot multiply" or `np.dot(layer, weights)`).

> **Note:** For simplicity, the bias term is omitted in this example (there is no `np.dot(layer, weights) + bias`).

- _Weights_: These are important adjustable parameters that the neural network fine-tunes by forward and backward propagating the data. They are optimized through a process called [gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). Before the model training starts, the weights are randomly initialized with NumPy's [`Generator.random()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.random.html).
-
-The optimal weights should produce the highest prediction accuracy and the lowest error on the training and test sets.
+
+The optimal weights should produce the highest prediction accuracy and the lowest error on the training and test sets.

- _Activation function_: Deep learning models are capable of determining non-linear relationships between inputs and outputs and these [non-linear functions](https://en.wikipedia.org/wiki/Activation_function) are usually applied to the output of each layer.

You will apply a [rectified linear unit (ReLU)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) to the hidden layer's output (for example, `relu(np.dot(layer, weights))`).

-- _Regularization_: This [technique](https://en.wikipedia.org/wiki/Regularization_(mathematics)) helps prevent the neural network model from [overfitting](https://en.wikipedia.org/wiki/Overfitting).
+- _Regularization_: This [technique](https://en.wikipedia.org/wiki/Regularization_(mathematics)) helps prevent the neural network model from [overfitting](https://en.wikipedia.org/wiki/Overfitting).

In this example, you will use a method called dropout — [dilution](https://en.wikipedia.org/wiki/Dilution_(neural_networks)) — that randomly sets a number of features in a layer to 0s. You will define it with NumPy's [`Generator.integers()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html) method and apply it to the hidden layer of the network.
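To tie these building blocks together, here is a small forward-pass sketch (not part of the commit); the hidden-layer size of 100 and the `* 2` dropout rescaling are illustrative assumptions rather than values taken from the diff.

```python
import numpy as np

rng = np.random.default_rng(312)

def relu(x):
    # ReLU keeps non-negative values and zeroes out the rest.
    return (x >= 0) * x

layer_0 = rng.random((1, 784))                  # one preprocessed input image
weights_1 = 0.2 * rng.random((784, 100)) - 0.1  # hidden-layer weights (size 100 assumed)
weights_2 = 0.2 * rng.random((100, 10)) - 0.1   # output-layer weights

# Hidden layer: weighted sum ("dot multiply"), then the ReLU non-linearity.
layer_1 = relu(np.dot(layer_0, weights_1))

# Dropout/dilution: randomly zero out roughly half of the hidden features,
# rescaling the survivors (the factor of 2 is a common choice, assumed here).
dropout_mask = rng.integers(low=0, high=2, size=layer_1.shape)
layer_1 = layer_1 * dropout_mask * 2

# Output layer: another weighted sum, producing 10 scores (one per digit).
layer_2 = np.dot(layer_1, weights_2)
print(layer_2.shape)  # (1, 10)
```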

@@ -336,34 +336,34 @@ Here is a summary of the neural network model architecture and the training proc

![Diagram showing operations detailed in this tutorial](_static/tutorial-deep-learning-on-mnist.png)

-- _The input layer_:
+- _The input layer_:

It is the input for the network — the previously preprocessed data that is loaded from `training_images` into `layer_0`.

-- _The hidden (middle) layer_:
+- _The hidden (middle) layer_:

`layer_1` takes the output from the previous layer and performs matrix-multiplication of the input by weights (`weights_1`) with NumPy's `np.dot()`.

Then, this output is passed through the ReLU activation function for non-linearity and then dropout is applied to help with overfitting.

-- _The output (last) layer_:
+- _The output (last) layer_:

`layer_2` ingests the output from `layer_1` and repeats the same "dot multiply" process with `weights_2`.

The final output returns 10 scores for each of the 0-9 digit labels. The network model ends with a size 10 layer — a 10-dimensional vector.

-- _Forward propagation, backpropagation, training loop_:
+- _Forward propagation, backpropagation, training loop_:
+
+In the beginning of model training, your network randomly initializes the weights and feeds the input data forward through the hidden and output layers. This process is the forward pass or forward propagation.

-In the beginning of model training, your network randomly initializes the weights and feeds the input data forward through the hidden and output layers. This process is the forward pass or forward propagation.
-
-Then, the network propagates the "signal" from the loss function back through the hidden layer and adjusts the weight values with the help of the learning rate parameter (more on that later).
-
-> **Note:** In more technical terms, you:
->
+Then, the network propagates the "signal" from the loss function back through the hidden layer and adjusts the weight values with the help of the learning rate parameter (more on that later).
+
+> **Note:** In more technical terms, you:
+>
> 1. Measure the error by comparing the real label of an image (the truth) with the prediction of the model.
> 2. Differentiate the loss function.
-> 3. Ingest the [gradients](https://en.wikipedia.org/wiki/Gradient) with respect to the output, and backpropagate them with respect to the inputs through the layer(s).
->
+> 3. Ingest the [gradients](https://en.wikipedia.org/wiki/Gradient) with respect to the output, and backpropagate them with respect to the inputs through the layer(s).
+>
> Since the network contains tensor operations and weight matrices, backpropagation uses the [chain rule](https://en.wikipedia.org/wiki/Chain_rule).
>
> With each iteration (epoch) of the neural network training, this forward and backward propagation cycle adjusts the weights, which is reflected in the accuracy and error metrics. As you train the model, your goal is to minimize the error and maximize the accuracy on the training data, which the model learns from, as well as on the test data, where you evaluate the model.
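As a concrete illustration of that forward/backward cycle (not part of the commit), here is a single gradient-descent step on one sample; the squared-error loss, hidden size of 100, and learning-rate value are assumptions for the sketch rather than details taken from the diff.

```python
import numpy as np

def relu(x):
    return (x >= 0) * x

def relu2deriv(output):
    # Gradient of ReLU: 1 where the activation was non-negative, 0 otherwise.
    return output >= 0

rng = np.random.default_rng(0)
learning_rate = 0.005                              # illustrative value

layer_0 = rng.random((1, 784))                     # one preprocessed input image
weights_1 = 0.2 * rng.random((784, 100)) - 0.1     # hidden layer (size 100 assumed)
weights_2 = 0.2 * rng.random((100, 10)) - 0.1      # output layer
target = np.zeros((1, 10))
target[0, 3] = 1.0                                 # one-hot "truth" for digit 3

# Forward pass: input -> hidden (ReLU) -> output scores.
layer_1 = relu(np.dot(layer_0, weights_1))
layer_2 = np.dot(layer_1, weights_2)

# Backward pass: differentiate a simple squared-error loss and use the chain
# rule to propagate the gradient from the output back through the hidden layer.
layer_2_delta = layer_2 - target
layer_1_delta = np.dot(layer_2_delta, weights_2.T) * relu2deriv(layer_1)

# Gradient-descent update of both weight matrices.
weights_2 -= learning_rate * np.dot(layer_1.T, layer_2_delta)
weights_1 -= learning_rate * np.dot(layer_0.T, layer_1_delta)
```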
@@ -387,7 +387,7 @@ rng = np.random.default_rng(seed)
def relu (x):
    return (x>=0) * x

-# Set up a derivative of the ReLU function that returns 1 for a positive input
+# Set up a derivative of the ReLU function that returns 1 for a positive input
# and 0 otherwise.
def relu2deriv(output):
    return output >= 0
@@ -450,8 +450,8 @@ for j in range(epochs):
        # Initialize the training image data as inputs.
        layer_0 = training_images[i]
        # 2. The hidden layer:
-        # Take in the training image data into the middle layer by
-        # matrix-multiplying it by randomly initialized weights.
+        # Take in the training image data into the middle layer by
+        # matrix-multiplying it by randomly initialized weights.
        layer_1 = np.dot(layer_0, weights_1)
        # 3. Pass the hidden layer's output through the ReLU activation function.
        layer_1 = relu(layer_1)
@@ -552,7 +552,7 @@ axes[1].set_xlabel("Epochs")
plt.show()
```

-The accuracy rates that your model reaches during training and testing may be somewhat plausible but you may also find the error rates to be quite high.
+The accuracy rates that your model reaches during training and testing may be somewhat plausible but you may also find the error rates to be quite high.

To reduce the error during training and testing, you can consider changing the simple loss function to, for example, categorical [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy). Other possible solutions are discussed below.
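For reference (not part of the commit), a categorical cross-entropy loss for one-hot labels can be sketched in a few lines; the softmax normalization and the small epsilon are standard choices assumed here, not details taken from the tutorial.

```python
import numpy as np

def softmax(scores):
    # Convert raw scores into probabilities (shifted for numerical stability).
    exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(one_hot_labels, scores, eps=1e-12):
    # Average negative log-probability assigned to the true class.
    probs = softmax(scores)
    return -np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=-1))

labels = np.array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]])  # one-hot "3"
scores = np.array([[0.1, -0.2, 0.0, 2.5, 0.3, -1.0, 0.0, 0.2, -0.3, 0.1]])
print(categorical_cross_entropy(labels, scores))
```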
558558

@@ -571,6 +571,12 @@ To further enhance and optimize your neural network model, you can consider one
- Apply [batch normalization](https://en.wikipedia.org/wiki/Batch_normalization) for faster and more stable training.
- Tune other parameters, such as the learning rate and hidden layer size.

-Finally, you can go beyond NumPy with specialized frameworks and APIs — such as [TensorFlow](https://www.tensorflow.org/guide/tf_numpy?hl=el), [PyTorch](https://pytorch.org/docs/stable/generated/torch.from_numpy.html), Swift for TensorFlow (with [Python interoperability](https://www.tensorflow.org/swift/tutorials/python_interoperability)), and [JAX](https://github.com/google/jax) — that support NumPy, have built-in [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation), and are designed for high-performance numerical computing and machine learning.
+Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as [PyTorch](https://pytorch.org/), [JAX](https://github.com/google/jax), [TensorFlow](https://www.tensorflow.org/guide/tf_numpy) or [MXNet](https://mxnet.apache.org) — that provide NumPy-like APIs, have built-in [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) and GPU support, and are designed for high-performance numerical computing and machine learning.
+
+Finally, when developing a machine learning model, you should think about potential ethical issues and apply practices to avoid or mitigate them:
+- Document a trained model with a Model Card - see the [Model Cards for Model Reporting paper](https://doi.org/10.1145/3287560.3287596) by Margaret Mitchell et al.
+- Document a dataset with a Datasheet - see the [Datasheets for Datasets paper](https://arxiv.org/abs/1803.09010) by Timnit Gebru et al.
+- Consider the impact of your model - who is affected by it, and who does it benefit - see [the article](https://www.nature.com/articles/d41586-020-02003-2) and [talk](https://slideslive.com/38923453/the-values-of-machine-learning) by Pratyusha Kalluri.
+- For more resources, see [this blog post by Rachel Thomas](https://www.fast.ai/2018/09/24/ai-ethics-resources/) and the [Radical AI podcast](https://www.radicalai.org/).

(Credit to [hsjeong5](https://github.com/hsjeong5/MNIST-for-Numpy) for demonstrating how to download MNIST without the use of external libraries.)
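Since the new paragraph added above points to frameworks with NumPy-like APIs and built-in automatic differentiation, here is a tiny illustrative sketch (not part of the commit) of what that looks like in JAX; it assumes `jax` is installed and simply differentiates a small function written against `jax.numpy`.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # A NumPy-like expression: squared error of a linear prediction.
    prediction = jnp.dot(x, w)
    return jnp.mean((prediction - y) ** 2)

# jax.grad builds the gradient of `loss` with respect to `w` automatically,
# replacing the hand-written backpropagation used in the NumPy tutorial.
grad_loss = jax.grad(loss)

w = jnp.ones(3)
x = jnp.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = jnp.array([1.0, 2.0])
print(grad_loss(w, x, y))
```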
