Commit 65fc57b

Reverted previous commit.
Lowered number of epochs in training step to avoid timeout.
1 parent fdb7978 commit 65fc57b

File tree

1 file changed

+115
-127
lines changed

content/tutorial-deep-learning-on-mnist.md

@@ -360,14 +360,14 @@ def relu2deriv(output):
360360
**3.** Set certain default values of [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)), such as:
361361

362362
- [_Learning rate_](https://en.wikipedia.org/wiki/Learning_rate): `learning_rate` — helps limit the magnitude of weight updates to prevent them from overcorrecting.
363-
- _Epochs (iterations)_: `epochs` — the number of complete passes — forward and backward propagations — of the data through the network. This parameter can positively or negatively affect the results. The higher the iterations, the longer the learning process may take.
363+
- _Epochs (iterations)_: `epochs`, the number of complete passes (forward and backward propagations) of the data through the network. This parameter can positively or negatively affect the results. The more epochs you run, the longer the learning process takes. Because training is computationally intensive, this tutorial uses a very low number of epochs (20). To get meaningful results, you should choose a much larger number.
364364
- _Size of the hidden (middle) layer in a network_: `hidden_size` — different sizes of the hidden layer can affect the results during training and testing.
365365
- _Size of the input:_ `pixels_per_image` — you have established that the image input is 784 (28x28) (in pixels).
366366
- _Number of labels_: `num_labels` — indicates the output number for the output layer where the predictions occur for 10 (0 to 9) handwritten digit labels.
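
To illustrate how the learning rate limits the size of each weight update, here is a minimal sketch that is not part of the tutorial's code. It runs plain gradient descent on the toy function `f(w) = w**2`; the helper `descend` and the three rates tried are assumptions chosen for illustration only.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent (hypothetical helper)."""
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w**2
        w -= learning_rate * gradient   # the update is scaled by the learning rate
    return w


small = descend(0.005)  # same value as the tutorial's default: slow but stable
medium = descend(0.1)   # moves toward the minimum at 0 much faster
large = descend(1.1)    # every step overshoots, so |w| grows instead of shrinking
print(small, medium, large)
```

With the largest rate each update overcorrects past the minimum, which is the behavior a small `learning_rate` is meant to prevent.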
367367

368368
```{code-cell} ipython3
369369
learning_rate = 0.005
370-
epochs = 100
370+
epochs = 20
371371
hidden_size = 100
372372
pixels_per_image = 784
373373
num_labels = 10
@@ -382,108 +382,99 @@ weights_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1
382382

383383
**5.** Set up the neural network's learning experiment with a training loop and start the training process.
384384

385-
**Note** Because the training is an intensive computational process, its
386-
execution is disabled by default on this notebook. To enable execution and
387-
test the code yourself, set `execute_training` in the cell below to `True`.
388-
389-
```{code-cell} ipython3
390-
execute_training = False
391-
```
392-
393385
Start the training process:
394386

395387
```{code-cell} ipython3
396-
if execute_training:
397-
# To store training and test set losses and accurate predictions
398-
# for visualization.
399-
store_training_loss = []
400-
store_training_accurate_pred = []
401-
store_test_loss = []
402-
store_test_accurate_pred = []
403-
404-
# This is a training loop.
405-
# Run the learning experiment for a defined number of epochs (iterations).
406-
for j in range(epochs):
407-
# Set the initial loss/error and the number of accurate predictions to zero.
408-
training_loss = 0.0
409-
training_accurate_predictions = 0
410-
411-
# For all images in the training set, perform a forward pass
412-
# and backpropagation and adjust the weights accordingly.
413-
for i in range(len(training_images)):
414-
# Forward propagation/forward pass:
415-
# 1. The input layer:
416-
# Initialize the training image data as inputs.
417-
layer_0 = training_images[i]
418-
# 2. The hidden layer:
419-
# Take in the training image data into the middle layer by
420-
# matrix-multiplying it by randomly initialized weights.
421-
layer_1 = np.dot(layer_0, weights_1)
422-
# 3. Pass the hidden layer's output through the ReLU activation function.
423-
layer_1 = relu(layer_1)
424-
# 4. Define the dropout function for regularization.
425-
dropout_mask = np.random.randint(0, high=2, size=layer_1.shape)
426-
# 5. Apply dropout to the hidden layer's output.
427-
layer_1 *= dropout_mask * 2
428-
# 6. The output layer:
429-
# Ingest the output of the middle layer into the the final layer
430-
# by matrix-multiplying it by randomly initialized weights.
431-
# Produce a 10-dimension vector with 10 scores.
432-
layer_2 = np.dot(layer_1, weights_2)
433-
434-
# Backpropagation/backward pass:
435-
# 1. Measure the training error (loss function) between the actual
436-
# image labels (the truth) and the prediction by the model.
437-
training_loss += np.sum((training_labels[i] - layer_2) ** 2)
438-
# 2. Increment the accurate prediction count.
439-
training_accurate_predictions += int(np.argmax(layer_2) == np.argmax(training_labels[i]))
440-
# 3. Differentiate the loss function/error.
441-
layer_2_delta = (training_labels[i] - layer_2)
442-
# 4. Propagate the gradients of the loss function back through the hidden layer.
443-
layer_1_delta = np.dot(weights_2, layer_2_delta) * relu2deriv(layer_1)
444-
# 5. Apply the dropout to the gradients.
445-
layer_1_delta *= dropout_mask
446-
# 6. Update the weights for the middle and input layers
447-
# by multiplying them by the learning rate and the gradients.
448-
weights_1 += learning_rate * np.outer(layer_0, layer_1_delta)
449-
weights_2 += learning_rate * np.outer(layer_1, layer_2_delta)
450-
451-
# Store training set losses and accurate predictions.
452-
store_training_loss.append(training_loss)
453-
store_training_accurate_pred.append(training_accurate_predictions)
454-
455-
# Evaluate on the test set:
456-
# 1. Set the initial error and the number of accurate predictions to zero.
457-
test_loss = 0.0
458-
test_accurate_predictions = 0
459-
460-
# 2. Start testing the model by evaluating on the test image dataset.
461-
for i in range(len(test_images)):
462-
# 1. Pass the test images through the input layer.
463-
layer_0 = test_images[i]
464-
# 2. Compute the weighted sum of the test image inputs in and
465-
# pass the hidden layer's output through ReLU.
466-
layer_1 = relu(np.dot(layer_0, weights_1))
467-
# 3. Compute the weighted sum of the hidden layer's inputs.
468-
# Produce a 10-dimensional vector with 10 scores.
469-
layer_2 = np.dot(layer_1, weights_2)
470-
471-
# 4. Measure the error between the actual label (truth) and prediction values.
472-
test_loss += np.sum((test_labels[i] - layer_2) ** 2)
473-
# 5. Increment the accurate prediction count.
474-
test_accurate_predictions += int(np.argmax(layer_2) == np.argmax(test_labels[i]))
475-
476-
# Store test set losses and accurate predictions.
477-
store_test_loss.append(test_loss)
478-
store_test_accurate_pred.append(test_accurate_predictions)
479-
480-
# 3. Display the error and accuracy metrics in the output.
481-
print("\n" + \
482-
"Epoch: " + str(j) + \
483-
" Training set error:" + str(training_loss/ float(len(training_images)))[0:5] +\
484-
" Training set accuracy:" + str(training_accurate_predictions/ float(len(training_images))) +\
485-
" Test set error:" + str(test_loss/ float(len(test_images)))[0:5] +\
486-
" Test set accuracy:" + str(test_accurate_predictions/ float(len(test_images))))
# To store training and test set losses and accurate predictions
# for visualization.
store_training_loss = []
store_training_accurate_pred = []
store_test_loss = []
store_test_accurate_pred = []

# This is a training loop.
# Run the learning experiment for a defined number of epochs (iterations).
for j in range(epochs):
    # Set the initial loss/error and the number of accurate predictions to zero.
    training_loss = 0.0
    training_accurate_predictions = 0

    # For all images in the training set, perform a forward pass
    # and backpropagation and adjust the weights accordingly.
    for i in range(len(training_images)):
        # Forward propagation/forward pass:
        # 1. The input layer:
        #    Initialize the training image data as inputs.
        layer_0 = training_images[i]
        # 2. The hidden layer:
        #    Take in the training image data into the middle layer by
        #    matrix-multiplying it by randomly initialized weights.
        layer_1 = np.dot(layer_0, weights_1)
        # 3. Pass the hidden layer's output through the ReLU activation function.
        layer_1 = relu(layer_1)
        # 4. Define the dropout mask for regularization.
        dropout_mask = np.random.randint(0, high=2, size=layer_1.shape)
        # 5. Apply dropout to the hidden layer's output.
        layer_1 *= dropout_mask * 2
        # 6. The output layer:
        #    Ingest the output of the middle layer into the final layer
        #    by matrix-multiplying it by randomly initialized weights.
        #    Produce a 10-dimension vector with 10 scores.
        layer_2 = np.dot(layer_1, weights_2)

        # Backpropagation/backward pass:
        # 1. Measure the training error (loss function) between the actual
        #    image labels (the truth) and the prediction by the model.
        training_loss += np.sum((training_labels[i] - layer_2) ** 2)
        # 2. Increment the accurate prediction count.
        training_accurate_predictions += int(np.argmax(layer_2) == np.argmax(training_labels[i]))
        # 3. Differentiate the loss function/error.
        layer_2_delta = (training_labels[i] - layer_2)
        # 4. Propagate the gradients of the loss function back through the hidden layer.
        layer_1_delta = np.dot(weights_2, layer_2_delta) * relu2deriv(layer_1)
        # 5. Apply the dropout to the gradients.
        layer_1_delta *= dropout_mask
        # 6. Update the weights for the middle and input layers
        #    by multiplying them by the learning rate and the gradients.
        weights_1 += learning_rate * np.outer(layer_0, layer_1_delta)
        weights_2 += learning_rate * np.outer(layer_1, layer_2_delta)

    # Store training set losses and accurate predictions.
    store_training_loss.append(training_loss)
    store_training_accurate_pred.append(training_accurate_predictions)

    # Evaluate on the test set:
    # 1. Set the initial error and the number of accurate predictions to zero.
    test_loss = 0.0
    test_accurate_predictions = 0

    # 2. Start testing the model by evaluating on the test image dataset.
    for i in range(len(test_images)):
        # 1. Pass the test images through the input layer.
        layer_0 = test_images[i]
        # 2. Compute the weighted sum of the test image inputs and
        #    pass the hidden layer's output through ReLU.
        layer_1 = relu(np.dot(layer_0, weights_1))
        # 3. Compute the weighted sum of the hidden layer's inputs.
        #    Produce a 10-dimensional vector with 10 scores.
        layer_2 = np.dot(layer_1, weights_2)

        # 4. Measure the error between the actual label (truth) and prediction values.
        test_loss += np.sum((test_labels[i] - layer_2) ** 2)
        # 5. Increment the accurate prediction count.
        test_accurate_predictions += int(np.argmax(layer_2) == np.argmax(test_labels[i]))

    # Store test set losses and accurate predictions.
    store_test_loss.append(test_loss)
    store_test_accurate_pred.append(test_accurate_predictions)

    # 3. Display the error and accuracy metrics in the output.
    print("\n" + \
          "Epoch: " + str(j) + \
          " Training set error:" + str(training_loss / float(len(training_images)))[0:5] + \
          " Training set accuracy:" + str(training_accurate_predictions / float(len(training_images))) + \
          " Test set error:" + str(test_loss / float(len(test_images)))[0:5] + \
          " Test set accuracy:" + str(test_accurate_predictions / float(len(test_images))))
487478
```
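
Steps 4 and 5 of the forward pass (the dropout mask and its application) can be looked at in isolation. The following is a minimal sketch under stated assumptions: the all-ones `layer_1` is a stand-in for real hidden-layer activations, and it uses NumPy's seeded `default_rng` generator rather than the `np.random.randint` call in the training loop, purely so the result is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make this sketch repeatable

# Stand-in for a hidden layer's activations (an assumption for illustration).
layer_1 = np.ones(10_000)

# Each mask entry is 0 or 1 with equal probability, so about half
# of the units are dropped.
dropout_mask = rng.integers(0, 2, size=layer_1.shape)

# Zero the dropped units and double the survivors, so the average
# activation stays close to what it was without dropout.
dropped = layer_1 * dropout_mask * 2

print(layer_1.mean(), dropped.mean())
```

Doubling the surviving activations is why the training loop multiplies by `dropout_mask * 2`: with a keep probability of 0.5, the scaling keeps the expected input to the output layer unchanged.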
488479

489480
The training process may take many minutes, depending on factors such as the processing power of the machine you are running the experiment on and the number of epochs. To reduce the waiting time, you can lower the value of the `epochs` variable, reset the runtime (which also resets the weights), and run the notebook cells again.
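
To get a rough feel for how the waiting time scales with the number of epochs, you could time a few forward passes and extrapolate. This is only a sketch: the random `images` array stands in for the real training set, the helper `estimate_seconds` is hypothetical, the `num_images=1000` figure is an assumed dataset size, and because only the forward pass is timed the result is a lower bound.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
pixels_per_image, hidden_size, num_labels = 784, 100, 10
weights_1 = 0.2 * rng.random((pixels_per_image, hidden_size)) - 0.1
weights_2 = 0.2 * rng.random((hidden_size, num_labels)) - 0.1

sample_count = 200  # small random stand-in for the training images
images = rng.random((sample_count, pixels_per_image))

start = time.perf_counter()
for image in images:
    layer_1 = np.maximum(np.dot(image, weights_1), 0)  # ReLU, forward pass only
    layer_2 = np.dot(layer_1, weights_2)
elapsed = time.perf_counter() - start


def estimate_seconds(seconds_per_image, num_images, epochs):
    """Extrapolate the total time linearly (hypothetical helper)."""
    return seconds_per_image * num_images * epochs


per_image = elapsed / sample_count
print(estimate_seconds(per_image, num_images=1000, epochs=20))
```

Because the estimate is linear in `epochs`, halving the number of epochs roughly halves the waiting time.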
@@ -493,32 +484,29 @@ The training process may take many minutes, depending on a number of factors, su
493484
After executing the cell above, you can visualize the training and test set errors and accuracy for an instance of this training process.
494485

495486
```{code-cell} ipython3
496-
:tags: [raises-exception, hide-output]
497-
498-
if execute_training:
499-
# The training set metrics.
500-
y_training_error = [store_training_loss[i]/float(len(training_images)) for i in range(len(store_training_loss))]
501-
x_training_error = range(1, len(store_training_loss)+1)
502-
y_training_accuracy = [store_training_accurate_pred[i]/ float(len(training_images)) for i in range(len(store_training_accurate_pred))]
503-
x_training_accuracy = range(1, len(store_training_accurate_pred)+1)
504-
505-
# The test set metrics.
506-
y_test_error = [store_test_loss[i]/float(len(test_images)) for i in range(len(store_test_loss))]
507-
x_test_error = range(1, len(store_test_loss)+1)
508-
y_test_accuracy = [store_training_accurate_pred[i]/ float(len(training_images)) for i in range(len(store_training_accurate_pred))]
509-
x_test_accuracy = range(1, len(store_test_accurate_pred)+1)
510-
511-
# Display the plots.
512-
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
513-
axes[0].set_title('Training set error, accuracy')
514-
axes[0].plot(x_training_accuracy, y_training_accuracy, label = "Training set accuracy")
515-
axes[0].plot(x_training_error, y_training_error, label = "Training set error")
516-
axes[0].set_xlabel("Epochs")
517-
axes[1].set_title('Test set error, accuracy')
518-
axes[1].plot(x_test_accuracy, y_test_accuracy, label = "Test set accuracy")
519-
axes[1].plot(x_test_error, y_test_error, label = "Test set error")
520-
axes[1].set_xlabel("Epochs")
521-
plt.show()
# The training set metrics.
y_training_error = [store_training_loss[i] / float(len(training_images)) for i in range(len(store_training_loss))]
x_training_error = range(1, len(store_training_loss) + 1)
y_training_accuracy = [store_training_accurate_pred[i] / float(len(training_images)) for i in range(len(store_training_accurate_pred))]
x_training_accuracy = range(1, len(store_training_accurate_pred) + 1)

# The test set metrics.
y_test_error = [store_test_loss[i] / float(len(test_images)) for i in range(len(store_test_loss))]
x_test_error = range(1, len(store_test_loss) + 1)
# Use the test set counts and size here, not the training set ones.
y_test_accuracy = [store_test_accurate_pred[i] / float(len(test_images)) for i in range(len(store_test_accurate_pred))]
x_test_accuracy = range(1, len(store_test_accurate_pred) + 1)

# Display the plots.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
axes[0].set_title('Training set error, accuracy')
axes[0].plot(x_training_accuracy, y_training_accuracy, label="Training set accuracy")
axes[0].plot(x_training_error, y_training_error, label="Training set error")
axes[0].set_xlabel("Epochs")
axes[0].legend()  # Show the curve labels passed to plot().
axes[1].set_title('Test set error, accuracy')
axes[1].plot(x_test_accuracy, y_test_accuracy, label="Test set accuracy")
axes[1].plot(x_test_error, y_test_error, label="Test set error")
axes[1].set_xlabel("Epochs")
axes[1].legend()
plt.show()
522510
```
523511

524512
The accuracy rates that your model reaches during training and testing may be somewhat plausible but you may also find the error rates to be quite high.
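
One way to see how a plausible accuracy and a high error can coexist is the toy sketch below. The `labels` and `scores` arrays are made-up values, using three classes instead of ten for brevity; they are not output from the tutorial's model.

```python
import numpy as np

# Two one-hot labels and two hypothetical raw output vectors.
labels = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
scores = np.array([[0.1, 0.4, 0.2],
                   [0.5, 0.3, 0.1]])

# Accuracy only checks whether the largest score is in the right position.
accuracy = np.mean(np.argmax(scores, axis=1) == np.argmax(labels, axis=1))

# The squared error also penalizes how far every score is from its 0 or 1 target.
error = np.mean(np.sum((labels - scores) ** 2, axis=1))

print(accuracy, error)
```

Both predictions pick the right class, so the accuracy is perfect, yet the squared error stays well above zero because none of the scores is close to its target.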
