Supervised learning: deep learning
October 9, 2023
In this practical, we will create a feed-forward neural network as well as an optional convolutional neural network to analyze the famous MNIST dataset.
Let’s set the seed value and use the same number as below to reproduce the same results.
In this section, we will develop a deep feed-forward neural network for MNIST.
Data preparation
List of 2
$ train:List of 2
..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
$ test :List of 2
..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...
[1] 0 255
0 1 2 3 4 5 6 7 8 9
5923 6742 5958 6131 5842 5421 5918 6265 5851 5949
Use the function below to plot the first training image. The img parameter has to be a matrix with dimensions (28, 28).
Indexing in 3-dimensional arrays works the same as indexing in matrices, but you need an extra comma: x[, , ].
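As a small illustration of the extra comma, the sketch below builds a toy array with the same layout as the MNIST training images (examples first, then rows and columns). The array x and its contents are made up for the example; they are not the real data.

```r
# Toy stand-in for mnist$train$x: 3 "images" of 28 x 28 pixels (values are arbitrary)
x <- array(seq_len(3 * 28 * 28), dim = c(3, 28, 28))

# Leaving the second and third index empty keeps all rows and all columns,
# so the result is the first image as a 28 x 28 matrix
img_1 <- x[1, , ]
dim(img_1)

# A single pixel needs all three indices: image 2, row 5, column 10
x[2, 5, 10]
```

The matrix img_1 has the (28, 28) shape that a plotting function for single images would expect.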
It is usually a good idea to normalize your features to have a manageable, standard range before entering them in neural networks.
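For MNIST, the pixel values range from 0 to 255, so one common choice is to divide by 255 to bring every feature into the range 0 to 1. The sketch below shows this on a small made-up array; whether the practical rescales in exactly this way is an assumption.

```r
# Toy pixel data on the same 0-255 scale as MNIST (values are made up)
pixels <- array(c(0L, 51L, 102L, 255L), dim = c(1, 2, 2))

# Divide by the maximum possible pixel value to map the range onto [0, 1]
pixels_scaled <- pixels / 255

range(pixels_scaled)
```

The shape of the array is unchanged; only the scale of the values differs.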
Multi-layer perceptron: multinomial logistic regression
The simplest neural network model is a multi-layer perceptron with no hidden layers, only an input layer and an output layer. We can call this a multinomial logistic regression model: for our MNIST data it has 10 outputs, one for each digit (0-9). That model is shown below.
multinom <-
  # initialize a sequential model
  keras_model_sequential(input_shape = c(28, 28)) %>%
  # flatten 28*28 matrix into single vector
  layer_flatten() %>%
  # softmax outcome == probability for each of 10 outputs
  layer_dense(10, activation = "softmax")

multinom %>% compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)
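The softmax activation turns the 10 raw outputs of the dense layer into probabilities that are positive and sum to 1. A minimal base-R sketch of that transformation is given here; the function name softmax and the input scores are illustrative and not part of the keras API.

```r
# Softmax: exponentiate the scores and normalise so they sum to 1.
# Subtracting max(z) first avoids overflow and does not change the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Made-up raw scores for the 10 digit classes
scores <- c(1.2, -0.3, 0.5, 3.1, 0.0, -1.4, 0.7, 2.2, -0.8, 0.1)
probs <- softmax(scores)

sum(probs)            # probabilities sum to 1
which.max(probs) - 1  # predicted digit (R indexes from 1, digits start at 0)
```

The largest score always receives the largest probability, so the predicted class is the same before and after the softmax.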
Inspect the model summary and describe why this model has 7850 parameters.
Model: "sequential"
Layer (type) Output Shape Param #
flatten (Flatten) (None, 784) 0
dense (Dense) (None, 10) 7850
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
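One way to account for the 7850 parameters: the flatten layer has none, and the dense layer has one weight per input-output pair plus one bias per output.

```latex
\[
\underbrace{784 \times 10}_{\text{weights}} + \underbrace{10}_{\text{biases}} = 7850,
\qquad 784 = 28 \times 28 .
\]
```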
Deep feed-forward neural networks
- sequential model
- flatten layer
- dense layer with 64 hidden units and “relu” activation function
- dense output layer with 10 units and softmax activation function
You may reuse code from the multinomial model.
ffnn <-
  # initialize a sequential model
  keras_model_sequential(input_shape = c(28, 28)) %>%
  # flatten 28*28 matrix into single vector
  layer_flatten() %>%
  # this is the hidden layer!
  layer_dense(64, activation = "relu") %>%
  # softmax outcome == probability for each of 10 outputs
  layer_dense(10, activation = "softmax")

ffnn %>% compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)
Model: "sequential_1"
Layer (type) Output Shape Param #
flatten_1 (Flatten) (None, 784) 0
dense_2 (Dense) (None, 64) 50240
dense_1 (Dense) (None, 10) 650
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
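The parameter counts in this summary follow one rule for every dense layer: inputs times units for the weights, plus one bias per unit. The helper below is a hypothetical convenience function (not part of keras) that reproduces the numbers in the summary above.

```r
# Number of parameters in a fully connected layer:
# one weight per (input, unit) pair plus one bias per unit
dense_params <- function(n_inputs, n_units) {
  n_inputs * n_units + n_units
}

hidden <- dense_params(28 * 28, 64)  # flattened image -> 64 hidden units
output <- dense_params(64, 10)       # 64 hidden units -> 10 classes

c(hidden = hidden, output = output, total = hidden + output)
```

Almost all parameters sit in the first dense layer, because it connects to all 784 pixels.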
pred_multinom <- class_predict(multinom, x = mnist$test$x)
pred_ffnn <- class_predict(ffnn, x = mnist$test$x)
(ctab_multinom <- table(pred = pred_multinom, true = mnist$test$y))
pred 0 1 2 3 4 5 6 7 8 9
0 964 0 7 3 2 8 13 1 7 11
1 0 1113 11 0 1 3 3 6 8 7
2 1 3 918 18 5 3 8 20 6 1
3 2 3 21 929 2 37 1 12 26 11
4 0 0 9 0 909 6 7 7 8 24
5 5 1 4 19 0 776 13 1 21 7
6 5 3 12 2 10 14 910 0 7 0
7 2 2 8 9 4 8 1 941 10 16
8 1 10 38 21 10 31 2 2 871 6
9 0 0 4 9 39 6 0 38 10 926
pred 0 1 2 3 4 5 6 7 8 9
0 969 0 5 0 1 3 6 1 2 3
1 0 1122 2 0 0 0 4 4 0 3
2 1 3 997 4 4 0 1 5 2 1
3 2 3 4 975 0 5 1 2 7 4
4 1 0 4 0 952 0 5 0 6 9
5 2 0 1 15 0 876 11 0 3 3
6 2 2 2 0 3 1 926 0 1 1
7 1 1 5 4 4 1 1 1004 4 4
8 1 4 11 7 1 4 3 3 947 0
9 1 0 1 5 17 2 0 9 2 981
[1] 0.9257
[1] 0.9749
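The two numbers above are the test accuracies of the multinomial model and the feed-forward network: the correctly classified digits sit on the diagonal of each table, so accuracy is the diagonal sum divided by the total. The sketch below walks through the path from predicted probabilities to accuracy on made-up data. It assumes that class_predict() picks the class with the highest predicted probability; the helper probs_to_class is a hypothetical stand-in, not the function used in this practical.

```r
# Hypothetical stand-in for class_predict(): pick the most probable class
# per row and shift to 0-based digit labels
probs_to_class <- function(probs) {
  max.col(probs, ties.method = "first") - 1L
}

# Made-up predicted probabilities for 4 examples and 3 classes (0, 1, 2)
probs <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.8, 0.1),
  c(0.3, 0.3, 0.4),
  c(0.6, 0.3, 0.1)
)
true <- c(0L, 1L, 2L, 1L)

# factor() with fixed levels keeps the table square even if a class is never predicted
pred <- probs_to_class(probs)
ctab <- table(
  pred = factor(pred, levels = 0:2),
  true = factor(true, levels = 0:2)
)

# Accuracy: correct predictions (diagonal) over all predictions
accuracy <- sum(diag(ctab)) / sum(ctab)
accuracy
```

Applied to the first table above, the diagonal sums to 9257 out of 10000 test images, which gives the reported 0.9257.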
- sequential model
- flatten layer
- dense layer with 128 hidden units and “relu” activation function
- dense layer with 64 hidden units and “relu” activation function
- dense output layer with 10 units and softmax activation function
dffnn <-
  keras_model_sequential(input_shape = c(28, 28)) %>% # initialize a sequential model
  layer_flatten() %>% # flatten 28*28 matrix into single vector
  layer_dense(128, activation = "relu") %>% # first hidden layer
  layer_dense(64, activation = "relu") %>% # second hidden layer
  layer_dense(10, activation = "softmax") # softmax outcome == probability for each of 10 outputs

dffnn %>% compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)
Model: "sequential_2"
Layer (type) Output Shape Param #
flatten_2 (Flatten) (None, 784) 0
dense_5 (Dense) (None, 128) 100480
dense_4 (Dense) (None, 64) 8256
dense_3 (Dense) (None, 10) 650
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
dffnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_dffnn <- class_predict(dffnn, x = mnist$test$x)
(ctab_dffnn <- table(pred = pred_dffnn, true = mnist$test$y))
pred 0 1 2 3 4 5 6 7 8 9
0 973 2 2 0 1 6 6 2 1 3
1 0 1128 3 1 1 0 2 4 0 7
2 1 0 998 3 5 0 2 8 5 1
3 0 1 12 985 0 8 1 6 16 9
4 1 0 1 2 962 0 5 0 5 16
5 1 0 0 11 0 868 5 0 5 5
6 1 2 4 0 3 6 936 0 3 1
7 1 1 3 5 2 0 0 1002 3 9
8 2 1 8 1 4 3 1 4 933 4
9 0 0 1 2 4 1 0 2 3 954
[1] 0.9739
OPTIONAL: convolutional neural network
Convolution layers in Keras need a specific form of data input. For each example, they need a (height, width, channels) array (tensor). For a colour image with 28*28 dimension, that shape is usually (28, 28, 3), where the channels indicate red, green, and blue. MNIST has no colour info, but we still need the channel dimension to enter the data into a convolution layer, so each image gets shape (28, 28, 1). The training dataset x_train should thus have shape (60000, 28, 28, 1).
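Adding the channel dimension only changes the shape of the array, not the pixel values. A minimal sketch with a small made-up array follows; the same idea would apply to the (60000, 28, 28) training array. The base-R dim assignment is used here as one possible approach, which is safe because a trailing dimension of size 1 does not change the order of the elements.

```r
# Toy stand-in for the training images: 5 examples of 28 x 28 pixels
x_train <- array(runif(5 * 28 * 28), dim = c(5, 28, 28))

# Append a channel dimension of size 1: (5, 28, 28) -> (5, 28, 28, 1)
x_train_cnn <- x_train
dim(x_train_cnn) <- c(dim(x_train), 1)

dim(x_train_cnn)

# The pixel values themselves are unchanged
all(x_train_cnn[, , , 1] == x_train)
```

The test images need the same treatment before they can be passed to a model with a (28, 28, 1) input shape.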
cnn <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5)) %>%
  layer_max_pooling_2d(pool_size = c(4, 4)) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)
Model: "sequential_3"
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 24, 24, 6) 156
max_pooling2d (MaxPooling2D) (None, 6, 6, 6) 0
flatten_3 (Flatten) (None, 216) 0
dense_7 (Dense) (None, 32) 6944
dense_6 (Dense) (None, 10) 330
Total params: 7,430
Trainable params: 7,430
Non-trainable params: 0
# First, the input layer receives each image together with its single channel: shape (28, 28, 1)
# then, there is a 2d convolution layer with 6 filters, and a kernel size of 5 (in each direction)
# then, we max-pool the resulting 6 maps to reduce their size by a factor of 4 in each direction (24 -> 6)
# afterwards, we flatten
# then comes a dense hidden layer with 32 units and a relu activation function
# lastly, the output layer is the same as before
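The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults for these layers (no padding, stride 1 for the convolution, and a pooling stride equal to the pool size); the helper functions are hypothetical and only do the arithmetic.

```r
# Side length after a convolution without padding and with stride 1
conv_out <- function(size, kernel) size - kernel + 1

# Side length after max pooling with stride equal to the pool size
pool_out <- function(size, pool) size %/% pool

# Parameters of a conv layer: one kernel per (input channel, filter) pair
# plus one bias per filter
conv_params <- function(kernel, channels_in, filters) {
  kernel * kernel * channels_in * filters + filters
}

side_conv <- conv_out(28, 5)           # 24
side_pool <- pool_out(side_conv, 4)    # 6
n_flat    <- side_pool^2 * 6           # 216 values after flattening

c(
  conv   = conv_params(5, 1, 6),       # 156
  dense  = n_flat * 32 + 32,           # 6944
  output = 32 * 10 + 10                # 330
)
```

The three counts add up to the 7,430 parameters in the summary, far fewer than the dense networks above because the convolution kernels are shared across the whole image.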
cnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_cnn <- class_predict(cnn, x = mnist$test$x)
(ctab_cnn <- table(pred = pred_cnn, true = mnist$test$y))
pred 0 1 2 3 4 5 6 7 8 9
0 976 0 2 0 0 2 6 0 1 3
1 0 1128 3 0 0 1 2 2 1 1
2 0 2 1008 1 1 1 1 9 0 4
3 0 2 6 1000 0 5 0 8 3 9
4 0 0 0 0 961 0 4 0 0 3
5 0 1 0 6 0 881 3 0 0 4
6 0 1 1 0 3 1 934 0 1 0
7 1 0 5 0 0 0 0 1005 2 3
8 3 1 7 3 5 1 8 2 966 12
9 0 0 0 0 12 0 0 2 0 970
[1] 0.9829
Here are some things you could do:
- Reduce the convolution filter size & the pooling size and add a second convolutional & pooling layer with double the number of filters
- Add a dropout layer after the flatten layer
- Look up on the internet what works well and implement it!
cnn_2 <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 6, kernel_size = c(3, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 12, kernel_size = c(3, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn_2 %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)
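Assuming the Keras defaults (no padding, stride 1 convolutions, pooling stride equal to the pool size), the feature-map side length in cnn_2 would shrink as follows, with odd sizes rounded down by the pooling layer:

```latex
\[
28 \xrightarrow{\;3 \times 3 \text{ conv}\;} 26
   \xrightarrow{\;2 \times 2 \text{ pool}\;} 13
   \xrightarrow{\;3 \times 3 \text{ conv}\;} 11
   \xrightarrow{\;2 \times 2 \text{ pool}\;} 5,
\qquad 5 \times 5 \times 12 = 300 \text{ inputs to the dense layer.}
\]
```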
cnn_2 %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_cnn_2 <- class_predict(cnn_2, x = mnist$test$x)
(ctab_cnn_2 <- table(pred = pred_cnn_2, true = mnist$test$y))
pred 0 1 2 3 4 5 6 7 8 9
0 974 0 2 0 0 2 6 0 2 5
1 0 1121 0 0 0 0 2 0 0 0
2 0 0 1021 1 0 0 0 6 2 0
3 1 0 1 997 0 5 0 1 1 1
4 0 0 1 0 964 0 1 0 0 1
5 0 0 0 5 0 879 4 0 1 3
6 2 3 0 0 3 2 943 0 0 0
7 2 7 3 2 1 2 0 1017 2 3
8 1 4 4 5 2 2 2 1 962 5
9 0 0 0 0 12 0 0 3 4 991
[1] 0.9869