```
library(tidyverse)
library(keras)
```

# Supervised learning: deep learning

October 9, 2023

# Introduction

In this practical, we will create a feed-forward neural network as well as an optional convolutional neural network to analyze the famous MNIST dataset.

Let’s set the seed value and use the same number as below to reproduce the same results.

In this section, we will develop a deep feed-forward neural network for MNIST.

# Data preparation

Inspect the structure of the `mnist` object.
```
List of 2
$ train:List of 2
..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
$ test :List of 2
..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...
```

`[1] 0 255`

```
   0    1    2    3    4    5    6    7    8    9
5923 6742 5958 6131 5842 5421 5918 6265 5851 5949
```

Use the `plot_img()` function to plot the first training image. The `img` parameter has to be a matrix with dimensions `(28, 28)`. Indexing in 3-dimensional arrays works the same as indexing in matrices, but you need an extra comma: `x[,,]`.
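The definition of `plot_img()` is not reproduced in this section. As a minimal sketch, assuming base R graphics and a synthetic array in place of `mnist$train$x`, a helper with that name could look like the following. The implementation and its defaults are assumptions, not necessarily the practical's own code.

```r
# Hypothetical stand-in for mnist$train$x: 5 "images" of 28 x 28 pixels (0-255)
x <- array(sample(0:255, 5 * 28 * 28, replace = TRUE), dim = c(5, 28, 28))

# Selecting one image: fix the first index and leave the other two empty
img <- x[1, , ]
dim(img)  # 28 28

# Sketch of a plotting helper (assumed implementation)
plot_img <- function(img, col = gray.colors(255, start = 1, end = 0), ...) {
  stopifnot(is.matrix(img), all(dim(img) == c(28, 28)))
  # image() draws matrix rows along the x-axis, so transpose the matrix
  # and flip the y-axis to show the digit upright
  image(t(img), asp = 1, ylim = c(1.1, -0.1), col = col, axes = FALSE, ...)
}

plot_img(img)
```

In the practical itself, the first training image would be selected the same way, as `mnist$train$x[1, , ]`.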

It is usually a good idea to normalize your features to have a manageable, standard range before entering them in neural networks.
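Since the pixel values range from 0 to 255 (the `[1] 0 255` output above), one common choice is to divide by 255 so that every feature lies between 0 and 1. A minimal sketch on a tiny synthetic array, where `x_raw` and `x_scaled` are illustrative names:

```r
# Tiny stand-in for an image array with integer pixel values in 0-255
x_raw <- array(c(0L, 64L, 128L, 255L), dim = c(1, 2, 2))

# Rescale to the unit interval; the array shape is unchanged
x_scaled <- x_raw / 255

range(x_scaled)  # 0 1
dim(x_scaled)    # 1 2 2
```

Applied to the practical, the same division would be carried out on both `mnist$train$x` and `mnist$test$x`, so that training and test data are on the same scale.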

# Multi-layer perceptron: multinomial logistic regression

The simplest neural network model is a multi-layer perceptron with no hidden layers, only an input layer and an output layer. We can call this a multinomial logistic regression model; for our MNIST data it has 10 outputs, one for each digit (0-9). That model is shown below.

```
multinom <-
  # initialize a sequential model
  keras_model_sequential(input_shape = c(28, 28)) %>%
  # flatten 28*28 matrix into single vector
  layer_flatten() %>%
  # softmax outcome == probability for each of 10 outputs
  layer_dense(10, activation = "softmax")

multinom$compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)
```
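To see what this model computes, the base R sketch below reproduces its forward pass by hand: a linear score per class, a softmax to turn the scores into probabilities, and the sparse categorical cross-entropy loss. The data and weights are random placeholders, and `softmax()` is a helper written for this illustration rather than a Keras function.

```r
set.seed(1)  # arbitrary seed, only for this illustration

n_pixels <- 28 * 28   # 784 inputs after flattening
n_classes <- 10       # digits 0-9

# Hypothetical data: 3 flattened images with values in [0, 1], labels in 0-9
x <- matrix(runif(3 * n_pixels), nrow = 3)
y <- c(5, 0, 4)

# Randomly initialised parameters (a trained model would have learned these)
W <- matrix(rnorm(n_pixels * n_classes, sd = 0.01), nrow = n_pixels)
b <- rep(0, n_classes)

# Row-wise softmax; subtracting the row maximum avoids overflow in exp()
softmax <- function(z) {
  z <- z - apply(z, 1, max)
  exp(z) / rowSums(exp(z))
}

# Forward pass: linear scores, then class probabilities
scores <- sweep(x %*% W, 2, b, "+")
probs <- softmax(scores)
rowSums(probs)  # each row sums to 1

# Sparse categorical cross-entropy: -log of the probability of the true class
# (labels are 0-9 while R indexes from 1, hence y + 1)
loss <- -mean(log(probs[cbind(seq_along(y), y + 1)]))
loss
```

The "sparse" in the loss name refers to the labels being stored as integers (0-9) rather than as one-hot vectors.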

Display a summary of the multinomial model using the `summary()` function. Describe why this model has 7850 parameters.
```
Model: "sequential"
________________________________________________________________________________
Layer (type)                  Output Shape          Param #
================================================================================
flatten (Flatten)             (None, 784)           0
dense (Dense)                 (None, 10)            7850
================================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
________________________________________________________________________________
```
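The parameter count follows from the layer sizes: a dense layer has one weight for every input-output pair plus one bias per output unit, and the flatten layer has no parameters. As a quick arithmetic check (the variable names are illustrative):

```r
n_inputs <- 28 * 28   # 784 values after the flatten layer
n_outputs <- 10       # one output unit per digit

n_weights <- n_inputs * n_outputs  # 7840
n_biases <- n_outputs              # 10
n_weights + n_biases               # 7850
```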

# Deep feed-forward neural networks

Create and compile a feed-forward neural network with the following properties:

- sequential model
- flatten layer
- dense layer with 64 hidden units and “relu” activation function
- dense output layer with 10 units and softmax activation function

You may reuse code from the multinomial model.
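To make the role of the hidden layer concrete, the following base R sketch computes what a dense layer with a ReLU activation does to one flattened image. The weights are random placeholders rather than trained values, and `relu()` and `dense()` are helper names made up for this illustration.

```r
set.seed(2)  # arbitrary seed, only for this illustration

# ReLU keeps positive values and replaces negative values with zero
relu <- function(z) pmax(z, 0)

# A dense layer: weighted sum of the inputs plus a bias, then an activation
dense <- function(x, W, b, activation = identity) {
  activation(as.vector(x %*% W) + b)
}

x <- runif(784)                                     # one flattened image
W1 <- matrix(rnorm(784 * 64, sd = 0.05), 784, 64)   # input-to-hidden weights
b1 <- rep(0, 64)                                    # hidden biases

h <- dense(x, W1, b1, activation = relu)
length(h)    # 64 hidden activations
all(h >= 0)  # TRUE
```

The output layer then works the same way on `h`, with a softmax in place of the ReLU.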

```
ffnn <-
  # initialize a sequential model
  keras_model_sequential(input_shape = c(28, 28)) %>%
  # flatten 28*28 matrix into single vector
  layer_flatten() %>%
  # this is the hidden layer!
  layer_dense(64, activation = "relu") %>%
  # softmax outcome == probability for each of 10 outputs
  layer_dense(10, activation = "softmax")

ffnn$compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)

summary(ffnn)
```

```
Model: "sequential_1"
________________________________________________________________________________
Layer (type)                  Output Shape          Param #
================================================================================
flatten_1 (Flatten)           (None, 784)           0
dense_2 (Dense)               (None, 64)            50240
dense_1 (Dense)               (None, 10)            650
================================================================================
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
________________________________________________________________________________
```
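The prediction code in this practical relies on a helper called `class_predict()`, whose definition is not shown in this section. A plausible reading, stated here as an assumption, is that it takes the predicted class probabilities and returns the most probable digit for each image. The base R sketch below shows that argmax step on a hand-made probability matrix; `class_from_probs()` is a made-up name.

```r
# Hypothetical predicted probabilities for 3 images and 10 classes (digits 0-9)
probs <- matrix(0.02, nrow = 3, ncol = 10)
probs[1, 8] <- 0.82  # column 8 corresponds to digit 7
probs[2, 3] <- 0.82  # column 3 corresponds to digit 2
probs[3, 2] <- 0.82  # column 2 corresponds to digit 1

# Pick the column with the highest probability in each row;
# subtract 1 because columns are 1-based while the digits start at 0
class_from_probs <- function(probs) apply(probs, 1, which.max) - 1

class_from_probs(probs)  # 7 2 1
```

With a fitted Keras model, the probability matrix would come from `predict(model, x)`.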

```
pred_multinom <- class_predict(multinom, x = mnist$test$x)
pred_ffnn <- class_predict(ffnn, x = mnist$test$x)
(ctab_multinom <- table(pred = pred_multinom, true = mnist$test$y))
```

```
    true
pred    0    1    2    3    4    5    6    7    8    9
   0  964    0    7    3    2    8   13    1    7   11
   1    0 1113   11    0    1    3    3    6    8    7
   2    1    3  918   18    5    3    8   20    6    1
   3    2    3   21  929    2   37    1   12   26   11
   4    0    0    9    0  909    6    7    7    8   24
   5    5    1    4   19    0  776   13    1   21    7
   6    5    3   12    2   10   14  910    0    7    0
   7    2    2    8    9    4    8    1  941   10   16
   8    1   10   38   21   10   31    2    2  871    6
   9    0    0    4    9   39    6    0   38   10  926
```

```
    true
pred    0    1    2    3    4    5    6    7    8    9
   0  969    0    5    0    1    3    6    1    2    3
   1    0 1122    2    0    0    0    4    4    0    3
   2    1    3  997    4    4    0    1    5    2    1
   3    2    3    4  975    0    5    1    2    7    4
   4    1    0    4    0  952    0    5    0    6    9
   5    2    0    1   15    0  876   11    0    3    3
   6    2    2    2    0    3    1  926    0    1    1
   7    1    1    5    4    4    1    1 1004    4    4
   8    1    4   11    7    1    4    3    3  947    0
   9    1    0    1    5   17    2    0    9    2  981
```

`[1] 0.9257`

`[1] 0.9749`
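The two values above match the share of test images on the diagonal of the two confusion tables, that is, the test accuracy of the multinomial model and of the feed-forward network. A small sketch of that calculation with made-up predictions (the vectors and the `accuracy()` helper are illustrative):

```r
# Made-up labels and predictions for 8 observations and 3 classes
true <- c(0, 0, 1, 1, 2, 2, 2, 1)
pred <- c(0, 1, 1, 1, 2, 2, 0, 1)

ctab <- table(pred = pred, true = true)

# Correct predictions sit on the diagonal of the confusion table
accuracy <- function(ctab) sum(diag(ctab)) / sum(ctab)
accuracy(ctab)  # 0.75

# Same calculation for the multinomial model, using the diagonal of its
# confusion table and the 10000 test images
diag_multinom <- c(964, 1113, 918, 929, 909, 776, 910, 941, 871, 926)
sum(diag_multinom) / 10000  # 0.9257
```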

Create and compile a deep feed-forward neural network with the following properties:

- sequential model
- flatten layer
- dense layer with 128 hidden units and “relu” activation function
- dense layer with 64 hidden units and “relu” activation function
- dense output layer with 10 units and softmax activation function

```
dffnn <-
  keras_model_sequential(input_shape = c(28, 28)) %>% # initialize a sequential model
  layer_flatten() %>% # flatten 28*28 matrix into single vector
  layer_dense(128, activation = "relu") %>% # first hidden layer
  layer_dense(64, activation = "relu") %>% # second hidden layer
  layer_dense(10, activation = "softmax") # softmax outcome == probability for each of 10 outputs

dffnn$compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)

summary(dffnn)
```

```
Model: "sequential_2"
________________________________________________________________________________
Layer (type)                  Output Shape          Param #
================================================================================
flatten_2 (Flatten)           (None, 784)           0
dense_5 (Dense)               (None, 128)           100480
dense_4 (Dense)               (None, 64)            8256
dense_3 (Dense)               (None, 10)            650
================================================================================
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
________________________________________________________________________________
```
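The totals reported by `summary()` can be reproduced from the layer widths alone. The helper below, `count_dense_params()`, is a made-up name for this sketch; it assumes a plain stack of dense layers that all include bias terms.

```r
# Parameters per layer in a stack of dense layers, given the layer widths
# (input size first, output size last): weights plus biases for each layer
count_dense_params <- function(widths) {
  n_in <- head(widths, -1)
  n_out <- tail(widths, -1)
  n_in * n_out + n_out
}

count_dense_params(c(784, 64, 10))            # 50240 650
count_dense_params(c(784, 128, 64, 10))       # 100480 8256 650
sum(count_dense_params(c(784, 128, 64, 10)))  # 109386
```

Adding the 128-unit layer roughly doubles the parameter count compared with the single-hidden-layer network, because most parameters sit in the first dense layer that connects to the 784 inputs.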

```
dffnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_dffnn <- class_predict(dffnn, x = mnist$test$x)
(ctab_dffnn <- table(pred = pred_dffnn, true = mnist$test$y))
```

```
    true
pred    0    1    2    3    4    5    6    7    8    9
   0  973    2    2    0    1    6    6    2    1    3
   1    0 1128    3    1    1    0    2    4    0    7
   2    1    0  998    3    5    0    2    8    5    1
   3    0    1   12  985    0    8    1    6   16    9
   4    1    0    1    2  962    0    5    0    5   16
   5    1    0    0   11    0  868    5    0    5    5
   6    1    2    4    0    3    6  936    0    3    1
   7    1    1    3    5    2    0    0 1002    3    9
   8    2    1    8    1    4    3    1    4  933    4
   9    0    0    1    2    4    1    0    2    3  954
```

`[1] 0.9739`

# OPTIONAL: convolutional neural network

Convolution layers in Keras need a specific form of data input.

For each example, they need a `(width, height, channels)` array (tensor). For a colour image with 28*28 dimension, that shape is usually `(28, 28, 3)`, where the channels indicate red, green, and blue. MNIST has no colour info, but we still need the channel dimension to enter the data into a convolution layer with shape `(28, 28, 1)`. The training dataset `x_train` should thus have shape `(60000, 28, 28, 1)`.
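In R, one way to add the channel dimension is to assign a new `dim` attribute, which leaves the values untouched and only appends a dimension of length 1. A minimal sketch on a small synthetic array standing in for the MNIST images (`x_small` is a hypothetical name):

```r
# Stand-in for the image array: 4 images of 28 x 28 pixels
x_small <- array(runif(4 * 28 * 28), dim = c(4, 28, 28))
first_img <- x_small[1, , ]

# Append a channel dimension of length 1
dim(x_small) <- c(dim(x_small), 1)
dim(x_small)  # 4 28 28 1

# The pixel values themselves are unchanged
identical(x_small[1, , , 1], first_img)  # TRUE
```

Applied to the practical, the same assignment on `mnist$train$x` and `mnist$test$x` would give shapes `(60000, 28, 28, 1)` and `(10000, 28, 28, 1)`.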

```
cnn <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5)) %>%
  layer_max_pooling_2d(pool_size = c(4, 4)) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn %>%
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = c("accuracy")
  )
```

```
Model: "sequential_3"
________________________________________________________________________________
Layer (type)                  Output Shape          Param #
================================================================================
conv2d (Conv2D)               (None, 24, 24, 6)     156
max_pooling2d (MaxPooling2D)  (None, 6, 6, 6)       0
flatten_3 (Flatten)           (None, 216)           0
dense_7 (Dense)               (None, 32)            6944
dense_6 (Dense)               (None, 10)            330
================================================================================
Total params: 7,430
Trainable params: 7,430
Non-trainable params: 0
________________________________________________________________________________
```

```
# First, the input layer receives the images with a single channel (28, 28, 1)
# then, there is a 2d convolution layer with 6 filters and a kernel size of 5 (in each direction)
# then, we max-pool the resulting 6 feature maps to shrink them by a factor of 4 in each direction
# afterwards, we flatten
# then comes a dense hidden layer with 32 units and a relu activation function
# lastly, the output layer is the same as before
```
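The output shapes and parameter counts in the model summary can be checked by hand. The sketch below assumes the layer defaults used in the code above: a convolution without padding and with stride 1, and pooling with a stride equal to the pool size. `conv_out()` and `pool_out()` are helper names made up for this illustration.

```r
# Output size of a convolution without padding and with stride 1
conv_out <- function(n, kernel) n - kernel + 1
# Output size of max pooling with stride equal to the pool size
pool_out <- function(n, pool) n %/% pool

side_conv <- conv_out(28, 5)         # 24
side_pool <- pool_out(side_conv, 4)  # 6
n_flat <- side_pool^2 * 6            # 216 values after flattening 6 maps

params <- c(
  conv2d = (5 * 5 * 1 + 1) * 6,      # 156: 5x5 kernel, 1 channel, bias, 6 filters
  dense_hidden = n_flat * 32 + 32,   # 6944
  dense_output = 32 * 10 + 10        # 330
)
params
sum(params)  # 7430
```

Note that the convolution layer itself is cheap (156 parameters); most of the parameters are in the dense layer that follows the flatten step.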

```
cnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_cnn <- class_predict(cnn, x = mnist$test$x)
(ctab_cnn <- table(pred = pred_cnn, true = mnist$test$y))
```

```
    true
pred    0    1    2    3    4    5    6    7    8    9
   0  976    0    2    0    0    2    6    0    1    3
   1    0 1128    3    0    0    1    2    2    1    1
   2    0    2 1008    1    1    1    1    9    0    4
   3    0    2    6 1000    0    5    0    8    3    9
   4    0    0    0    0  961    0    4    0    0    3
   5    0    1    0    6    0  881    3    0    0    4
   6    0    1    1    0    3    1  934    0    1    0
   7    1    0    5    0    0    0    0 1005    2    3
   8    3    1    7    3    5    1    8    2  966   12
   9    0    0    0    0   12    0    0    2    0  970
```

`[1] 0.9829`

Here are some things you could do:

- Reduce the convolution filter size & the pooling size and add a second convolutional & pooling layer with double the number of filters
- Add a dropout layer after the flatten layer
- Look up on the internet what works well and implement it!
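For the dropout suggestion, it may help to see what such a layer does during training: each input is set to zero with probability `rate`, and the surviving values are scaled up by `1 / (1 - rate)` so that the expected sum stays the same. At prediction time the layer passes its input through unchanged. The base R sketch below imitates the training-time behaviour; `dropout()` is an illustrative helper, not a Keras function.

```r
set.seed(3)  # arbitrary seed, only for this illustration

dropout <- function(x, rate) {
  keep <- rbinom(length(x), size = 1, prob = 1 - rate)  # 1 = keep, 0 = drop
  x * keep / (1 - rate)                                 # rescale the survivors
}

x <- rep(1, 10000)
x_drop <- dropout(x, rate = 0.2)

mean(x_drop == 0)  # roughly 0.2 of the values are dropped
mean(x_drop)       # close to 1, the mean of the original input
```

Because a different random subset of units is dropped in every training step, the network cannot rely on any single unit, which tends to reduce overfitting.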

```
cnn_2 <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 6, kernel_size = c(3, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 12, kernel_size = c(3, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn_2 %>%
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = c("accuracy")
  )

cnn_2 %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_cnn_2 <- class_predict(cnn_2, x = mnist$test$x)
(ctab_cnn_2 <- table(pred = pred_cnn_2, true = mnist$test$y))
```

```
    true
pred    0    1    2    3    4    5    6    7    8    9
   0  974    0    2    0    0    2    6    0    2    5
   1    0 1121    0    0    0    0    2    0    0    0
   2    0    0 1021    1    0    0    0    6    2    0
   3    1    0    1  997    0    5    0    1    1    1
   4    0    0    1    0  964    0    1    0    0    1
   5    0    0    0    5    0  879    4    0    1    3
   6    2    3    0    0    3    2  943    0    0    0
   7    2    7    3    2    1    2    0 1017    2    3
   8    1    4    4    5    2    2    2    1  962    5
   9    0    0    0    0   12    0    0    3    4  991
```

`[1] 0.9869`