library(tidyverse)
library(keras)
Supervised learning: deep learning
Introduction
In this practical, we will create a feed-forward neural network as well as an optional convolutional neural network to analyze the famous MNIST dataset.
Let’s set the seed; using the same value as below should reproduce the same results.
set.seed(45)
In this section, we will develop a deep feed-forward neural network for MNIST.
Data preparation
It is usually a good idea to normalize your features to a manageable, standard range before feeding them into a neural network.
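For MNIST, the pixel values range from 0 to 255, so dividing by 255 rescales them to the interval [0, 1]. A minimal sketch (the variable names x_train, y_train, x_test, and y_test are our own choices):
mnist <- dataset_mnist()        # built-in MNIST loader from keras
x_train <- mnist$train$x / 255  # rescale pixel values from [0, 255] to [0, 1]
y_train <- mnist$train$y        # integer labels 0-9
x_test  <- mnist$test$x / 255
y_test  <- mnist$test$y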
Multi-layer perceptron: multinomial logistic regression
The simplest neural network model is a multi-layer perceptron with no hidden layers: only an input and an output layer. We can call this a multinomial logistic regression model, with 10 softmax outputs (each between 0 and 1, one per digit class) for our MNIST data. That model is shown below.
multinom <-
  # initialize a sequential model
  keras_model_sequential(input_shape = c(28, 28)) %>%
  # flatten 28*28 matrix into single vector
  layer_flatten() %>%
  # softmax outcome == probability for each of 10 outputs
  layer_dense(10, activation = "softmax")

multinom$compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam", # we use this optimizer because it works well
  metrics = list("accuracy") # we want to know training accuracy in the end
)
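Training then proceeds with fit(). A sketch, assuming the x_train and y_train objects from the data preparation step; the number of epochs and the validation split are our own illustrative choices:
multinom %>% fit(
  x = x_train,             # input pixels
  y = y_train,             # integer class labels
  epochs = 10,             # number of passes over the training data
  validation_split = 0.2   # hold out 20% of the training data for validation
)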
Deep feed-forward neural networks
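A deep version adds one or more hidden layers with a nonlinear activation between the flatten and output layers. The sketch below uses two hidden layers; the layer sizes and ReLU activations are our own illustrative choices, not prescribed settings:
ffn <-
  keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%
  # hidden layers with ReLU activation
  layer_dense(64, activation = "relu") %>%
  layer_dense(32, activation = "relu") %>%
  # softmax output over the 10 digit classes
  layer_dense(10, activation = "softmax")

ffn$compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = list("accuracy")
)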
OPTIONAL: convolutional neural network
Convolution layers in Keras need a specific form of data input. For each example, they need a (width, height, channels) array (tensor). For a colour image with 28*28 dimensions, that shape is usually (28, 28, 3), where the channels indicate red, green, and blue. MNIST has no colour info, but we still need the channel dimension to enter the data into a convolution layer, so each image gets shape (28, 28, 1). The training dataset x_train should thus have shape (60000, 28, 28, 1).
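A sketch of one way to add the channel dimension, followed by a small convolutional model; the filter count, kernel size, and pooling settings are our own illustrative choices:
# append a trailing channel dimension: (60000, 28, 28) -> (60000, 28, 28, 1)
dim(x_train) <- c(dim(x_train), 1)
dim(x_test)  <- c(dim(x_test), 1)

cnn <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  # convolution: 32 filters sliding a 3*3 window over the image
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  # downsample feature maps by taking the max over 2*2 blocks
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(10, activation = "softmax")

cnn$compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = list("accuracy")
)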