library(tidyverse)
library(keras3)
Supervised learning: deep learning
October 9, 2025
Introduction
In this practical, we will create a feed-forward neural network as well as an optional convolutional neural network to analyze the famous MNIST dataset.
Let's set the seed value and use the same number as below to reproduce the same results.
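A minimal sketch of how this could be done (the seed value 45 is illustrative, not necessarily the number used for the output below; tensorflow::set_random_seed() is assumed to be available alongside keras3):
set.seed(45)                      # seed for R's own random number generator
tensorflow::set_random_seed(45)   # seed for the TensorFlow backend used by keras3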
In this section, we will develop a deep feed-forward neural network for MNIST.
Data preparation
Load the MNIST data and inspect the structure of the resulting mnist object.
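The loading code is not echoed here; a minimal sketch using the built-in dataset_mnist() loader from keras3, which would produce the output below:
mnist <- dataset_mnist()  # list with $train and $test, each holding $x (images) and $y (labels)
str(mnist)                # structure of the object
range(mnist$train$x)      # pixel intensities run from 0 to 255
table(mnist$train$y)      # roughly 6000 training examples per digit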
List of 2
$ train:List of 2
..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
$ test :List of 2
..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...
[1] 0 255
0 1 2 3 4 5 6 7 8 9
5923 6742 5958 6131 5842 5421 5918 6265 5851 5949
Use the plot_img() function below to plot the first training image. The img parameter has to be a matrix with dimensions (28, 28). Indexing in 3-dimensional arrays works the same as indexing in matrices, but you need an extra comma: x[1, , ], for example, selects the first 28 × 28 image.
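The definition of plot_img() is not reproduced above; a minimal sketch of such a helper (the grey-scale palette is an assumption), followed by a call that plots the first training image:
plot_img <- function(img, col = gray(seq(1, 0, length.out = 256)), ...) {
  # image() draws rows from the bottom up, so flip and transpose the matrix
  # to show the digit upright
  image(t(img[nrow(img):1, ]), col = col, axes = FALSE, ...)
}
plot_img(mnist$train$x[1, , ])  # the extra commas select the first 28 x 28 slice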
It is usually a good idea to normalize your features to a manageable, standard range before feeding them into a neural network.
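A common way to do this for MNIST (assumed in the sketches that follow) is to divide by the maximum pixel value, scaling everything into [0, 1]:
mnist$train$x <- mnist$train$x / 255
mnist$test$x <- mnist$test$x / 255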
Multi-layer perceptron: multinomial logistic regression
The simplest neural network model is a multi-layer perceptron with no hidden layers, only an input and an output layer. For the MNIST data this amounts to a multinomial logistic regression model with 10 output units (one probability, between 0 and 1, per digit). That model is shown below.
multinom <-
# initialize a sequential model
keras_model_sequential(input_shape = c(28, 28)) |>
# flatten 28*28 matrix into single vector
layer_flatten() |>
# softmax outcome == probability for each of 10 outputs
layer_dense(10, activation = "softmax")
multinom$compile(
loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
optimizer = "adam", # we use this optimizer because it works well
metrics = list("accuracy") # we want to know training accuracy in the end
)
Inspect the model using the summary() function. Describe why this model has 7850 parameters.
Model: "sequential"
┌───────────────────────────────┬─────────────────────┬────────────┐
│ Layer (type)                  │ Output Shape        │    Param # │
├───────────────────────────────┼─────────────────────┼────────────┤
│ flatten (Flatten)             │ (None, 784)         │          0 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ dense (Dense)                 │ (None, 10)          │      7,850 │
└───────────────────────────────┴─────────────────────┴────────────┘
Total params: 7,850 (30.66 KB)
Trainable params: 7,850 (30.66 KB)
Non-trainable params: 0 (0.00 B)
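The flatten layer turns each 28 × 28 image into 784 inputs, and every input is connected to each of the 10 output units: 784 × 10 = 7840 weights, plus one bias per output unit, gives 7850 parameters.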
Deep feed-forward neural networks
Create a feed-forward neural network with the following elements:
- sequential model
- flatten layer
- dense layer with 64 hidden units and "relu" activation function
- dense output layer with 10 units and softmax activation function
You may reuse code from the multinomial model.
ffnn <-
# initialize a sequential model
keras_model_sequential(input_shape = c(28, 28)) |>
# flatten 28*28 matrix into single vector
layer_flatten() |>
# this is the hidden layer!
layer_dense(64, activation = "relu") |>
# softmax outcome == probability for each of 10 outputs
layer_dense(10, activation = "softmax")
ffnn$compile(
loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
optimizer = "adam", # we use this optimizer because it works well
metrics = list("accuracy") # we want to know training accuracy in the end
)
summary(ffnn)
Model: "sequential_1"
┌───────────────────────────────┬─────────────────────┬────────────┐
│ Layer (type)                  │ Output Shape        │    Param # │
├───────────────────────────────┼─────────────────────┼────────────┤
│ flatten_1 (Flatten)           │ (None, 784)         │          0 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ dense_1 (Dense)               │ (None, 64)          │     50,240 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ dense_2 (Dense)               │ (None, 10)          │        650 │
└───────────────────────────────┴─────────────────────┴────────────┘
Total params: 50,890 (198.79 KB)
Trainable params: 50,890 (198.79 KB)
Non-trainable params: 0 (0.00 B)
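The calls that fit the two models and produce the confusion matrices below are not echoed here. A sketch of how they might look, assuming the normalized data from earlier, ten epochs with a 20% validation split (mirroring the later fit() calls), and a hypothetical class_predict() helper:
multinom |> fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
ffnn |> fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
# hypothetical helper: the predicted class is the column with the highest
# probability, minus 1 because the digit labels run from 0 to 9
class_predict <- function(model, x) {
  probs <- predict(model, x)        # one row of 10 class probabilities per image
  apply(probs, 1, which.max) - 1    # index of the largest probability, shifted to 0-9
}
pred_multinom <- class_predict(multinom, mnist$test$x)
pred_ffnn <- class_predict(ffnn, mnist$test$x)
table(pred = pred_multinom, true = mnist$test$y)  # confusion matrix, multinom
table(pred = pred_ffnn, true = mnist$test$y)      # confusion matrix, ffnn
mean(pred_multinom == mnist$test$y)               # test accuracy, multinom
mean(pred_ffnn == mnist$test$y)                   # test accuracy, ffnn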
313/313 - 0s - 739us/step
313/313 - 0s - 858us/step
true
pred 0 1 2 3 4 5 6 7 8 9
0 949 0 3 1 1 6 7 1 7 10
1 0 1115 8 0 1 2 3 6 7 7
2 3 3 935 20 6 4 9 19 9 1
3 1 2 17 925 1 38 1 8 25 13
4 0 0 10 0 916 10 7 5 9 23
5 7 1 4 18 0 770 10 0 26 5
6 14 4 13 2 9 16 917 0 9 0
7 5 2 12 14 7 10 2 962 15 34
8 1 8 27 21 6 29 2 2 857 3
9 0 0 3 9 35 7 0 25 10 913
true
pred 0 1 2 3 4 5 6 7 8 9
0 967 1 4 1 2 2 4 1 7 3
1 1 1118 0 0 0 0 3 3 1 4
2 0 5 1015 8 1 1 4 14 9 1
3 1 4 0 986 1 16 3 1 6 5
4 1 0 1 0 960 1 3 0 4 9
5 2 1 0 5 0 864 8 1 6 6
6 4 1 2 0 4 2 931 0 2 0
7 1 2 6 6 2 0 0 1004 3 6
8 3 3 4 4 1 6 2 0 932 11
9 0 0 0 0 11 0 0 4 4 964
[1] 0.9259
[1] 0.9741
Now create a deeper feed-forward neural network with the following elements:
- sequential model
- flatten layer
- dense layer with 128 hidden units and "relu" activation function
- dense layer with 64 hidden units and "relu" activation function
- dense output layer with 10 units and softmax activation function
dffnn <-
keras_model_sequential(input_shape = c(28, 28)) |> # initialize a sequential model
layer_flatten() |> # flatten 28*28 matrix into single vector
layer_dense(128, activation = "relu") |> # this is the hidden layer!
layer_dense(64, activation = "relu") |> # this is the hidden layer!
layer_dense(10, activation = "softmax") # softmax outcome == logistic regression for each of 10 outputs
dffnn$compile(
loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
optimizer = "adam", # we use this optimizer because it works well
metrics = list("accuracy") # we want to know training accuracy in the end
)
summary(dffnn)
dffnn |> fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
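The test-set performance below can be computed by reusing the hypothetical class_predict() helper sketched earlier:
pred_dffnn <- class_predict(dffnn, mnist$test$x)
table(pred = pred_dffnn, true = mnist$test$y)  # confusion matrix
mean(pred_dffnn == mnist$test$y)               # test accuracy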
313/313 - 0s - 950us/step
true
pred 0 1 2 3 4 5 6 7 8 9
0 969 0 2 0 0 3 3 0 3 0
1 1 1124 0 1 1 0 3 6 3 4
2 0 2 999 3 1 0 0 8 2 0
3 0 2 7 966 0 2 0 7 2 5
4 0 0 4 1 967 2 3 4 1 11
5 0 0 0 17 0 866 2 0 0 7
6 3 5 8 0 6 12 945 1 5 0
7 1 0 4 1 0 1 0 986 2 3
8 4 2 8 15 1 4 1 4 951 2
9 2 0 0 6 6 2 1 12 5 977
[1] 0.975
OPTIONAL: convolutional neural network
Convolution layers in Keras need a specific form of data input.
For each example, they need a (width, height, channels) array (tensor). For a 28 × 28 colour image, that shape is usually (28, 28, 3), where the channels indicate red, green, and blue. MNIST has no colour information, but we still need the channel dimension to enter the data into a convolution layer, so each image gets shape (28, 28, 1). The training dataset x_train should thus have shape (60000, 28, 28, 1).
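The reshaping code is not shown; one possible sketch adds a trailing axis of length 1 in place (because the new dimension has size 1, the pixel values keep their original layout). Working directly on mnist$train$x and mnist$test$x keeps this consistent with the later fit() call for cnn_2, which passes mnist$train$x directly:
dim(mnist$train$x) <- c(dim(mnist$train$x), 1)  # (60000, 28, 28) -> (60000, 28, 28, 1)
dim(mnist$test$x) <- c(dim(mnist$test$x), 1)    # (10000, 28, 28) -> (10000, 28, 28, 1)
dim(mnist$train$x)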
cnn <-
keras_model_sequential(input_shape = c(28, 28, 1)) |>
layer_conv_2d(filters = 6, kernel_size = c(5, 5)) |>
layer_max_pooling_2d(pool_size = c(4, 4)) |>
layer_flatten() |>
layer_dense(units = 32, activation = "relu") |>
layer_dense(10, activation = "softmax")
cnn |>
compile(
loss = "sparse_categorical_crossentropy",
optimizer = "adam",
metrics = c("accuracy")
)
Model: "sequential_3"
┌───────────────────────────────┬─────────────────────┬────────────┐
│ Layer (type)                  │ Output Shape        │    Param # │
├───────────────────────────────┼─────────────────────┼────────────┤
│ conv2d (Conv2D)               │ (None, 24, 24, 6)   │        156 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ max_pooling2d (MaxPooling2D)  │ (None, 6, 6, 6)     │          0 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ flatten_3 (Flatten)           │ (None, 216)         │          0 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ dense_6 (Dense)               │ (None, 32)          │      6,944 │
├───────────────────────────────┼─────────────────────┼────────────┤
│ dense_7 (Dense)               │ (None, 10)          │        330 │
└───────────────────────────────┴─────────────────────┴────────────┘
Total params: 7,430 (29.02 KB)
Trainable params: 7,430 (29.02 KB)
Non-trainable params: 0 (0.00 B)
# First, the input layer receives the images with a single channel: shape (28, 28, 1).
# Then there is a 2d convolution layer with 6 filters and a kernel size of 5 (in each direction),
# followed by max-pooling, which shrinks each of the 6 resulting feature maps by a factor of 4 in each direction.
# Afterwards, we flatten the feature maps into a single vector.
# Then comes a dense hidden layer with 32 units and a relu activation function.
# Lastly, the output layer is the same as before: 10 units with softmax activation.
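The training and evaluation calls for the cnn model are again not echoed; a sketch, assuming the reshaped arrays and the hypothetical class_predict() helper from above:
cnn |> fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
pred_cnn <- class_predict(cnn, mnist$test$x)
table(pred = pred_cnn, true = mnist$test$y)  # confusion matrix
mean(pred_cnn == mnist$test$y)               # test accuracy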
313/313 - 0s - 1ms/step
true
pred 0 1 2 3 4 5 6 7 8 9
0 978 0 2 1 1 2 6 0 12 3
1 0 1128 4 0 0 0 5 2 1 3
2 0 2 1007 1 1 1 0 8 4 1
3 0 1 9 1005 0 12 0 3 10 3
4 0 2 2 0 973 0 5 4 7 10
5 0 0 0 0 0 865 1 0 6 2
6 0 2 0 0 1 8 941 0 4 0
7 1 0 4 1 1 1 0 1005 6 5
8 1 0 3 2 0 1 0 2 916 1
9 0 0 1 0 5 2 0 4 8 981
[1] 0.9799
Here are some things you could do:
- Reduce the convolution filter size & the pooling size and add a second convolutional & pooling layer with double the number of filters
- Add a dropout layer after the flatten layer
- Look up on the internet what works well and implement it!
cnn_2 <-
keras_model_sequential(input_shape = c(28, 28, 1)) |>
layer_conv_2d(filters = 6, kernel_size = c(3, 3)) |>
layer_max_pooling_2d(pool_size = c(2, 2)) |>
layer_conv_2d(filters = 12, kernel_size = c(3, 3)) |>
layer_max_pooling_2d(pool_size = c(2, 2)) |>
layer_flatten() |>
layer_dropout(rate = 0.2) |>
layer_dense(units = 32, activation = "relu") |>
layer_dense(10, activation = "softmax")
cnn_2 |>
compile(
loss = "sparse_categorical_crossentropy",
optimizer = "adam",
metrics = c("accuracy")
)
cnn_2 |> fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)
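Evaluated in the same way (again using the hypothetical class_predict() helper):
pred_cnn_2 <- class_predict(cnn_2, mnist$test$x)
table(pred = pred_cnn_2, true = mnist$test$y)  # confusion matrix
mean(pred_cnn_2 == mnist$test$y)               # test accuracy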
313/313 - 0s - 1ms/step
true
pred 0 1 2 3 4 5 6 7 8 9
0 973 1 4 2 0 2 6 1 2 3
1 0 1104 1 0 0 0 1 1 0 1
2 1 2 1009 1 4 1 0 14 0 0
3 0 5 5 1000 0 3 1 3 2 1
4 0 0 1 0 972 0 5 1 1 8
5 0 1 0 3 0 879 4 0 1 2
6 1 4 0 0 0 2 941 0 1 0
7 0 1 4 1 0 1 0 999 2 1
8 4 17 8 3 0 4 0 3 963 4
9 1 0 0 0 6 0 0 6 2 989
[1] 0.9829