# Model-based clustering using mclust

Published October 20, 2023

# Introduction

In this practical, we will apply model-based clustering to a data set of bank note measurements.

We use the following packages:

```
library(mclust)
library(tidyverse)
library(patchwork)
```

The data is built into the `mclust` package and can be loaded as a `tibble` by running the following code:
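A minimal sketch of that loading step, assuming the data set meant here is mclust's `banknote` and that the tibble is stored under the name `df` (the name the later code blocks use):

```r
library(mclust)
library(tidyverse)

# Assumption: the bank note measurements are the `banknote` data set in mclust.
data("banknote", package = "mclust")

# Convert the plain data frame to a tibble; `df` matches the later code blocks.
df <- as_tibble(banknote)
df
```

The tibble has one row per bank note, a `Status` label column, and six numeric measurement columns.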

# Data exploration

2. Create a scatter plot of the left (x-axis) and right (y-axis) measurements in the data set. Map the `Status` column to colour. Jitter the points to avoid overplotting. Are the classes easy to distinguish based on these features?
3. From now on, we will assume that we don’t have the labels. Remove the `Status` column from the data set.
4. Create density plots for all columns in the data set. Which single feature is likely to be best for clustering?
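Exercises 2 and 3 could be approached along these lines. This is a sketch, not the reference solution: it assumes the data is mclust's `banknote` stored as `df`, and the object name `p_scatter` is only illustrative.

```r
library(mclust)
library(tidyverse)

# Assumption: `df` holds the banknote data as a tibble.
data("banknote", package = "mclust")
df <- as_tibble(banknote)

# Exercise 2: jittered scatter plot of Left vs. Right, coloured by Status.
p_scatter <- df |>
  ggplot(aes(x = Left, y = Right, colour = Status)) +
  geom_jitter() +
  theme_minimal()
p_scatter

# Exercise 3: drop the labels so only the six measurements remain.
df <- df |> select(-Status)
```

After this step `df` contains only numeric columns, which is what the density plots for exercise 4 rely on.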

```
# big patchwork of density plots
df |> ggplot(aes(x = Length)) + geom_density() + theme_minimal() +
df |> ggplot(aes(x = Left)) + geom_density() + theme_minimal() +
df |> ggplot(aes(x = Right)) + geom_density() + theme_minimal() +
df |> ggplot(aes(x = Bottom)) + geom_density() + theme_minimal() +
df |> ggplot(aes(x = Top)) + geom_density() + theme_minimal() +
df |> ggplot(aes(x = Diagonal)) + geom_density() + theme_minimal()
```

```
# the Diagonal feature looks good! Look at the two bumps in its density plot.
# Colourful alternative:
library(ggridges)
df |>
  mutate(across(everything(), scale)) |>
  pivot_longer(everything(), names_to = "feature", values_to = "value") |>
  ggplot(aes(x = value, y = feature, fill = feature)) +
  geom_density_ridges() +
  scale_fill_viridis_d(guide = "none") +
  theme_minimal()
```

`Picking joint bandwidth of 0.31`

# Univariate model-based clustering

5. Use `Mclust` to perform model-based clustering with 2 clusters on the feature you chose. Assume equal variances. Name the model object `fit_E_2`. What are the means and variances of the clusters?
6. Use the formula from the slides and the model’s log-likelihood (`fit_E_2$loglik`) to compute the BIC for this model. Compare it to the BIC stored in the model object (`fit_E_2$bic`). Explain how many parameters (m) you used and which parameters these are.
7. Plot the model-implied density using the `plot()` function. Afterwards, add rug marks of the original data to the plot using the `rug()` function from the base graphics system.
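Exercises 5 to 7 might be worked through as sketched below. The sketch assumes the chosen feature is `Diagonal` and that the univariate equal-variance model corresponds to `modelNames = "E"`. The slides' BIC formula is not reproduced here, so the code uses the sign convention mclust itself reports, BIC = 2 · loglik − m · log(n); if the slides define BIC with the opposite sign, the two values differ only in sign. The names `diagonal`, `m`, `n`, and `bic_manual` are illustrative.

```r
library(mclust)

# Assumption: the chosen feature is Diagonal from mclust's banknote data.
data("banknote", package = "mclust")
diagonal <- banknote$Diagonal

# Exercise 5: two-component univariate mixture with equal variances ("E").
fit_E_2 <- Mclust(diagonal, G = 2, modelNames = "E")
fit_E_2$parameters$mean              # one mean per cluster
fit_E_2$parameters$variance$sigmasq  # single variance shared by both clusters

# Exercise 6: BIC by hand, using mclust's convention (larger is better).
# m = 2 means + 1 shared variance + 1 free mixing proportion = 4 parameters.
m <- 4
n <- length(diagonal)
bic_manual <- 2 * fit_E_2$loglik - m * log(n)
c(manual = bic_manual, stored = fit_E_2$bic)

# Exercise 7: model-implied density, then rug marks for the observed values.
plot(fit_E_2, what = "density")
rug(diagonal)
```

Only one mixing proportion counts as a free parameter because the two proportions must sum to 1. The parameter count that mclust used is also stored in `fit_E_2$df`, which gives a quick check on `m`.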