INFOMDWR – Lab 0: Know Your Tools

Introduction

During this lab, you will get familiar with the most important tools that will be used during the course. These tools include (SQLite), (DB Browser), (Google Colab) for Python and (RStudio) for R.

SQLite and DB Browser

In this course, we will use SQLite and its graphical interface DB Browser to create and query DBMSs. During this lab, you will install and use the DB Browser or SQLite3. As an example for a database, we will use the Chinook database. To download the .sql file, open the Link and look for Chinook_Sqlite.sql. Right click on the link and Save Link As ... to download the file into a location of your choice.

SQLite3

To create a database using sqlite3:

$ sqlite3 Chinook

This command will create a file Chinook.db in the working directory. To check if the database has been created correctly, you can run .databasesfrom the bash of sqlite3 as follows:

sqlite3> .databases 

If the database has been created correctly, you should see the following output

main: /Users/.../Chinook r/w

Now, you can import the tables from the file Chinook_Sqlite.sql using:

sqlite3> .read Chinook_Sqlite.sql

It will take a few seconds to import the database. After that, you can list the tables that exist in the database using:

sqlite3> .tables

If the database was imported correctly, you should see the following list of tables: [Album, Employee, InvoiceLine, PlaylistTrack, Artist, Genre, MediaType, Track, Customer, Invoice, Playlist]

DB Browser

In DB Browser, you can use the graphical interface to create the Chinook database. To import the database from the file Chinook_Sqlite.sql, open the File menu and select import > Database from SQL file .... Change to the location where you saved the file Chinook_Sqlite.sql and select it. You will be asked if you want to create a new databse or not. Since we have created a new database, you select No. If the database is imported correctly, you will see the number of tables changed from 0 to 11.

Now, we need to get familiar with executing a simple SQL query that returns the names of the tables in the database. Unlike using sqlite3, we should use the Execute SQL tool. When selecting the Execute SQL, you will see a text box. Type the query:

SELECT name FROM sqlite_schema
WHERE type ='table' AND 
      name NOT LIKE 'sqlite_%';

After executing the query, you should see a table with the names of the 11 tables in the database.

Google Colab for Python

For simplicity, we will use Google Colab for running the Python programs during the course. If you are familiar with other Integrated Development Environment such as Anaconda, PyCharm or Spyder, then use that IDE. Be careful that specific libraries may run only under a specific version of Python.

Google Colab

We will start by creating a new new notebook and checking the version of Python that is already installed. We run the command:

!python --version

If we need to install a different version of Python (e.g. 3.7), we can use:

!apt-get install python3.7

Installing more Libraries

Most of the important libraries that we may need are already installed in the Google Colab environment. However, if you would like to install additional libraries, you can use the pip command. For example, to install py_stringmatching, you can use the command:

!pip install py_stringmatching

Note: when running system commands in Google colab, we use ! before the command (this is common for running system commands in any notebook environment).

Running simple Python code

Run the following Python code and explain what the code is doing:

t, f = True, False
print(t and f) 
print(t or f)  
print(not t)   

R & RStudio

We will be using the most popular development environment for R: RStudio. In the exercises below (and the readings for this lab from R4DS) you will set up your computer for doing the R practicals in later parts of this course.

R & RStudio

Installing R & RStudio

Install R and RStudio as per the instructions in the syllabus here.

Installing additional packages

Open RStudio, find the console window (the REPL) and type the following code:

2 + 2

This should return 4.

If you are not sure where to execute code, use the following figure to identify the console:

HTML5 Icon

To install packages in R, we use R directly (unlike in python where we commonly use the pip module). Install the tidyverse suite of packages.

install.packages("tidyverse")

If you are asked

Do you want to install from sources the package which needs 
compilation? (Yes/no/cancel)

type no in the console and press the return key.

Did this all go well? Explain to your neighbour what just happened. What is tidyverse?

Required R knowledge

The following is the minimum of what you should know about R before starting with the first R practical later in the course. Take some time to explore these points! Look up things on the internet if you are unsure.

  • What is R (a fancy calculator) and what is an .R file (a recipe for calculations)
  • What is an R package (a set of functions you can download to use in your own code)
  • How to run R code in RStudio
  • What is a variable x <- 10
  • What is a function, e.g., y <- sqrt(x = 23)
  • Understand what the following statements do (tip: you may run it in R line by line)
y <- "What?"
x <- "R!"
z <- paste(x, "Data wrangling and analysis is cool.", y)
rep(z, 3)
1:10
sample(1:20, 4)
sample(1:20, 40, replace = TRUE)
z <- c(1, 2, 3, 4, 5, 4, 3, 2, 1)
z^2
z == 2
z > 2
  • Be able to read the help file of any function, (e.g., type ?plot in the console)