Intro to PostGreSQL

When we're working with apps we often want somewhere to store our data. There's a number of choices here, depending on what our needs are. For example, we might be fine with keeping our data as some files on a disk, or we might need something that quickly and easily translates into usable objects in our code.

However, a lot of the time in web apps we want data which is structured, able to be searched efficiently and easy to analyse. In this case, we probably want a SQL database. SQL databases are the most common type of database and are suitable for many use cases.

SQL databases are tabular, which means they represent our data as a set of tables which can be combined, filtered, and searched. If you've ever used a spreadsheet program then you'll be very familiar with this concept. In some ways, SQL databases can be thought of as the spreadsheet concept but on a much larger scale.

PostGreSQL is one of the most popular SQL database systems, and has become the de-facto standard in the last few years. Alternatives include MySQL, SQLite, Microsoft SQL Server, and MariaDB.

In this class we're going to dive right into PostGreSQL in order to explore the tabular concept and get an understanding of basic SQL operations.

Installation

First, download Virtualbox from their website and install it.

Then, download the VM for this database course and add it to Virtualbox.

user: pg
password: pguser

admin: mtt
password: milktea

In this part of the course we're going to use psql to access our database. This is a command-line tool which lets us run SQL queries and explore the database.

After logging into the VM, run the following command to connect to the PostGreSQL database.

psql -d mypgdatabase

We're connecting to our local PostGreSQL instance on the mypgdatabase, using default credentials. You can see those credentials by running:

cat ~/.pg_pass

Getting Started

A PostGreSQL instance consists of one or several databases, which contain tables. Databases are collections of multiple tables joined by relationships. Tables are where our actual data is stored, and are similar to a spreadsheet in a lot of ways.

In these classes we're going to use the Pagila database. This is a collection of tables which simulate the business systems for a movie rental company like Blockbuster.

Let's check our tables. You can view a list of tables in psql by entering \dt at the command line. You should see a number of movie-related tables such as Actor and Film.

Now try zooming in on some specific table. You can use \d+ TABLE_NAME for this. This will give you the schema for a particular table, which tells you all of the columns in the table and any constraints or rules governing the table's behaviour.

Notice that every column has a type. This includes types you'll be familiar with from programming, but PostGreSQL also supports less common types like JSON or DateTime. For a complete list see here.

SELECT

Now let's look at the actual data in our tables. To do this we're going to run the SELECT query.

Note that by convention, SQL keywords like SELECT are in uppercase, but they do not need to be written this way.

SELECT chooses one or more columns from a table and outputs them. We can either choose to select the whole table:

SELECT * FROM table_name;

Or we can choose to select one or more columns from a given table:

SELECT column_1, column_2 FROM table_name;

Exercises

  • Go through actor, address, city, country, customer, film, inventory, rental, staff, and store and take a look at their tables.
  • Pick 2 or 3 tables of interest and look at them.

SELECT WHERE

Usually we don't want to retrieve every record in a table, so we'll filter our table down by some attribute. The most common way to do this is with WHERE.

WHERE lets us specify a condition against a column of our table and use it to filter the table:

SELECT * FROM table_name WHERE id > 0;
SELECT column_1, column_2 FROM table_name WHERE id < 10;

We can combine WHERE statements by using AND and OR keywords, just like most programming languages.

SELECT column_1, column_2 FROM table_name WHERE id < 10 AND id > 5;

Exercises

  • How many Theos do we have on staff?
  • What are their surnames?
  • How many inactive customers do we have?
  • How mant customers do we have with the surname "Smith"?

ORDER BY / LIMIT / OFFSET

We can sort records by using ORDER BY. This can be ASC (ascending) or DESC (descending), but you need to choose one or more columns.

SELECT first_name, last_name FROM employees ORDER BY last_name ASC, first_name;

Sometimes, especially when iterating through a database from some external app, we don't want all our records at once. We can use LIMIT for this.

SELECT first_name, last_name from employees LIMIT 10;

If we want different 'pages' of results, then we can use OFFSET to progress through our results. This is very useful when combined with LIMIT.

-- get a 3rd page of 10 employees
SELECT first_name, last_name from employees LIMIT 10 OFFSET 20;

Exercises

  • Sort actors by first name.
  • Sort the countries in reverse order, starting from Zambia.
  • Find the 10 films which cost the most and least to rent.
  • Find me the 10th-20th actors in alphabetical order by surname.

Aggregations

One of the main reasons we want to work with databases is usually to make it easier to analyse large amounts of data. The main way we can do this in SQL is with aggregations.

Common aggregations are to get the maximum, minimum or average value from some list.

SELECT max(price) FROM products WHERE type="laptop";
SELECT min(price) FROM products WHERE type="laptop";
SELECT avg(price) FROM products WHERE type="laptop";

A common thing we might want to do is get a list of distinct values from some table (i.e. values with no repetitions). We can do this with DISTINCT column_name.

SELECT DISTINCT last_name FROM employees;

We also might want to run an aggregation across different groups of values. For this GROUP BY is what we want:

SELECT type, avg(price) FROM products GROUP BY type;

Exercises

  • What's the average rental rate for a movie? What's the highest? What's the lowest?
  • How many distinct first names do our actors have?
  • What are the most common 5 districts for our stores (in table address) if we list them in alphabetical order?
  • What are the most common last names in our staff?

Further Reading