A Tutorial Introduction to Relational Databases
Table of Contents
These are the notes from my guest lecture at Karnavati University on databases. It gives a taste of using relational databases to organize real-world data and using Structured Query Language (SQL) to extract complex information.
We will build a database for an application like twitter. It will have users, tweets, replies, followers, and followees.
1. Setup
sqlite is an implementation of SQL databases that stores databases
in files. It is used for local application databases more than web
applications. It is a nice tool to explore SQL databases without too
much initial setup. We can create a file to hold the twitter database
using the command:
sqlite3 twitter.db
This will drop you into the sqlite prompt where we can write SQL to
operate on the database.
First, we enable some options on the sqlite prompt to get nice
output:
sqlite> .mode box sqlite> .headers on
These have no effect on the generated file.
2. User Accounts
We need a table of users. We will store the handle, name, and bio. The real database will also need data to authenticate the user.
create table users ( handle text primary key , name text , bio text );
We specify the type of data in each column. text is any string. The
primary key directive specifies that handle column cannot have
duplicate values. In other words, rows in this table are identified by
handle.
We can now create a couple of user accounts:
insert into users (handle, name, bio) values
('joe', 'Joe Root', 'I am English.'),
('steve', 'Steve Smith', 'Hi!'),
('virat', 'Virat Kohli', 'Hello!');
We can observe the table using select.
select * from users;
| handle | name | bio |
|---|---|---|
| joe | Joe Root | I am English. |
| steve | Steve Smith | Hi! |
| virat | Virat Kohli | Hello! |
The handle column is special. There cannot be another user with
handle virat. But there can be another user with name Virat Kohli.
3. Tweets
Users should be able to tweet. We will use another table to hold tweets.
create table tweets ( id integer primary key autoincrement , tweet text , tweeter text , datetime text , foreign key (tweeter) references users(handle) );
There are no unique natural columns in the table of tweets. So we add
a column id just to distinguish rows. Since we do not care about the
exact values, we set it to an integer that is auto-incremented. Each
time a row is inserted, sqlite will generate a new value for the
id column.
The foreign key declaration states the entries in the tweeter
column must come from the handle column of users. In other words,
only registered users can make tweets.
Let us make some tweets.
insert into tweets (tweet, tweeter, datetime) values
('Happy New Year!', 'joe', '20250101T0000'),
('Happy New Year to you too!', 'steve', '20250101T0100'),
('Happy New Year to you too!', 'virat', '20250101T0600'),
('Too late!', 'steve', '20250101T0700');
We can take a look at the existing tweets.
select * from tweets;
| id | tweet | tweeter | datetime |
|---|---|---|---|
| 1 | Happy New Year! | joe | 20250101T0000 |
| 2 | Happy New Year to you too! | steve | 20250101T0100 |
| 3 | Happy New Year to you too! | virat | 20250101T0600 |
| 4 | Too late! | steve | 20250101T0700 |
Note that the id field is auto-incremented for each tweet.
4. Replies
In twitter, a tweet may be a reply to another tweet. This information is not present in this table. We can create another table to capture this relationship.
create table replies ( original integer , reply integer primary key , foreign key (original) references tweets(id) , foreign key (reply) references tweets(id) );
The reply column is a primary key because a tweet can be a reply
to at most one tweet. A tweet is top-level if it does not appear in
the reply column. We identify tweets using their id.
We want Virat and Steve to reply to Joe, and Steve's last tweet to be a reply to Virat's:
insert into replies (original, reply) values (1, 2), (1, 3), (3, 4);
Now, let us use some queries to extract data that twitter wants for building its pages.
We can see all tweets by Steve:
select tweet, datetime from tweets where tweeter = 'steve';
| tweet | datetime |
|---|---|
| Happy New Year to you too! | 20250101T0100 |
| Too late! | 20250101T0700 |
To see all replies to a tweet, we need to combine information in
tweets and replies. The join operation in SQL achieves
this. First, we see the most general join, the cross join.
select * from tweets cross join replies;
| id | tweet | tweeter | datetime | original | reply |
|---|---|---|---|---|---|
| 1 | Happy New Year! | joe | 20250101T0000 | 1 | 2 |
| 1 | Happy New Year! | joe | 20250101T0000 | 1 | 3 |
| 1 | Happy New Year! | joe | 20250101T0000 | 3 | 4 |
| 2 | Happy New Year to you too! | steve | 20250101T0100 | 1 | 2 |
| 2 | Happy New Year to you too! | steve | 20250101T0100 | 1 | 3 |
| 2 | Happy New Year to you too! | steve | 20250101T0100 | 3 | 4 |
| 3 | Happy New Year to you too! | virat | 20250101T0600 | 1 | 2 |
| 3 | Happy New Year to you too! | virat | 20250101T0600 | 1 | 3 |
| 3 | Happy New Year to you too! | virat | 20250101T0600 | 3 | 4 |
| 4 | Too late! | steve | 20250101T0700 | 1 | 2 |
| 4 | Too late! | steve | 20250101T0700 | 1 | 3 |
| 4 | Too late! | steve | 20250101T0700 | 3 | 4 |
A cross join operation builds a table by combining all rows of the
first table with all rows of the second table. This is the set cross
product for tables. To extract replies to tweet 1, we have to select
rows from the product table where reply = id and original = 1.
select tweet, tweeter, datetime from tweets t cross join replies r where t.id = r.reply and r.original = 1;
| tweet | tweeter | datetime |
|---|---|---|
| Happy New Year to you too! | steve | 20250101T0100 |
| Happy New Year to you too! | virat | 20250101T0600 |
Note that we assigned short names t and r to rows of tweets and
replies and we can use dot notation to refer to columns of those
rows.
An inner join does this more efficiently by not creating the whole
table, but by only creating the entries satisfying the on condition.
select tweet, tweeter, datetime from tweets t inner join replies r on r.reply = t.id where r.original = 1;
| tweet | tweeter | datetime |
|---|---|---|
| Happy New Year to you too! | steve | 20250101T0100 |
| Happy New Year to you too! | virat | 20250101T0600 |
5. Followers
On twitter, users can follow another user.
create table followers ( follower text , followee text , foreign key (follower) references users(handle) , foreign key (followee) references users(handle) , primary key (follower, followee) );
There are no unique columns in this table. A user can have multiple followers and can be followed by multiple users. But taken together, the two columns cannot have the same value. So the primary key is composite.
Let us say, except for Steve, who does not follow Joe, everyone follows everyone else.
insert into followers (follower, followee) values
('joe', 'virat'),
('joe', 'steve'),
('virat', 'joe'),
('virat', 'steve'),
('steve', 'virat');
We may want to see all replies from follower of a tweeter to a particular tweet by that tweeter. As far as I know, twitter does not have such a feature. However, we can write a query by joining three tables.
select rt.tweet, rt.tweeter, rt.datetime from tweets t join replies r on r.original = t.id join tweets rt on rt.id = r.reply join followers f on f.follower = rt.tweeter and f.followee = t.tweeter where t.id = 1;
| tweet | tweeter | datetime |
|---|---|---|
| Happy New Year to you too! | virat | 20250101T0600 |
Note that join defaults to inner join in SQL. Steve's reply is
excluded because he doesn't follow Joe.
6. Conclusion
We have seen how a backend for a simple twitter-like application can be organized in a relational database.