Engine Types#

There are many options available to us, but we're going to focus on just two: relational and NoSQL. These two databases are going to make up most of those you'll interact with.

There's a lot of complexity that goes into a database engine, but we're not going to explore too deeply into that. That's very much a computer science topic that's outside the scope of this course (and Level Two and Level Three) and it's also not going to help you do your job any better.

Instead, we'll explore each database type and explore their features, then compare the two.

Relational Databases#

Let's start with the relational database.

This database type is probably the most common in the world. You've likely used one yourself if you've ever used a spreadsheet as they're technically a database: they have rows, columns and a sheet represents a table, making the rows and columns related to each other.

In a relational database you have rows and columns, and a table relates them to each other. Visually this looks like this:

Relational Database Terms
(User:Booyabazooka, Public domain, via Wikimedia Commons)

The "Attribute" is referred to as a column; the "Tuple" is the row; and the "Relation" is a table. Looks a bit like a spreadsheet, doesn't it?

A database engine "wraps" tables into a database, which can have zero or more tables. A database engine can also host multiple databases.

The term "relational" also comes from the fact that relationships can be setup between the data inside of the database(s). For example, it's possible to setup a "foreign key constraint" between two tables and have data from one table update or delete in the event the data in another is updated or deleted (this is known as cascading). This isn't something we need to delve into too deeply because this kind of work is generally left to database administrators and software engineers.

Engines#

Relational database engines include:

PostgreSQL
MySQL
MariaDB (a fork of MySQL)
Microsoft SQL Server
Oracle
SQLite

And many more.

Note

A spreadsheet isn't really a database in the sense we're talking about, but they do make for an easy way to introduce you to the rows, columns, and table concept. I don't recommend storing critical or sensitive information in a spreadsheet if it can be helped.

Schemas#

One thing that relational databases have, or more precisely, the tables have, are schemas. A schema is the design or the "structure" of a table. It defines what the table looks like in terms of the columns, their name, data type, and other attributes. Here's an example table schema:

CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    phone TEXT NOT NULL UNIQUE
);

This is SQL, and this specific SQL was written for the SQLite engine - something we will explore later. Don't be too concerned about how the SQL syntax for now. We'll cover that later.

Breaking this SQL down, we can see we're creating a table (CREATE TABLE) called "contacts" and we're defining five columns:

contact_id
first_name
last_name
email
phone

The contact_id column's data type is INTEGER. This means it's a number. It's also the PRIMARY KEY, which means it's not only an index on the table used for looking data up quickly, but also represents the whole row within the table and acts as its "uniqueness" when compared to other rows.

All the other columns are of type TEXT, so "strings of data like this" (between the "s). They have other "constraints" on them too, like NOT NULL (cannot be an empty value), UNIQUE meaning the value cannot repeat in other rows, etc.

This is what we call a schema, and it defines the structure of the table that must be complied with when inserting data into the table. We'll see examples of this later when we study the basics of SQL.

Indexes and Primary Keys#

I mentioned two key words above: index and primary key. Let's explore this in a bit more depth.

An index is defined on the table schema and is used for looking up rows inside of the table very quickly. A poorly indexed table can result in slow application performance as more and more data is added to it. You're not going to be directly involved in designing and setting up the indexes on a table, but it's important you know what they are.

I also mentioned a primary key. A primary key can be made up of multiple columns in a table's schema. They act as a way of uniquely identifying rows inside of the table. Even though you can use the UNIQURE constraint in the schema to make sure that rows have a unique value for a given column (like the email column above), that's not the same thing. With a primary key, the "uniqueness" that's created is used by the database engine itself to identify individual records. Like, like an index, helps to speed up the searching capabilities of a database engine.

ACID#

A common trait amongst relational database types is ACID:

A: Atomic
C: Consistent
I: Isolation
D: Durable

For a relational database to be considered "atomic", it must ensure that data is completely written to the database during a transaction, and either completely succeeds or completely fails - nothing in between. No "half" writes of the data to the database: it either wrote the data or it did not. This helps to guarantee the integrity of the data.

To be consistent means to provide the correct data after a record has been updated. Once the change has happened, the new data is returned when queried, and the old has been truly replaced.

Isolation refers to the concept that all users, and their transactions against data, should be isolated and cannot "see" each other or effect each other's operations. Everyone acts the database and the data in isolation. Don't confuse this with the idea that changes to the data aren't visible to other users - they are. Instead, isolation here is saying that one user's queries cannot affect another user's queries directly during their execution.

And to be durable means to ensure changes to data, once written to the database, are written to disk and are permanent and persistent. The data cannot be lost five minutes later, otherwise the database engine isn't durable.

NoSQL Databases#

Unlike relational databases, NoSQL database engines operate under the principle that none of the data is related and the schema is flexible and isn't defined ahead of time.

Generally speaking, SQL also isn't used to query a NoSQL database, hence the name.

NoSQL databases do have advantages over traditional, relational, SQL based engines: they can be much faster, and their flexibility means if you data changes "shape", you don't have to do schema updates, which can be very complex and in some cases, too difficult to consider.

Engines#

Example of NoSQL engines include:

MongoDB
Cassandra
Redis
HBase
Couchbase
Elastic

And more.

Types#

NoSQL database engines can be split into multi sub-categories based on how they store data:

Key-value
Graph
Document
Column-orientated (wide column)

We don't need to delve too deeply into these topics today. That's what the Level Two and Level Three courses are for.

(No) Schema#

Having no schema when storing data into a database means if your data looks one way today, but then a new attribute or column is needed tomorrow, you can simply just add it and store the data. No changes are needed to schema.

This is the complete opposite of relational databases, where the schema defines what must be stored in a table and you cannot simply store more (or less, depending on configuration) data because you've decided a new column is needed. Before the new column can be used, the schema must be updated, which can affect the existing data inside the table. This can be troublesome if you have 50,000,000 records inside the table.

With NoSQL like databases there is no schema, so you can freely store data of any arbitrary shape and size into the engine and it'll work with it.

Scalability#

No primary benefit of NoSQL databases is their ability to scale to endless degrees. Relational databases don't scale easily, but it is possible. With NoSQL like databases, the idea is that they can and do scale easily, which of course comes with disadvantages.

The benefits are clear: as your data increases in size and complexity, the NoSQL database engine can just scale to accommodate it.

The disadvantages include the complexity required to maintain such a system and its ability to "endlessly" scale with the data.

Next#

Now let's explore the SQL language and learn some basics. We can then look at creating our own database.