Author’s Note: I loooooooove databases and I want to spread the knowledge around! This is the first in what will hopefully be a series on databases, with a focus on providing foundational information and context necessary to understand and perform attacks against databases… as well as defend against them!

Shoutout to Dr. Poole, whose amazing database class got me hooked forever! <3

Table of Contents:

Terminology

“Words, words words.” -Hamlet, Act 2 Scene 2

  • Column: Up and Down
  • Row / Record: Left and Right
  • Table: The whole shebang
  • Field: a single piece of information from a record or row. (I think of this as the intersection of a column and a row)
  • Primary key (PK): a value that uniquely identifies a row or record. Primary keys should be unique, non-null, and present in every row.
  • Composite primary key: a primary key that constructed from the values of more than one field.
  • Relationship: The association between tables.
  • Foreign key (FK): the primary key from table A which has been inserted into table B to create a relationship between table A and table B (This will make more sense after the schema section, I promise!)
  • Referential integrity: refers to the accuracy and consistency of data within a relationship. 1
  • Constraint: A rule that enforces referential integrity.
  • Structured Query Language (SQL): a standard for managing databases and retrieving data.
  • Schema: the skeleton structure that represents the logical view of the entire database. 2
  • Entity Relationship Diagram (ERD): A diagram of the database schema that pictorally represents tables, relationships, and typically includes primary keys.
  • Cardinality: the maximum number of times an instance in one entity can relate to instances of another entity 3
  • Ordinality: the minimum number of times an instance in one entity can be associated with an instance in the related entity 3
  • Database Management System (DBMS): Software designed to manage a database. SQL syntax and features can vary slightly depending on which DBMS is used.

Schema Stuff

The table is the foundation of the relational database. Tables store all of the data in the database, and table structure dictates what relationships need to exist between what tables. This organization can be optimized by reducing redundency and improving data integrity using a process called normalization. But normalization and schema design are topics for another day!

Now we know what tables are… But a relational database is all about the relationships between tables and the data they contain! A database schema “defines how the data is organized and how the relations among them are associated”. 2 There are three types of relationships in the typical relational database: one to one (1-1), one to many (1-M), and many to many (M-N).

Types of Relationships

In order to explain how database relationships work, I’m going to use the following scenario: A small university has a database to keep track of their students, professors, and classes. There is a table for students with a primary key of a student ID, first name, last name, major, and GPA:

Note: I use (PK) to denote the primary key of a table and (FK) to denote foreign keys for ease of comprehension. Normally, you will not see these notations in database tables.

Student_ID (PK) First_Name Last_Name Major GPA
1 Peter Parker Biochemistry 3.6
2 Gwen Stacy Biochemistry 3.6
3 Bridget O’Neal Archaeology 3.4

The table for professors is similar:

Faculty_ID (PK) First_Name Last_Name Department
1 Marina Caches General Studies
2 Rodney Ballinge Chemistry
3 Isaac Slattery Biochemistry
4 Henry Croft Archaeology

The table for classes:

Class_ID (PK) Subject Course_Num Capacity
1 CHEM 101 200
2 PHYS 203 200
3 BIOL 316 25

One to One (1-1)

Let’s assume that some upper classmen are assigned to professors for one on one mentoring, and each professor only mentors one student. This calls for a one to one relationship - one row in one table (students) is directly related to one and only one row from another table (faculty). When creating a one to one relationship, the primary key from one table should be inserted into the other table as a foreign key. Which one gets inserted where is up to the schema designer!

For example, our student and faculty tables could look like this if we insert the primary key from the faculty table into the student table as the foreign key “Faculty_Mentor_ID”:

Student_ID (PK) First_Name Last_Name Major GPA Faculty_Mentor_ID (FK)
1 Peter Parker Biochemistry 3.6 2
2 Gwen Stacy Biochemistry 3.6 3
3 Bridget O’Neal Archaeology 3.4 4

The table for professors is unchanged:

Faculty_ID (PK) First_Name Last_Name Department
1 Marina Caches General Studies
2 Rodney Ballinger Chemistry
3 Isaac Slattery Biochemistry
4 Henry Croft Archaeology

If we insert the primary key from the students table into the faculty table as the foreign key “Mentored_Student_ID”, the student table is unchanged:

Student_ID (PK) First_Name Last_Name Major GPA
1 Peter Parker Biochemistry 3.6
2 Gwen Stacy Biochemistry 3.6
3 Bridget O’Neal Archaeology 3.4

And the faculty table has an additional field:

Faculty_ID (PK) First_Name Last_Name Department Mentored_Student_ID (FK)
1 Marina Caches General Studies NULL
2 Rodney Ballinger Chemistry 1
3 Isaac Slattery Biochemistry 2
4 Henry Croft Archaeology 3

One To Many (1-M)

An example of a one to many relationship is a professor who teaches multiple courses. Unlike a one to one relationship, the primary key from the table with the “one” must be inserted in to the table with the “many”. In our case, that means that the primary key from the professors table must be inserted into the classes table as the foreign key “Faculty_ID”.

The new classes table would look like this:

Class_ID (PK) Subject Course_Num Capacity Faculty_ID (FK)
1 CHEM 101 200 2
2 PHYS 203 200 2
3 BIOL 316 25 3

Many to Many (M-N)

An example of a many to many relationship in this scenario is student enrollment. Many (more than one) student is enrolled in each class, and students usually take more than one class. Many to many relationships require the creation of an additional table or “intermediary entity”. Let’s call this additional table “Enrollment”. The enrollment table will contain two fields: the primary key from the students table and the primary key from the classes table. This associates a student with a class!

The enrollment table would look like this:

Student_ID (FK) Class_ID (FK)
1 3
1 2
2 3

What is the primary key of this new table? There doesn’t seem to be one value that is able to uniquely identify the columns! So, in this case we can use a composite primary key! The composite primary key for this table is Student_ID combined with Class_ID. This works because the combination of student and class is guaranteed to be unique - a student can only be enrolled in one class once!

Entity Relationship Diagrams (ERDs)

Now we have a database… But it’s hard to imagine what it looks like and how the relationships connect the tables. I usually draw these by hand or use a tool like ERAssistant, draw.io or visio. Some DBMSs will create ERDs for you based on the schema!

Crow’s foot notation3 is used to depict entities (tables) and the relationships between them. This notation involves both cardinality (maximum) and ordinality (minimum). Some diagrams use only one of the other, but it’s better to use both.

Crow's Foot Notation key

Here’s the ERD for the example tables I’ve been using:

Entity Relationship Diagram for the example tables

For larger databases ERDs can look really messy but they’re still helpful! Here’s a really hairy ERD I created for a project in my database class:

Entity Relationship Diagram for the example tables

Basic SQL

Now that we’ve got some sort of understanding of how relational databases work, let’s explore the basics of SQL. We’ll be working off of the same scenario that was used in the Types of Relationships section. The goal of this section is not to be comprehensive, but to give enough informaiton that you can understand how SQL injection works. If you want to learn more about SQL, check out Additional Resources!

Here are the tables we’ll be writing queries for:

Students:

Student_ID (PK) First_Name Last_Name Major GPA Faculty_Mentor_ID (FK)
1 Peter Parker Biochemistry 3.6 2
2 Gwen Stacy Biochemistry 3.6 3
3 Bridget O’Neal Archaeology 3.4 4

Professors:

Faculty_ID (PK) First_Name Last_Name Department
1 Marina Caches General Studies
2 Rodney Ballinger Chemistry
3 Isaac Slattery Biochemistry
4 Henry Croft Archaeology

Classes:

Class_ID (PK) Subject Course_Num Capacity Faculty_ID (FK)
1 CHEM 101 200 2
2 PHYS 203 200 2
3 BIOL 316 25 3

Enrollment:

Student_ID (FK) Class_ID (FK)
1 3
1 2
2 3

SELECT, WHERE, LIKE, AND, OR, NOT

Select statements are used to select information from the database to be displayed as the result of the query:

SELECT [field1, field2] FROM [table];

Field1 and field2 should be replaced with the fields we want to return from the table:

SELECT First_Name, Last_Name, Major, GPA FROM Students;

The result of the query should be:

First_Name Last_Name Major GPA
Peter Parker Biochemistry 3.6
Gwen Stacy Biochemistry 3.6
Bridget O’Neal Archaeology 3.4

Substituting an asterisk for the list of fields returns all fields from the table:

SELECT * FROM Students;
Student_ID (PK) First_Name Last_Name Major GPA Faculty_Mentor_ID (FK)
1 Peter Parker Biochemistry 3.6 2
2 Gwen Stacy Biochemistry 3.6 3
3 Bridget O’Neal Archaeology 3.4 4

The WHERE clause is used to modify the results of a select statement to match a set of criteria. For example, we want to return the names of students whose GPA is greater than 3.5:

SELECT First_Name, Last_Name, GPA FROM Students
WHERE GPA > 3.5;
First_Name Last_Name GPA
Peter Parker 3.6
Gwen Stacy 3.6

The LIKE keyword is used with the WHERE statement to match patterns within fields. The % character is a wildcard (matches anything) in SQL. If we knew the first letter of a student’s name, we could use a pattern match to find their first name:

SELECT First_Name, Last_Name FROM Students
WHERE First_Name like 'G%';
First_Name Last_Name
Gwen Stacy

The AND, OR, NOT keywords are used to add additional conditions to WHERE clauses:

SELECT First_Name, Last_Name FROM Students
WHERE Major like 'Bio%' AND GPA > 3.5;
First_Name Last_Name
Peter Parker
Gwen Stacy
SELECT First_Name, Last_Name FROM Students
WHERE Major like 'Bio%' OR GPA > 3.0;
First_Name Last_Name
Peter Parker
Gwen Stacy
Bridget O’Neal
SELECT First_Name, Last_Name FROM Students
WHERE NOT Major like 'Bio%';
First_Name Last_Name
Bridget O’Neal

Joins and Union

The join and union operators are used to select related data from multiple tables at a time. There are several types of join operators, but the most common are Inner Join, Left Join, an Full Join. I’ll give a brief description of how these work but for in-depth examples and details on the subtle differences between inner and outer joins try this resource. The union operator is very similar to the mathematical set theory union operator - it combines the result sets from two queries into one big result set. Nested queries are a third method for extracting data from multiple tables at a time.

Inner Join

The INNER JOIN keyword selects records that have matching values in both tables. 4 Essentially, this means that the query won’t return any information from one table that does not match the other table. So, any records that don’t have any matching records in the other table will be left out of the results!

The following query returns a list of students who have faculty mentors, and the information associated with the faculty mentors:

SELECT * FROM Students
INNER JOIN Professors ON Students.Faculty_Mentor_ID = Professors.Faculty_ID;
Student_ID (PK) First_Name Last_Name Major GPA Faculty_Mentor_ID (FK) Faculty_ID (PK) First_Name Last_Name Department
1 Peter Parker Biochemistry 3.6 2 2 Rodney Ballinger Chemistry
2 Gwen Stacy Biochemistry 3.6 3 3 Isaac Slattery Biochemistry
3 Bridget O’Neal Archaeology 3.4 4 4 Henry Croft Archaeology

Notice that Marina Caches is left out of the result set because she is not mentoring a student.

The primary and foreign keys are being used to link the information in the two tables together.


Left Join

The LEFT JOIN keyword selects all records from the “left” table and joins matching information from the “right” table. Left and right refer to which table is on which side of the equals sign in the join statement. For consistency sake, I put the table mentioned in the FROM clause on the left.

The followiing query returns a list of all faculty and the students they mentor:

SELECT * FROM Professors
LEFT JOIN Students ON Professors.Faculty_ID Student.Faculty_Mentor_ID;
Faculty_ID (PK) First_Name Last_Name Department Student_ID (PK) First_Name Last_Name Major GPA Faculty_Mentor_ID (FK)
1 Marina Caches General Studies NULL NULL NULL NULL NULL NULL
2 Rodney Ballinger Chemistry 1 Peter Parker Biochemistry 3.6 2
3 Isaac Slattery Biochemistry 2 Gwen Stacy Biochemistry 3.6 3
4 Henry Croft Archaeology 3 Bridget O’Neal Archaeology 3.4 4

Note: A right join can achieve the same results as a left join, if the order of the tables being joined is swapped.


Full Join (Full Outer Join)

The FULL JOIN and a FULL OUTER JOIN keywords preform the same action. FULL JOIN is the opposite of an inner join - it returns all rows where there is a match in either the left or right table. Full Joins can return very large result sets.

For more information on full joins and full outer joins: https://www.w3schools.com/sql/sql_join_full.asp


Union

The union operator combines the results from two separate querys. The union operator is different from the join operators because it joins results regardless of the contents of the result sets. The only requirement for a union operator is that the two result sets must have the same number of fields.

The following query returns the first names of all students and faculty:

SELECT First_Name from Students
UNION
SELECT First_Name from Professors;
First_Name
Peter
Gwen
Bridget
Marina
Rodney
Isaac
Henry

If the number of fields returned from each of the queries is not equal, you will get a SQL syntax error and nothing will be returned.


SQL Injection

Vulnerable Code from DVWA

Now that we know enogh about SQL to understand how SQL Injection works, let’s talk a little bit about PHP, which is a very common method of connecting web applications to databases. The code I’m showing in this section is from DVWA. DVWA and Mutillidae 5 are both great places to start learning web attacks!

Here’s some of the vulnerable code:

<?php

if( isset( $_REQUEST[ 'Submit' ] ) ) {
	// Get input
	$id = $_REQUEST[ 'id' ];

	// Check database
	$query  = "SELECT first_name, last_name FROM users WHERE user_id = '$id';";
	$result = mysqli_query($GLOBALS["___mysqli_ston"],  $query ) or die( '<pre>' . ((is_object($GLOBALS["___mysqli_ston"])) ? mysqli_error($GLOBALS["___mysqli_ston"]) : (($___mysqli_res = mysqli_connect_error()) ? $___mysqli_res : false)) . '</pre>' );

	// Get results
	while( $row = mysqli_fetch_assoc( $result ) ) {
		// Get values
		$first = $row["first_name"];
		$last  = $row["last_name"];

		// Feedback for end user
		$html .= "<pre>ID: {$id}<br />First name: {$first}<br />Surname: {$last}</pre>";
	}

	mysqli_close($GLOBALS["___mysqli_ston"]);
}

?>

The part we’re interested in is the query string:

$query  = "SELECT first_name, last_name FROM users WHERE user_id = '$id';";

Because our user input ends up in the $id variable without any sanitization or filtering, the application is vulnerable to SQL injection!

Always sanitize your input!

The Anatomy of a UNION based SQL Injection

We’ve found some vulerable code… What’s next? Writing some malicious user input!

So far, our SQL query looks like this:

SELECT first_name, last_name FROM users WHERE user_id = '[OUR INPUT]';

We want to complete this query with valid syntax and extract some information from the database that we aren’t supposed to be able to see. The first step is to complete the query that is already started in order to avoid syntax errors. So let’s just make something up for the user ID:

SELECT first_name, last_name FROM users WHERE user_id = '3';

Now we have a query that doesn’t have any syntax errors and will run… But we aren’t getting any useful information back! Now we add part two - our malicious query! Remember when I mentioned that the UNION operator allows you to combine the results from two separate queries? We can use it to inject our own query and return results! Now our query looks like this:

SELECT first_name, last_name FROM users WHERE user_id = '3' UNION SELECT password FROM USERS';

But that won’t work! There are two syntax errors with that query!

  1. The query doesn’t have the same number of fields on each side of our UNION operator.
  2. There is a dangling (unmatched) single quote at the end.

Fortunately, these issues have pretty simple fixes:

SELECT first_name, last_name FROM users WHERE user_id = '3' UNION SELECT password, 1 FROM USERS -- ';

To solve the first problem, we can add an additional field by selecting a constant integer! The fields match up! To get rid of the syntax error caused by the unmatched quote, we can just comment it out! You might have to fiddle around a bit with this part - depending on the DBMS being used, the comment character might be different!

The malicious input we have created is:

3' UNION SELECT password, 1 FROM USERS --

BUT WAIT THERE’S MORE!! We can make our query just a little bit better!

If you look carefully at the query, we can still get results from the part of the query from the PHP file. To filter out these results, all we have to do is make the first part of the query always false using the AND operator:

SELECT first_name, last_name FROM users WHERE user_id = '3' AND 1=2 UNION SELECT password, 1 FROM USERS -- ';

One never equals two so now we’ll never get any results from the first part of the query!

Additional Resources

If you want to learn more about SQL, check out this fantastic resource by w3schools: https://www.w3schools.com/sql/sql_intro.asp

More about Database Management Systems and Schemas (Also has a section about normalization): https://www.tutorialspoint.com/dbms/dbms_data_schemas.htm

The Metasploitable2 virtual machine comes with DVWA and Mutillidae installed: https://docs.rapid7.com/metasploit/metasploitable-2-exploitability-guide/#mutillidae

sqlmap is an awesome tool that automates discovery and exploitation of SQL injection vulnerabilities! It’s really cool… http://sqlmap.org/

Sources


  1. https://database.guide/what-is-referential-integrity/ ↩︎

  2. https://www.tutorialspoint.com/dbms/dbms_data_schemas.htm ↩︎

  3. https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning#section_2 ↩︎

  4. https://www.w3schools.com/sql/sql_join_inner.asp ↩︎

  5. Mutillidae vulnerable web application: https://github.com/webpwnized/mutillidae DVWA: https://github.com/ethicalhack3r/DVWA ↩︎