There are times when it is necessary to join a table to itself. Let’s take a look at the Employee table in the AdventureWorks database to understand the need for a self-join and then how to put one together. We’ll see how to put together a simple self-join and then how to join more tables to the self-join.
This blog post makes reference to material introduced in another one of my posts: Understanding Database Building Blocks in SQL Server.
A Simple Self-Join
In the partial result shown below, notice that the IDs in the ManagerID column reference the IDs in the EmployeeID column. To see this, take a look at the first row. You can see that the manager in that row is the employee with EmployeeID = 16. Now, try to find employee 16 in the EmployeeID column. It is the next-to-last row in the figure below. That row is an employee who is also a manager. In fact, the title of the employee in the first row is Production Technician and employee ID 16’s title is Production Supervisor. ManagerIDs reference values in the EmployeeID column.
Let’s suppose we want to list Managers matched up with their employees, along with attributes like names, hire dates, titles, and others. How would we write such a query? Any query in which we want to match up employees and their managers will require us to join this table to itself! This is not as bad as it sounds as long as we apply the simple trick of pretending that we have two tables. Joins between two tables are easy!
And we don’t have to pretend all that much. The Employee table represents two entity-types, Employee and Manager. (As discussed in my blog post: Understanding Database Building Blocks in SQL Server it is OK to use one table for two entity-types as long as 1) both entity-types share the same attributes, and 2) an entity in one entity-type is a member of the other entity-type. Both of these conditions are met in the Employee table, because managers are also employees and they have the same attributes.) One way to see this is to note that the EmployeeID is also the ManagerID. Every row represents a unique employee and, if that employee is referenced in the ManagerID column of another row, then that employee is also a manager.
We can verify the existence of two entity-types in the Employee table in SQL Server Management Studio Object Explorer. If we expand the Keys folder in the Employee table, we see that there is a foreign key that describes the relationship between managers and employees.
And double-clicking on that foreign key allows us to see in its properties that the relationship is very much as it would be between two tables; the only difference is that the same table is used for both the Foreign Key Base Table as the one used for the Primary Key Base table, but the relationship is one that is also defined with a primary key/foreign key pair, just like with two distinct tables.
Except for the fact that we are dealing with only one table, the self-join will be otherwise very much like a regular join between two tables.
(By the way, whoever designed the AdventureWorks database could have created separate tables for each entity-type, but that would have caused data duplication. Because a manager entity is also an employee entity, that person would appear twice in the database: once in the Employee table and then again in the Manager table. For this reason and others, it is best to represent both entity-types in the same table.)
So, before we jump into the SQL, let’s visualize what we have up to this point. We have two entity-types, Employee and Manager, with a One-to-Many relationship between them (for each employee there can be only one manager, and for each manager there can be many employees). Even though it helps to visualize this by using the two entity-types, we need to keep in mind the reality that each one of these entity-types is represented in the same table, and in that table the primary key is EmployeeID. So the primary key of the Manager entity-type is also the EmployeeID. Remember, as we saw earlier, the EmployeeID represents both employees and managers.
When we join two tables that have a One-to-Many relationship between them, we always join on the primary key of the table on the one side of the relationship and the foreign key in the table on the many side of the relationship. The entity-type on the one side is Manager, and so the Manager’s primary key (EmployeeID) will be used. The entity-type on the many side is Employee, and so the Employee’s foreign key (ManagerID) will be used. So the join condition is:
Manger.EmployeeID = Employee.ManagerID
We’re ready to put together a simple self-join query in SQL that matches up employees and their managers:
SELECT * FROM HumanResources.Employee Employee JOIN HumanResources.Employee Manager ON Manager.EmployeeID = Employee.ManagerID
Note that we join the Employee table to itself, but we do so by assigning two different roles to the same table, represented by the table aliases, Employee and Manager. Once we do that, we can play a game of make-believe in which we pretend that there are actually two tables. Here’s the query above with a column list:
Our simple self-join is working!
Including Data from Other Tables
What if we want to include data from other tables? To which of the two “tables” in the self-join do we join additional tables? After all, they are both the same table! It’s easy to get confused by this, but the solution is actually very simple.
Let’s say we want to show employee’s names, too. Names are in the Contact table, so we will need to join that, too. And, because we want employee names and not manager names right now, we will join it to the Employee table:
And what about the managers? What if we want to display their names, too? We have to do the same thing and join the Contact table again—this time joined to the Manager “table.”
Summary
This post has shown that the key to understanding self-joins is to realize that the join is actually between two entity-types, both of which are represented in the same table. You express this in SQL by joining the same table to itself with different aliases. From that point on, the query works as if two tables are being joined. You can display columns from either “table,” you can join other tables to either “table,” and so on.
Have fun with self-joins!
Peter Avila
SQL Server Instructor – Interface Technical Training
Phoenix, AZ