I recently developed a solution to a problem that is a good example of a non-correlated subquery in a SELECT clause that can also be written as a cross join. I’ll describe the requirement, the data, and then the solution with a non-correlated subquery and another with a cross join. To keep my client and their data anonymous, I’ve translated the problem to the AdventureWorks database.
The Requirement
The client wanted to see, for each sales person who placed orders over a time period, their total sales and their “productivity” over that time period. They defined productivity as the total sales divided by the number of business days in that time period. Here is the output that meets the requirement and that our solution will have to produce. This is for the time period of March, 2004, given the data in the AdventureWorks database.
The Data
The AdventureWorks database contains a table called Sales.SalesOrderHeader that holds sales orders. There are three columns in that table that are of interest to us: SalesPersonID, OrderDate, and TotalDue. Here is a sample of the data in those columns, formatted for a clean presentation, and including only the data for the time period of March, 2004 that I will use to test the solution, though not all rows in that time period are shown:
USE AdventureWorks SELECT SalesPersonID , CONVERT(VARCHAR, OrderDate, 101) AS OrderDate , '$' + CONVERT(VARCHAR, TotalDue, 1) AS TotalDue FROM Sales.SalesOrderHeader
In addition, I have created a table called DimDate that is similar to the one my client has in their database. DimDate is a dimension table in a data warehouse. (As we see in the Data Warehouses course, it is useful to store dates in a data warehouse in a date dimension table in which each row represents a different date). The IsBusinessDay column is a bit data type in which a 1 means that the date is a business day and a zero means that it is not. To simplify things for this example, I have created dates only for the date range I will be using in this example—March, 2004—and flagged only weekend dates as non-business days. Here is the script I used to create the DimDate table and to show its contents:
-- Create the DimDate table USE AdventureWorks IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'DimDate') DROP TABLE dbo.DimDate CREATE TABLE dbo.DimDate ( OrderDate DATE NOT NULL , IsBusinessDay BIT NOT NULL ) --Populate the DimDate table DECLARE @StartDate DATE = '3/1/2004', @EndDate DATE = '3/31/2004' DECLARE @CurrDate DATE = @StartDate WHILE @CurrDate <= @EndDate BEGIN INSERT INTO dbo.DimDate VALUES (@CurrDate, CASE WHEN DATEPART(WEEKDAY, @CurrDate) NOT IN (1,7) THEN 1 ELSE 0 END) SET @CurrDate = DATEADD(DAY, 1, @CurrDate) END SELECT * FROM dbo.DimDate
The Solution
Step 1: Create the Base Query that Aggregates by the Salespersonid
The output you saw earlier that meets the requirement shows one row for every sales person, along with a sum of the TotalDue column for each sales person and the number of business days. Even though the number of business days will be the same for each salesperson, it needs to be included on every row so that we can divide each salesperson’s sum of total due by the number of business days.
We’ll start with the following query that finds the sum of the total due for each employee for the test time period of March, 2004. Notice that we group by the SalesPersonID to get one row per sales person and then we sum the TotalDue for each sales person. Notice also that in this phase we are not yet doing any formatting of the output, because we need the TotalDue to remain a numeric value for when we divide it by the number of business days in a subsequent phase.
SELECT SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID
By the way, notice that there is a NULL SalesPersonID. A SalesPersonID is NULL when the order is taken electronically over the company’s website. Let’s clean that up a bit. We’ll use the ISNULL() function to display “Internet Sale” when the SalesPersonID is NULL. There’s just one small problem with that approach, though. The ISNULL() function requires that both of its arguments be the same data type, and our two arguments are not; SalesPersonID is an integer and “Internet Sale” is character. If we do nothing and leave it up to SQL Server, it will attempt to perform an implicit conversion. In an implicit conversion, the lower data type (character, in this case) is converted up to the higher data type (integer, in this case), and that would fail because “Internet Sale” does not resemble an integer! So, we will need to perform an explicit conversion and cast the SalesPersonID down to a character. Here is the revised query:
SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sale') AS SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID
Step 2: Include the Number of Business Days
We can very easily find the number of business days for a time period by filtering the DimDate table to return only days where IsBusinessDay = 1 within the time period we want. Here is a query that does just that for our time period of March, 2004 (there were 23 business days in March, 2004):
SELECT COUNT(*) NumBusDays FROM dbo.DimDate WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' AND IsBusinessDay = 1
Notice that this query returns a single row and a single column. A query that returns a single value like this is referred to as a scalar query. Because the query above is scalar, it can be used as a column in a SELECT clause. Just one problem: The GROUP BY clause in our base table won’t allow us to do that, because the GROUP BY demands that the only two things in the SELECT clause be what is in the GROUP BY clause and aggregate functions. We can get around this in a couple of ways.
One way is by making our base query a derived table subquery and including the scalar query in the outer query.
In the query below, BaseQuery is the base query we developed earlier, and BusDays is the scalar query that returns the number of business days in our time period. NumBusDays is a subquery in the SELECT clause; it occupies a column position in the outer SELECT. The outer query displays all the columns in the base query (BaseQuery.*) and the row count returned by NumBusDays.
SELECT BaseQuery.*, (SELECT COUNT(*) As TotBusDays FROM dbo.DimDate WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' AND IsBusinessDay = 1 ) AS NumBusDays FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID ) AS BaseQuery
The other way to accomplish the same results is by using a cross join. A cross join joins two tables by matching up every row in one table with every row in the other table. Since we want the single row returned by our scalar query to appear on every row of our aggregate query, a cross join would also work (any query that uses a non-correlated subquery in a Select clause can also be written as a cross join).
SELECT BaseQuery.*, TotBusDays.NumBusDays. FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID ) AS BaseQuery CROSS JOIN (SELECT COUNT(*) As NumBusDays FROM dbo.DimDate WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' AND IsBusinessDay = 1 ) AS BusDays
You can run both queries with the actual execution plan on to see if one is more efficient than the other. In this example, they are both the same—neither approach is faster or more efficient than the other, though you should not always count on this being the case; it’s always a good idea to compare both execution plans as there are many factors that can effect the outcome.
Step 3: Calculate the Productivity and Format the Output
At this point, we have everything we need on one row to allow us to divide the sum of the TotalDue column by the number of business days. All we have to do now is add another column to the outer query that performs that division; but in the subquery approach, if we do it that way, we’d have to repeat the scalar query as the denominator. To avoid that, we can make a derived table out of everything we have so far and then use the column aliases, instead. We can also format things nicely in the new outer query. Much easier!
Compare the results of this query with the results shown in the Requirements section above; they are the same.
SELECT SalesPersonID , '$' + CONVERT(VARCHAR, TotalDue, 1) AS SumTotalDue , NumBusDays , '$' + CONVERT(VARCHAR, TotalDue/NumBusDays, 1) AS Productivity FROM (SELECT BaseQuery.*, (SELECT COUNT(*) As TotBusDays FROM dbo.DimDate WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' AND IsBusinessDay = 1 ) AS NumBusDays FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID ) AS BaseQuery ) EverythingWeNeed ORDER BY Productivity DESC
With the cross join, the new outer query is not even necessary:
SELECT BaseQuery.SalesPersonID , '$' + CONVERT(VARCHAR, BaseQuery.TotalDue, 1) As SumTotalDue , BusDays.NumBusDays , '$' + CONVERT(VARCHAR, BaseQuery.TotalDue/BusDays.NumBusDays, 1) AS Productivity FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID , SUM(TotalDue) AS TotalDue FROM Sales.SalesOrderHeader WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' GROUP BY SalesPersonID ) AS BaseQuery CROSS JOIN (SELECT COUNT(*) As NumBusDays FROM dbo.DimDate WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004' AND IsBusinessDay = 1 ) AS BusDays ORDER BY BaseQuery.TotalDue/BusDays.NumBusDays DESC
Enjoy!
Peter Avila
SQL Server Instructor – Interface Technical Training
Phoenix, AZ