Home > Classroom Content > A Real-World Example of a Non-Correlated SELECT Subquery and Cross Join
A Real-World Example of a Non-Correlated SELECT Subquery and Cross Join
Instructor: Peter Avila
Technology: SQL Server

I recently developed a solution to a problem that is a good example of a non-correlated subquery in a SELECT clause that can also be written as a cross join. I’ll describe the requirement, the data, and then the solution with a non-correlated subquery and another with a cross join. To keep my client and their data anonymous, I’ve translated the problem to the AdventureWorks database.

The Requirement

The client wanted to see, for each sales person who placed orders over a time period, their total sales and their “productivity” over that time period. They defined productivity as the total sales divided by the number of business days in that time period. Here is the output that meets the requirement and that our solution will have to produce. This is for the time period of March, 2004, given the data in the AdventureWorks database.

001-non-Correlated-SELECT-Subquery-and-Cross-Join

The Data

The AdventureWorks database contains a table called Sales.SalesOrderHeader that holds sales orders. There are three columns in that table that are of interest to us: SalesPersonID, OrderDate, and TotalDue. Here is a sample of the data in those columns, formatted for a clean presentation, and including only the data for the time period of March, 2004 that I will use to test the solution, though not all rows in that time period are shown:

USE AdventureWorks

SELECT SalesPersonID
, CONVERT(VARCHAR, OrderDate, 101) AS OrderDate
, '$' + CONVERT(VARCHAR, TotalDue, 1) AS TotalDue
FROM Sales.SalesOrderHeader

002-non-Correlated-SELECT-Subquery-and-Cross-Join

In addition, I have created a table called DimDate that is similar to the one my client has in their database. DimDate is a dimension table in a data warehouse. (As we see in the Data Warehouses course, it is useful to store dates in a data warehouse in a date dimension table in which each row represents a different date). The IsBusinessDay column is a bit data type in which a 1 means that the date is a business day and a zero means that it is not. To simplify things for this example, I have created dates only for the date range I will be using in this example—March, 2004—and flagged only weekend dates as non-business days. Here is the script I used to create the DimDate table and to show its contents:

-- Create the DimDate table

USE AdventureWorks

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND  TABLE_NAME = 'DimDate')
	DROP TABLE dbo.DimDate

CREATE TABLE dbo.DimDate
(
  OrderDate DATE NOT NULL
, IsBusinessDay BIT NOT NULL
)

--Populate the DimDate table

DECLARE @StartDate DATE = '3/1/2004', @EndDate DATE = '3/31/2004'
DECLARE @CurrDate DATE = @StartDate

WHILE @CurrDate <= @EndDate
   BEGIN
      INSERT INTO dbo.DimDate
	     VALUES (@CurrDate, CASE WHEN DATEPART(WEEKDAY, @CurrDate) NOT IN (1,7) THEN 1 ELSE 0 END)
	  SET @CurrDate = DATEADD(DAY, 1, @CurrDate)
   END 

SELECT *
FROM dbo.DimDate

003-non-Correlated-SELECT-Subquery-and-Cross-Join

The Solution

Step 1: Create the Base Query that Aggregates by the Salespersonid

The output you saw earlier that meets the requirement shows one row for every sales person, along with a sum of the TotalDue column for each sales person and the number of business days. Even though the number of business days will be the same for each salesperson, it needs to be included on every row so that we can divide each salesperson’s sum of total due by the number of business days.

We’ll start with the following query that finds the sum of the total due for each employee for the test time period of March, 2004. Notice that we group by the SalesPersonID to get one row per sales person and then we sum the TotalDue for each sales person. Notice also that in this phase we are not yet doing any formatting of the output, because we need the TotalDue to remain a numeric value for when we divide it by the number of business days in a subsequent phase.

SELECT SalesPersonID
, SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
GROUP BY SalesPersonID

004-non-Correlated-SELECT-Subquery-and-Cross-Join

By the way, notice that there is a NULL SalesPersonID. A SalesPersonID is NULL when the order is taken  electronically over the company’s website. Let’s clean that up a bit. We’ll use the ISNULL() function to display “Internet Sale” when the SalesPersonID is NULL. There’s just one small problem with that approach, though. The ISNULL() function requires that both of its arguments be the same data type, and our two arguments are not; SalesPersonID is an integer and “Internet Sale” is character. If we do nothing and leave it up to SQL Server, it will attempt to perform an implicit conversion. In an implicit conversion, the lower data type (character, in this case) is converted up to the higher data type (integer, in this case), and that would fail because “Internet Sale” does not resemble an integer! So, we will need to perform an explicit conversion and cast the SalesPersonID down to a character. Here is the revised query:

SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sale') AS SalesPersonID
, SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
GROUP BY SalesPersonID

005-non-Correlated-SELECT-Subquery-and-Cross-Join

Step 2: Include the Number of Business Days

We can very easily find the number of business days for a time period by filtering the DimDate table to return only days where IsBusinessDay = 1 within the time period we want. Here is a query that does just that for our time period of March, 2004 (there were 23 business days in March, 2004):

SELECT COUNT(*) NumBusDays
FROM dbo.DimDate
WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
  AND IsBusinessDay = 1

006-non-Correlated-SELECT-Subquery-and-Cross-Join

Notice that this query returns a single row and a single column. A query that returns a single value like this is referred to as a scalar query. Because the query above is scalar, it can be used as a column in a SELECT clause. Just one problem: The GROUP BY clause in our base table won’t allow us to do that, because the GROUP BY demands that the only two things in the SELECT clause be what is in the GROUP BY clause and aggregate functions. We can get around this in a couple of ways.

One way is by making our base query a derived table subquery and including the scalar query in the outer query.

In the query below, BaseQuery is the base query we developed earlier, and BusDays is the scalar query that returns the number of business days in our time period. NumBusDays is a subquery in the SELECT clause; it occupies a column position in the outer SELECT. The outer query displays all the columns in the base query (BaseQuery.*) and the row count returned by NumBusDays.

SELECT BaseQuery.*, (SELECT COUNT(*) As TotBusDays
                     FROM dbo.DimDate
                     WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
                       AND IsBusinessDay = 1
                    ) AS NumBusDays
FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID
      , SUM(TotalDue) AS TotalDue
      FROM Sales.SalesOrderHeader
      WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
      GROUP BY SalesPersonID
     ) AS BaseQuery

007-non-Correlated-SELECT-Subquery-and-Cross-Join

The other way to accomplish the same results is by using a cross join. A cross join joins two tables by matching up every row in one table with every row in the other table. Since we want the single row returned by our scalar query to appear on every row of our aggregate query, a cross join would also work (any query that uses a non-correlated subquery in a Select clause can also be written as a cross join).

SELECT BaseQuery.*, TotBusDays.NumBusDays.
FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID
      , SUM(TotalDue) AS TotalDue
      FROM Sales.SalesOrderHeader
      WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
      GROUP BY SalesPersonID
     ) AS BaseQuery
     CROSS JOIN 
     (SELECT COUNT(*) As NumBusDays
      FROM dbo.DimDate
      WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
        AND IsBusinessDay = 1
     ) AS BusDays

008-non-Correlated-SELECT-Subquery-and-Cross-Join

You can run both queries with the actual execution plan on to see if one is more efficient than the other. In this example, they are both the same—neither approach is faster or more efficient than the other, though you should not always count on this being the case; it’s always a good idea to compare both execution plans as there are many factors that can effect the outcome.

Step 3: Calculate the Productivity and Format the Output

At this point, we have everything we need on one row to allow us to divide the sum of the TotalDue column by the number of business days. All we have to do now is add another column to the outer query that performs that division; but in the subquery approach, if we do it that way, we’d have to repeat the scalar query as the denominator. To avoid that, we can make a derived table out of everything we have so far and then use the column aliases, instead. We can also format things nicely in the new outer query. Much easier!

Compare the results of this query with the results shown in the Requirements section above; they are the same.

SELECT SalesPersonID
, '$' + CONVERT(VARCHAR, TotalDue, 1) AS SumTotalDue
, NumBusDays
, '$' + CONVERT(VARCHAR, TotalDue/NumBusDays, 1) AS Productivity
FROM (SELECT BaseQuery.*, (SELECT COUNT(*) As TotBusDays
                           FROM dbo.DimDate
                           WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
                             AND IsBusinessDay = 1
                          ) AS NumBusDays
      FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID
            , SUM(TotalDue) AS TotalDue
            FROM Sales.SalesOrderHeader
            WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
            GROUP BY SalesPersonID
           ) AS BaseQuery
     ) EverythingWeNeed
ORDER BY Productivity DESC

009-non-Correlated-SELECT-Subquery-and-Cross-Join

With the cross join, the new outer query is not even necessary:

SELECT BaseQuery.SalesPersonID
, '$' + CONVERT(VARCHAR, BaseQuery.TotalDue, 1) As SumTotalDue
, BusDays.NumBusDays
, '$' + CONVERT(VARCHAR, BaseQuery.TotalDue/BusDays.NumBusDays, 1) AS Productivity

FROM (SELECT ISNULL(CAST(SalesPersonID AS VARCHAR), 'Internet Sales') AS SalesPersonID
      , SUM(TotalDue) AS TotalDue
	  FROM Sales.SalesOrderHeader
	  WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
	  GROUP BY SalesPersonID
	 ) AS BaseQuery
	 CROSS JOIN 
	 (SELECT COUNT(*) As NumBusDays
	  FROM dbo.DimDate
	  WHERE OrderDate BETWEEN '3/1/2004' AND '3/31/2004'
	 	AND IsBusinessDay = 1
	 ) AS BusDays
ORDER BY BaseQuery.TotalDue/BusDays.NumBusDays DESC

010-non-Correlated-SELECT-Subquery-and-Cross-Join

Enjoy!

Peter Avila
SQL Server Instructor – Interface Technical Training
Phoenix, AZ