New T-SQL Capabilities in SQL Server 2005 - Part 3
Posted by Kevin S. Goff on 05/1/07 | SQL Server
In last month's article, I covered the new APPLY operator, Ranking functions, and the new TRY...CATCH capabilties in T-SQL 2005. This month I'll talk about two of the largest enhancements in the SQL 2005 language - Common Table Expressions (CTEs) and Recursive Queries.
Both of these are part of the SQL-99 standard. CTEs allow developers to more expressively identify a temporary result set (similar to a derived table or temporary view). Recursive Queries allow developers to more easily query hierarchies of data, by permitting developers (in a single SQL statement) to recursively query into a CTE that's also being populated).
CTEs and Recursive queries can be a little difficult to initially grasp, so I'm going to show several code examples to demonstrate their value.
1 - Common Table Expressions (first example)
Suppose we wanted to find all the orders in the Northwind Orders database, where the Freight was greater than $500, and the Freight exceeded the order amount. Now, this is a pretty basic example, but will show how we can use CTEs.
WITHOrderCTE (OrderID, CustomerID, OrderDate, Amount,Freight) AS
(SELECTOH.OrderID, CustomerID, OrderDate, (Unitprice * Quantity) AS Amount, Freight
FROM Orders OH
JOIN [dbo].[Order Details] OD ON OH.OrderID = OD.OrderID
WHERE Freight > 500)
SELECT * FROM OrderCTE WHERE Freight > Amount ORDER BY Freight DESC
In the query above, we declared a CTE called OrderCTE (which, again, you can think of as a temporary result set) with the following line:
WITH OrderCTE (OrderID, CustomerID, OrderDate, Amount,Freight) AS
The remaining part query inside the CTE will populate the CTE with orders greater than $500. The subsequent statement queries the CTE for orders where the freight exceeds the amount.
SELECT * FROM OrderCTE WHERE Freight > Amount ORDER BY Freight DESC
We certainly could have done this all with one query. But some would argue that the CTE approach is a little more readable. This becomes especially true with more complicated derived table queries.
One thing to note: the CTE has a very short shelf-life. How short?...only for the statement following the creation of the CTE. But if you want to break up complicated derived table statements for readability, CTEs are very helpful.
2 - Common Table Expressions (second example)
Suppose we wanted to run the same query for only those customers outside the United States that have at least 20 Orders. We can build 2 CTEs as follows, one for a list of Customers (CustomerCTE) and the other for Orders (OrderCTE), and then join them at the end:
WITH CustomerCTE (CustomerID, CompanyName,OrderCount) AS
(SELECT CustomerID, CompanyName,
(SELECT COUNT(*) FROM Orders WHERE Orders.CustomerID = Customers.CustomerID) AS OrderCount
FROM Customers WHERE Country <> 'USA'),
OrderCTE (OrderID, CustomerID, OrderDate, Amount,Freight) AS
(SELECT OH.OrderID, CustomerID, OrderDate, (Unitprice * Quantity) AS Amount, Freight
FROM Orders OH
JOIN [dbo].[Order Details] OD ON OH.OrderID = OD.OrderID
WHERE Freight > 500)
SELECT CustomerCTE.CustomerID, CompanyName, OrderID, OrderDate, Amount, Freight, OrderCount
FROM OrderCTE JOIN CustomerCTE ON CustomerCTE.CustomerID = OrderCTE.CustomerID
WHERE Freight > Amount AND OrderCount >= 20 ORDER BY Freight DESC
In the example above, I've created a 2nd CTE to retrieve the customers outside the US. The CTE includes the order count for each customer:
WITH CustomerCTE (CustomerID, CompanyName,OrderCount) AS
(SELECT CustomerID, CompanyName,
(SELECT COUNT(*) FROM Orders WHERE Orders.CustomerID = Customers.CustomerID) AS OrderCount
FROM Customers WHERE Country <> 'USA'),
At the end, we've created two CTEs that we can query for the Orders where the freight exceeds the amount, and the order count is greater than 20.
SELECT CustomerCTE.CustomerID, CompanyName, OrderID, OrderDate, Amount, Freight, OrderCount
FROM OrderCTE JOIN CustomerCTE ON CustomerCTE.CustomerID = OrderCTE.CustomerID
WHERE Freight > Amount AND OrderCount >= 20 ORDER BY Freight DESC
3 - Recursive Queries (first example)
The first two examples used the basic capabilties of Common Table Expressions. Now we'll expand our use of CTEs to cover recursive queries.
Let's start with a simple but meaningful example for recursively querying data: suppose we have a Product table that stores hierarchical data, with ParentID pointers to the parent level for each record:
DECLARE @tProducts TABLE (ID int, Name char(50), ParentID int)
INSERT INTO @tProducts VALUES (1, 'Brand 1', null)
INSERT INTO @tProducts VALUES (2, 'Brand 2', null)
INSERT INTO @tProducts VALUES (3, 'Brand 3', null)
INSERT INTO @tProducts VALUES (6, 'Brand 1, Group 1', 1)
INSERT INTO @tProducts VALUES (7, 'Brand 1, Group 2', 1)
INSERT INTO @tProducts VALUES (8, 'Brand 1, Group 3', 1)
INSERT INTO @tProducts VALUES ( 9, 'Brand 2, Group 1', 2)
INSERT INTO @tProducts VALUES (10, 'Brand 2, Group 2', 2)
INSERT INTO @tProducts VALUES ( 11, 'Brand 3, Group 3', 3)
INSERT INTO @tProducts VALUES (12, 'Brand 1, Group 1, Item 1', 6)
INSERT INTO @tProducts VALUES (13, 'Brand 1, Group 1, Item 2', 6)
INSERT INTO @tProducts VALUES (14, 'Brand 1, Group 1, Item 1, SKU 1', 12)
The table above stores a product hierarchy (Brand, Group, Item, and SKU), with one or more rows at each level, and a ParentID reference to the corresponding parent level.
Now suppose we want to query the table for a particular single record, and also retrieve all the parent records. To do so, we need two queries: a query for a single record, and then a query that retrieves the parent records for the single record, and then the grandparent records for those parent records, etc. The full syntax for such a query in SQL 2005 is as follows:
DECLARE @cSearch char(50)
SET @cSearch = 'Brand 1, Group 1'
; WITH ProductCTE (ID, Name, ParentID) AS
-- Anchor query
(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.parentID )
SELECT * FROM ProductCTE
Let's break this query down - there are essentially three pieces to it:
First, we define the CTE:
; WITH ProductCTE (ID, Name, ParentID) AS
Second, we perform the main (or "anchor") query, to retrieve the specific row (for example, "Brand 1, Group 1")
(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
Finally (and perhaps most important), we add our recursive query to the anchor query (with a UNION ALL statement). The recursive query allows us to query against the ProductCTE while we also query against the ProductCTE and the source table for records where the ProductCTE.ID equals the ParentID column of the source table. This process continues until all searches are exhausted.
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.parentID )
This will give us a result of the following:
ID Name ParentID
6 Brand 1, Group 1 1
12 Brand 1, Group 1, Item 1 6
13 Brand 1, Group 1, Item 2 6
14 Brand 1, Group 1, Item 1, SKU 1 12
Before we move on to the next example - suppose that instead of querying for all the parent records, we wanted to query to retrieve all the child records instead? The answer is simple: we reverse the queries on ID and ParentID, as follows:
SET @cSearch = 'Brand 1, Group 1, Item 2'
; WITH ProductCTE AS
-- Anchor query
(SELECT ID, Name, ParentID -- Retrieve the ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ParentID = Prod.ID ) -- reverse the join keys
SELECT * FROM ProductCTE
5 - Recursive Queries (second example)
In my first article in this series, I covered the new XML capabilities in SQL Server 2005, and presented a reusable table-valued UDF to convert an XML string into a table variable. Many developers need to handle stored procedure queries against a variable number of selection keys, and use either XML strings or comma-separated lists. We can use recursive queries to build a reusable table-valued UDF from a CSV.
For example, suppose we have a stored procedure that must query a table based on a variable number of user selections. An end-user selects seven items from a list (though it could be more or less than seven). Instead of sending up an XML string and using the XML2TableVar function we covered in the first article, we want to send up a CSV instead that looks like the following:
( '1, 11, 111, 222, 333, 444, 555' )
Our goal is to create a table variable with those seven key values, so that we can query a table containing those key values. Prior to SQL 2005, we would parse the string, one comma at a time, and retrieve each value between two commas. In SQL 2005, we can utilize recursive queries to build a list of all the comma positions in the table, and then query the CSV string for each value within the column position pairings.
CREATE FUNCTION [dbo].[CSVToTable]
(@CSVList varchar(8000) )
RETURNS
@tPKList TABLE ( IntPK int )
AS
BEGIN
WITH CSVCte (StartPos, EndPos) AS
( SELECT 1 AS StartPos, CHARINDEX(',' , @CSVList + ',') AS EndPos
UNION ALL
SELECT EndPos + 1 AS StartPos , CHARINDEX(',',@CSVList + ',' , EndPos + 1) AS EndPos FROM CSVCTE
WHERE CHARINDEX(',', @CSVList + ',', EndPos + 1) <> 0)
INSERT INTO @tPKList
SELECT SUBSTRING(@CSVList, StartPos,EndPos-StartPos) FROM CSVCte
RETURN
END
SELECT * FROM dbo.CSVToTable('1, 11, 111, 222, 333, 444, 555')
To do this, first we create the CTE (called CSVCTE), which will store all the comma positions in the string.
WITH CSVCte (StartPos, EndPos) AS
The main query (anchor query) will retrieve the first comma position pair by using CHARINDEX.
( SELECT 1 AS StartPos, CHARINDEX(',' , @CSVList + ',') AS EndPos
The recursive query will pull all subsequent comma positions, by incrementing the End Position by 1 each time, until all comma positions have been found.
UNION ALL
SELECT EndPos + 1 AS StartPos , CHARINDEX(',',@CSVList + ',' , EndPos + 1) AS EndPos FROM CSVCTE
WHERE CHARINDEX(',', @CSVList + ',', EndPos + 1) <> 0)
The final result of the CSVCTE CTE is the following start position/end position pairs:
1 2
3 6
7 11
12 16
17 21
22 26
27 31
Finally, we can use the SUBSTRING function to query each pair combination in the CTE against the original string, to retrieve each substring based on each start/end position:
INSERT INTO @tPKList
SELECT SUBSTRING(@CSVList, StartPos,EndPos-StartPos) FROM CSVCte
Hopefully this second example of recursive queries gives you some insight into their power. Let's take a look at one final example that can help solve a common reporting issue.
5 - Recursive Queries (third example)
Recently I had to build a result set of weekly sales orders. Even if a week did not have any orders, I still needed to include a row for each week-ending date, regardless of whether any sales activity occured. Before SQL Server 2005, I would build a temporary table with a row for each week ending date, and then perform an outer join between that temporary table and my table of orders. Let's take a look at a version of the code using recursive queries.
Before we start, let's take a look at our result set requirements. We want to show a row for each week-ending date: our sales weeks are defined as Sunday through Saturday. Since an order can occur any day of the week, we need to summarize the orders by the week-ending date. So we need to take a little detour first, by writing a function to receive a date, and return the corresponding week-ending date (Saturday). The function below (GetEndOfWeek) will take a date and use the SQL DatePart function to determine the date's day of the week, and add the necessary number of additional days to make it a Saturday date.
CREATE FUNCTION dbo.GetEndOfWeek
(@dDate DateTime)
-- Converts date to the Saturday date for the week
RETURNS DateTime AS
BEGIN
DECLARE @dRetDate DateTime
SET @dRetDate =
@dDate + ( 7-DATEPART(WeekDay, @dDate))
RETURN @dRetDate
END
Next, we need to declare some start and end dates, and create some sample data. Note that we want to run the result set for the first quarter of 2007, but need to adjust the dates accordingly:
DECLARE @StartDate DATETIME, @EndDate DATETIME
SET @StartDate = dbo.GetEndOfWeek('1-1-2007') -- First Sunday
SET @EndDate = dbo.GetEndOfWeek('3-31-2007') -- Last Saturday
DECLARE @Orders TABLE (OrderDate datetime, Amount money)
INSERT INTO @Orders VALUES ('20070201', 800)
INSERT INTO @Orders VALUES ('20070202', 1100)
INSERT INTO @Orders VALUES ('20070302', 2100)
INSERT INTO @Orders VALUES ('20070301', 600)
INSERT INTO @Orders VALUES ('20070303', 4100)
Now we need to build a CTE that creates a date entry for each week ending date in the date range. Our main/anchor query will SELECT the startdate, and the recursive query will select each subsequent week-ending date plus 7 days, until we reach the end date.
;WITH DateCTE(WeekEnding) AS
(SELECT @StartDate AS WeekEnding
UNION ALL
SELECT WeekEnding + 7 AS WeekEnding FROM DateCTE WHERE WeekEnding <= @EndDate )
Finally, we can query the DateCTE againist the Orders table, using an outer join.
SELECT WeekEnding, SUM(COALESCE(Orders.Amount,0)) AS WeeklySales
FROM DateCTE
LEFT JOIN @Orders Orders ON dbo.GetEndOfWeek(Orders.OrderDate) = DateCte.WeekEnding
GROUP BY DateCte.WeekEnding ORDER BY DateCte.WeekEnding
OPTION (MAXRECURSION 1000)
Note the OPTION MAXRECURSION statement: by default, SQL Server 2005 supports a maximum of 100 recursion levels, but we can increase that.
About Kevin Goff
Kevin S. Goff has been a Microsoft C# .NET MVP for the last three years, from 2005 through 2007. He is a regular columnist for CoDe Magazine, for the Baker’s Dozen Productivity Tips. He is also the author of Pro VS 2005 Reporting using SQL Server and Crystal Reports, published by Apress. He is the principal consultant for Common Ground Solutions, and specializes in solutions in .NET, SQL Server, and Crystal Reports. His main site is www.commongroundsolutions.net, and his blog is www.TheBakersDozen.net .
Email this Author!
Related Links
Want to learn more about Microsoft SQL Server 2005?