Take
Five with SQL Server 2005
New T-SQL Capabilities in SQL Server 2005 - Part 3 of 3
Kevin S. Goff, Microsoft MVP 2007
In last month's article,
I covered the new APPLY operator, Ranking functions, and the new TRY...CATCH capabilties
in T-SQL 2005. This month I'll talk about two of the largest enhancements
in the SQL 2005 language - Common Table Expressions (CTEs) and Recursive Queries.
Both of these
are part of the SQL-99 standard.
CTEs allow developers to more expressively identify a temporary result set (similar
to a derived table or temporary view). Recursive Queries allow developers
to more easily query hierarchies of data, by permitting developers (in a single
SQL statement) to recursively query into a CTE that's also being populated).
CTEs and Recursive queries
can be a little difficult to initially grasp, so I'm going to show several code
examples to demonstrate their value.
1 - Common Table Expressions (first example)
Suppose we wanted to find
all the orders in the Northwind Orders database, where the Freight was greater than
$500, and the Freight exceeded the order amount. Now, this is a pretty basic example,
but will show how we can use CTEs.
WITH OrderCTE
(OrderID, CustomerID,
OrderDate, Amount,Freight) AS
(SELECT OH.OrderID, CustomerID,
OrderDate, (Unitprice
* Quantity) AS Amount, Freight
FROM Orders OH
JOIN [dbo].[Order
Details] OD ON OH.OrderID
= OD.OrderID
WHERE Freight >
500)
SELECT *
FROM OrderCTE WHERE
Freight > Amount ORDER
BY Freight DESC
In the query above, we
declared a CTE called OrderCTE (which, again, you can think of as a
temporary result set) with the following line:
WITH OrderCTE
(OrderID, CustomerID, OrderDate, Amount,Freight)
AS
The remaining part query
inside the CTE will populate the CTE with orders greater than $500. The subsequent
statement queries the CTE for orders where the freight exceeds the amount.
SELECT *
FROM OrderCTE WHERE
Freight > Amount ORDER
BY Freight DESC
We certainly could have
done this all with one query. But some would argue that the CTE approach is
a little more readable. This becomes especially true with more complicated
derived table queries.
One thing to note: the
CTE has a very short shelf-life. How short?...only for the statement following
the creation of the CTE. But if you want to break up complicated derived table
statements for readability, CTEs are very helpful.
2 - Common Table
Expressions (second example)
Suppose we wanted to run
the same query for only those customers outside the United States that have at least
20 Orders. We can build 2 CTEs as follows, one for a list of Customers (CustomerCTE)
and the other for Orders (OrderCTE), and then join them at the end:
WITH CustomerCTE
(CustomerID, CompanyName,OrderCount)
AS
(SELECT CustomerID, CompanyName,
(SELECT
COUNT(*) FROM
Orders WHERE Orders.CustomerID
= Customers.CustomerID) AS OrderCount
FROM Customers WHERE
Country <> 'USA'),
OrderCTE
(OrderID, CustomerID,
OrderDate, Amount,Freight) AS
(SELECT OH.OrderID, CustomerID,
OrderDate, (Unitprice
* Quantity) AS Amount, Freight
FROM Orders OH
JOIN [dbo].[Order
Details] OD ON OH.OrderID
= OD.OrderID
WHERE Freight >
500)
SELECT CustomerCTE.CustomerID, CompanyName, OrderID, OrderDate, Amount, Freight, OrderCount
FROM OrderCTE JOIN
CustomerCTE ON CustomerCTE.CustomerID
= OrderCTE.CustomerID
WHERE
Freight > Amount
AND OrderCount >= 20
ORDER BY Freight DESC
In the example above,
I've created a 2nd CTE to retrieve the customers outside the US. The CTE includes
the order count for each customer:
WITH CustomerCTE
(CustomerID, CompanyName,OrderCount)
AS
(SELECT CustomerID, CompanyName,
(SELECT
COUNT(*) FROM
Orders WHERE Orders.CustomerID
= Customers.CustomerID) AS OrderCount
FROM Customers WHERE
Country <> 'USA'),
At the end, we've created
two CTEs that we can query for the Orders where the freight exceeds the amount,
and the order count is greater than 20.
SELECT CustomerCTE.CustomerID, CompanyName, OrderID, OrderDate, Amount, Freight, OrderCount
FROM OrderCTE JOIN
CustomerCTE ON CustomerCTE.CustomerID
= OrderCTE.CustomerID
WHERE
Freight > Amount
AND OrderCount >= 20
ORDER BY Freight DESC
3 - Recursive
Queries (first example)
The first two examples
used the basic capabilties of Common Table Expressions. Now we'll expand our
use of CTEs to cover recursive queries.
Let's start with a simple
but meaningful example for recursively querying data: suppose we have a Product
table that stores hierarchical data, with ParentID pointers to the parent level
for each record:
DECLARE @tProducts
TABLE (ID int,
Name char(50), ParentID int)
INSERT INTO @tProducts VALUES
(1,
'Brand 1', null)
INSERT INTO @tProducts
VALUES (2, 'Brand 2',
null)
INSERT INTO @tProducts
VALUES (3, 'Brand 3',
null)
INSERT INTO @tProducts VALUES
(6,
'Brand 1, Group 1', 1)
INSERT INTO @tProducts
VALUES (7, 'Brand 1, Group 2', 1)
INSERT INTO @tProducts
VALUES (8, 'Brand 1, Group 3', 1)
INSERT INTO @tProducts VALUES
( 9,
'Brand 2, Group 1', 2)
INSERT INTO @tProducts
VALUES (10, 'Brand 2, Group 2', 2)
INSERT INTO @tProducts VALUES
( 11,
'Brand 3, Group 3', 3)
INSERT INTO @tProducts VALUES
(12,
'Brand 1, Group 1, Item 1', 6)
INSERT INTO @tProducts
VALUES (13, 'Brand 1, Group 1, Item 2', 6)
INSERT INTO @tProducts
VALUES (14, 'Brand 1, Group 1, Item 1, SKU
1', 12)
The table above stores
a product hierarchy (Brand, Group, Item, and SKU), with one or more rows at each
level, and a ParentID reference to the corresponding parent level.
Now suppose we want to
query the table for a particular single record, and also retrieve all the parent
records. To do so, we need two queries: a query for a single record, and then
a query that retrieves the parent records for the single record, and then the grandparent
records for those parent records, etc. The full syntax for such a query in
SQL 2005 is as follows:
DECLARE @cSearch
char(50)
SET @cSearch = 'Brand 1, Group 1'
;
WITH ProductCTE (ID, Name, ParentID) AS
-- Anchor query
(SELECT ID,
Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID
= Prod.parentID
)
SELECT * FROM ProductCTE
Let's break this query
down - there are essentially three pieces to it:
First, we define the CTE:
;
WITH ProductCTE (ID, Name, ParentID) AS
Second, we perform the main (or
"anchor") query, to retrieve the specific row (for example, "Brand 1, Group
1")
(SELECT
ID, Name,
ParentID
FROM @tProducts
WHERE Name = @cSearch
Finally (and perhaps most important),
we add our recursive query to the anchor query (with a UNION ALL statement).
The recursive query allows us to query against the ProductCTE while we also query
against the ProductCTE and the source table for records where the ProductCTE.ID
equals the ParentID column of the source table. This process continues
until all searches are exhausted.
UNION
ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID
= Prod.parentID
)
This will give us a result of
the following:
ID Name
ParentID
6 Brand 1, Group
1
1
12
Brand 1, Group 1, Item 1
6
13
Brand 1, Group 1, Item 2
6
14
Brand 1, Group 1, Item 1, SKU 1
12
Before we move on to the next
example - suppose that instead of querying for all the parent records, we wanted
to query to retrieve all the child records instead? The answer is simple:
we reverse the queries on ID and ParentID, as follows:
SET @cSearch = 'Brand 1, Group 1, Item 2'
;
WITH ProductCTE AS
-- Anchor query
(SELECT ID,
Name,
ParentID
-- Retrieve the ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION
ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON
ProductCTE.ParentID
= Prod.ID ) -- reverse the join
keys
SELECT * FROM ProductCTE
5 - Recursive
Queries (second example)
In my first article
in this series, I covered the new XML capabilities in SQL Server 2005, and presented
a reusable table-valued UDF to convert an XML string into a table variable.
Many developers need to handle stored procedure queries against a variable number
of selection keys, and use either XML strings or comma-separated lists. We
can use recursive queries to build a reusable table-valued UDF from a CSV.
For example, suppose we
have a stored procedure that must query a table based on a variable number of user
selections. An end-user selects seven items from a list (though it could be
more or less than seven). Instead of sending up an XML string and using the
XML2TableVar function we covered in the first article, we want to send up
a CSV instead that looks like the following:
( '1, 11, 111, 222, 333, 444, 555'
)
Our goal is to create
a table variable with those seven key values, so that we can query a table containing
those key values. Prior to SQL 2005, we would parse the string, one comma at a time,
and retrieve each value between two commas. In SQL 2005, we can utilize recursive
queries to build a list of all the comma positions in the table, and then query
the CSV string for each value within the column position pairings.
CREATE FUNCTION [dbo].[CSVToTable]
(@CSVList varchar(8000) )
RETURNS
@tPKList TABLE ( IntPK
int )
AS
BEGIN
WITH
CSVCte (StartPos,
EndPos) AS
( SELECT 1 AS StartPos,
CHARINDEX(','
, @CSVList + ',')
AS EndPos
UNION ALL
SELECT EndPos +
1 AS StartPos ,
CHARINDEX(',',@CSVList
+ ',' ,
EndPos + 1) AS EndPos FROM CSVCTE
WHERE CHARINDEX(',',
@CSVList + ',', EndPos + 1)
<> 0)
INSERT
INTO @tPKList
SELECT SUBSTRING(@CSVList, StartPos,EndPos-StartPos) FROM CSVCte
RETURN
END
SELECT * FROM dbo.CSVToTable('1, 11, 111, 222, 333, 444, 555')
To do this, first
we create the CTE (called CSVCTE), which will store all the comma positions
in the string.
WITH CSVCte (StartPos, EndPos)
AS
The main query (anchor
query) will retrieve the first comma position pair by using CHARINDEX.
( SELECT 1 AS StartPos,
CHARINDEX(','
, @CSVList + ',')
AS EndPos
The recursive query
will pull all subsequent comma positions, by incrementing the End Position by 1
each time, until all comma positions have been found.
UNION ALL
SELECT EndPos + 1
AS StartPos , CHARINDEX(',',@CSVList
+ ','
, EndPos + 1)
AS EndPos FROM
CSVCTE
WHERE CHARINDEX(',', @CSVList
+ ',',
EndPos + 1) <> 0)
The final result of
the CSVCTE CTE is the following start position/end position pairs:
1 2
3
6
7
11
12
16
17
21
22
26
27
31
Finally, we can use
the SUBSTRING function to query each pair combination in the CTE against
the original string, to retrieve each substring based on each start/end position:
INSERT
INTO @tPKList
SELECT SUBSTRING(@CSVList, StartPos,EndPos-StartPos) FROM CSVCte
Hopefully this second
example of recursive queries gives you some insight into their power. Let's take
a look at one final example that can help solve a common reporting issue.
5 - Recursive
Queries (third example)
Recently I had to build
a result set of weekly sales orders. Even if a week did not have any orders, I still
needed to include a row for each week-ending date, regardless of whether any sales
activity occured. Before SQL Server 2005, I would build a temporary table
with a row for each week ending date, and then perform an outer join between that
temporary table and my table of orders. Let's take a look at a version of
the code using recursive queries.
Before we start, let's
take a look at our result set requirements. We want to show a row for
each week-ending date: our sales weeks are defined as Sunday through Saturday.
Since an order can occur any day of the week, we need to summarize the orders by
the week-ending date. So we need to take a little detour first, by writing
a function to receive a date, and return the corresponding week-ending date (Saturday).
The function below (GetEndOfWeek) will take a date and use the SQL DatePart
function to determine the date's day of the week, and add the necessary number of
additional days to make it a Saturday date.
CREATE FUNCTION dbo.GetEndOfWeek
(@dDate
DateTime)
-- Converts date to the Saturday date
for the week
RETURNS DateTime AS
BEGIN
DECLARE @dRetDate DateTime
SET @dRetDate =
@dDate + ( 7-DATEPART(WeekDay, @dDate))
RETURN
@dRetDate
END
Next, we need to declare
some start and end dates, and create some sample data. Note that we want to
run the result set for the first quarter of 2007, but need to adjust the dates accordingly:
DECLARE @StartDate
DATETIME, @EndDate
DATETIME
SET @StartDate =
dbo.GetEndOfWeek('1-1-2007')
-- First Sunday
SET @EndDate = dbo.GetEndOfWeek('3-31-2007') -- Last Saturday
DECLARE @Orders
TABLE (OrderDate
datetime, Amount
money)
INSERT INTO @Orders
VALUES ('20070201', 800)
INSERT INTO @Orders
VALUES ('20070202', 1100)
INSERT INTO @Orders
VALUES ('20070302', 2100)
INSERT INTO @Orders
VALUES ('20070301', 600)
INSERT INTO @Orders
VALUES ('20070303', 4100)
Now we need to build a
CTE that creates a date entry for each week ending date in the date range.
Our main/anchor query will SELECT the startdate, and the recursive query will select
each subsequent week-ending date plus 7 days, until we reach the end date.
;WITH DateCTE(WeekEnding) AS
(SELECT @StartDate
AS WeekEnding
UNION ALL
SELECT WeekEnding +
7 AS WeekEnding FROM
DateCTE WHERE WeekEnding
<= @EndDate )
Finally, we can query
the DateCTE againist the Orders table, using an outer join.
SELECT WeekEnding, SUM(COALESCE(Orders.Amount,0))
AS WeeklySales
FROM DateCTE
LEFT JOIN @Orders
Orders ON dbo.GetEndOfWeek(Orders.OrderDate) = DateCte.WeekEnding
GROUP BY DateCte.WeekEnding ORDER
BY DateCte.WeekEnding
OPTION (MAXRECURSION
1000)
Note the OPTION MAXRECURSION
statement: by default, SQL Server 2005 supports a maximum of 100 recursion levels,
but we can increase that.
About the author:
Kevin S. Goff has been a Microsoft C# .NET MVP for the last three years, from 2005
through 2007. He is a regular columnist for CoDe Magazine, for
the Baker’s Dozen Productivity Tips. He is also the author of
Pro VS 2005 Reporting using SQL Server and Crystal Reports, published
by Apress. He is the principal consultant for Common Ground Solutions, and specializes
in solutions in .NET, SQL Server, and Crystal Reports. His main site is www.commongroundsolutions.net,
and his blog is www.TheBakersDozen.net
.
Email this Author!
Related Links