Take
Five with SQL Server 2005
New T-SQL Capabilities in SQL Server 2005 - Part 3 of 3
Kevin S. Goff, Microsoft MVP 2007
In last month's
article, I covered the new APPLY operator, Ranking functions, and the new
TRY...CATCH capabilties in T-SQL 2005. This month I'll talk about two of
the largest enhancements in the SQL 2005 language - Common Table Expressions
(CTEs) and Recursive Queries.
Both of these are
part of the SQL-99 standard.
CTEs allow developers to more expressively identify a temporary result set
(similar to a derived table or temporary view). Recursive Queries allow
developers to more easily query hierarchies of data, by permitting developers
(in a single SQL statement) to recursively query into a CTE that's also being
populated).
CTEs and Recursive
queries can be a little difficult to initially grasp, so I'm going to show
several code examples to demonstrate their value.
1 - Common Table Expressions (first example)
Suppose
we wanted to find all the orders in the Northwind Orders database, where the
Freight was greater than $500, and the Freight exceeded the order amount. Now,
this is a pretty basic example, but will show how we can use CTEs.
WITH
OrderCTE
(OrderID,
CustomerID,
OrderDate,
Amount,Freight)
AS
(SELECT
OH.OrderID,
CustomerID, OrderDate,
(Unitprice
*
Quantity)
AS
Amount,
Freight
FROM
Orders OH
JOIN [dbo].[Order
Details] OD ON OH.OrderID
= OD.OrderID
WHERE Freight >
500)
SELECT
* FROM OrderCTE
WHERE Freight >
Amount ORDER BY
Freight DESC
In
the query above, we declared a CTE called OrderCTE (which, again,
you can think of as a temporary result set) with the following line:
WITH OrderCTE
(OrderID, CustomerID,
OrderDate, Amount,Freight)
AS
The
remaining part query inside the CTE will populate the CTE with orders greater
than $500. The subsequent statement queries the CTE for orders where the
freight exceeds the amount.
SELECT
* FROM OrderCTE
WHERE Freight >
Amount ORDER BY
Freight DESC
We
certainly could have done this all with one query. But some would argue
that the CTE approach is a little more readable. This becomes especially
true with more complicated derived table queries.
One
thing to note: the CTE has a very short shelf-life. How short?...only for
the statement following the creation of the CTE. But if you want to break
up complicated derived table statements for readability, CTEs are very helpful.
2
- Common Table Expressions (second example)
Suppose
we wanted to run the same query for only those customers outside the United
States that have at least 20 Orders. We can build 2 CTEs as follows, one for a
list of Customers (CustomerCTE) and the other for Orders (OrderCTE), and then
join them at the end:
WITH CustomerCTE
(CustomerID, CompanyName,OrderCount)
AS
(SELECT CustomerID,
CompanyName,
(SELECT
COUNT(*) FROM
Orders WHERE Orders.CustomerID
= Customers.CustomerID)
AS OrderCount
FROM Customers WHERE
Country <> 'USA'),
OrderCTE
(OrderID, CustomerID,
OrderDate, Amount,Freight)
AS
(SELECT
OH.OrderID, CustomerID,
OrderDate, (Unitprice
* Quantity)
AS Amount, Freight
FROM Orders OH
JOIN [dbo].[Order
Details] OD ON OH.OrderID
= OD.OrderID
WHERE Freight >
500)
SELECT CustomerCTE.CustomerID,
CompanyName, OrderID,
OrderDate, Amount,
Freight, OrderCount
FROM OrderCTE JOIN
CustomerCTE ON CustomerCTE.CustomerID
= OrderCTE.CustomerID
WHERE
Freight > Amount
AND OrderCount >= 20
ORDER BY Freight
DESC
In
the example above, I've created a 2nd CTE to retrieve the customers outside the
US. The CTE includes the order count for each customer:
WITH CustomerCTE
(CustomerID, CompanyName,OrderCount)
AS
(SELECT CustomerID,
CompanyName,
(SELECT
COUNT(*) FROM
Orders WHERE Orders.CustomerID
= Customers.CustomerID)
AS OrderCount
FROM Customers WHERE
Country <> 'USA'),
At
the end, we've created two CTEs that we can query for the Orders where the
freight exceeds the amount, and the order count is greater than 20.
SELECT CustomerCTE.CustomerID,
CompanyName, OrderID,
OrderDate, Amount,
Freight, OrderCount
FROM OrderCTE JOIN
CustomerCTE ON CustomerCTE.CustomerID
= OrderCTE.CustomerID
WHERE
Freight > Amount
AND OrderCount >= 20
ORDER BY Freight
DESC
3
- Recursive Queries (first example)
The
first two examples used the basic capabilties of Common Table
Expressions. Now we'll expand our use of CTEs to cover recursive queries.
Let's
start with a simple but meaningful example for recursively querying data:
suppose we have a Product table that stores hierarchical data, with ParentID
pointers to the parent level for each record:
DECLARE @tProducts
TABLE (ID
int, Name
char(50),
ParentID int)
INSERT
INTO @tProducts VALUES
(1, 'Brand 1',
null)
INSERT
INTO @tProducts VALUES
(2,
'Brand 2', null)
INSERT
INTO @tProducts VALUES
(3,
'Brand 3', null)
INSERT
INTO @tProducts VALUES
(6, 'Brand 1,
Group 1', 1)
INSERT
INTO @tProducts VALUES
(7,
'Brand 1, Group 2', 1)
INSERT
INTO @tProducts VALUES
(8,
'Brand 1, Group 3', 1)
INSERT
INTO @tProducts VALUES
( 9, 'Brand 2,
Group 1', 2)
INSERT
INTO @tProducts VALUES
(10,
'Brand 2, Group 2', 2)
INSERT
INTO @tProducts VALUES
( 11, 'Brand
3, Group 3', 3)
INSERT
INTO @tProducts VALUES
(12, 'Brand 1,
Group 1, Item 1', 6)
INSERT
INTO @tProducts VALUES
(13,
'Brand 1, Group 1, Item 2', 6)
INSERT
INTO @tProducts
VALUES (14,
'Brand 1, Group 1, Item 1, SKU 1',
12)
The
table above stores a product hierarchy (Brand, Group, Item, and SKU), with one
or more rows at each level, and a ParentID reference to the corresponding
parent level.
Now
suppose we want to query the table for a particular single record, and also
retrieve all the parent records. To do so, we need two queries: a query
for a single record, and then a query that retrieves the parent records for the
single record, and then the grandparent records for those parent records,
etc. The full syntax for such a query in SQL 2005 is as follows:
DECLARE @cSearch
char(50)
SET
@cSearch = 'Brand 1,
Group 1'
;
WITH ProductCTE (ID, Name, ParentID) AS
-- Anchor query
(SELECT ID,
Name, ParentID
FROM @tProducts
WHERE Name
= @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID,
Prod.Name,
Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID
= Prod.parentID
)
SELECT
* FROM ProductCTE
Let's
break this query down - there are essentially three pieces to it:
First,
we define the CTE:
;
WITH ProductCTE (ID, Name, ParentID) AS
Second,
we perform the main (or "anchor") query, to retrieve the specific row
(for example, "Brand 1, Group 1")
(SELECT
ID, Name,
ParentID
FROM @tProducts
WHERE Name
= @cSearch
Finally
(and perhaps most important), we add our recursive query to the anchor query
(with a UNION ALL statement). The recursive query allows us to query
against the ProductCTE while we also query against the ProductCTE and
the source table for records where the ProductCTE.ID equals the ParentID
column of the source table. This process continues until all searches are
exhausted.
UNION
ALL
-- Recursive query
SELECT Prod.ID,
Prod.Name,
Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID
= Prod.parentID
)
This
will give us a result of the following:
ID
Name
ParentID
6 Brand 1,
Group
1
1
12 Brand 1,
Group 1, Item
1
6
13 Brand 1,
Group 1, Item
2
6
14 Brand 1,
Group 1, Item 1, SKU
1
12
Before
we move on to the next example - suppose that instead of querying for all the
parent records, we wanted to query to retrieve all the child records
instead? The answer is simple: we reverse the queries on ID and ParentID,
as follows:
SET @cSearch
= 'Brand 1, Group 1, Item
2'
;
WITH ProductCTE AS
-- Anchor query
(SELECT ID,
Name,
ParentID
--
Retrieve the ParentID
FROM @tProducts
WHERE Name
= @cSearch
UNION
ALL
-- Recursive query
SELECT Prod.ID,
Prod.Name,
Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON
ProductCTE.ParentID
=
Prod.ID
)
-- reverse the join keys
SELECT
* FROM ProductCTE
5
- Recursive Queries (second example)
In
my first article in this series, I covered the new XML capabilities in SQL
Server 2005, and presented a reusable table-valued UDF to convert an XML string
into a table variable. Many developers need to handle stored procedure
queries against a variable number of selection keys, and use either XML strings
or comma-separated lists. We can use recursive queries to build a
reusable table-valued UDF from a CSV.
For
example, suppose we have a stored procedure that must query a table based on a
variable number of user selections. An end-user selects seven items from
a list (though it could be more or less than seven). Instead of sending
up an XML string and using the XML2TableVar function we covered in the
first article, we want to send up a CSV instead that looks like the following:
(
'1, 11, 111, 222, 333, 444, 555'
)
Our
goal is to create a table variable with those seven key values, so that we can
query a table containing those key values. Prior to SQL 2005, we would parse
the string, one comma at a time, and retrieve each value between two
commas. In SQL 2005, we can utilize recursive queries to build a list of
all the comma positions in the table, and then query the CSV string for each
value within the column position pairings.
CREATE
FUNCTION [dbo].[CSVToTable]
(@CSVList varchar(8000)
)
RETURNS
@tPKList
TABLE ( IntPK
int )
AS
BEGIN
WITH
CSVCte (StartPos,
EndPos) AS
( SELECT 1
AS StartPos,
CHARINDEX(','
, @CSVList +
',') AS
EndPos
UNION ALL
SELECT EndPos +
1 AS StartPos ,
CHARINDEX(',',@CSVList
+ ','
, EndPos + 1)
AS EndPos FROM
CSVCTE
WHERE CHARINDEX(',',
@CSVList + ',',
EndPos + 1)
<> 0)
INSERT
INTO @tPKList
SELECT SUBSTRING(@CSVList,
StartPos,EndPos-StartPos)
FROM CSVCte
RETURN
END
SELECT
* FROM dbo.CSVToTable('1,
11, 111, 222, 333, 444, 555')
To
do this, first we create the CTE (called CSVCTE), which will store all
the comma positions in the string.
WITH CSVCte (StartPos,
EndPos) AS
The
main query (anchor query) will retrieve the first comma position pair by using
CHARINDEX.
( SELECT 1
AS StartPos,
CHARINDEX(','
, @CSVList +
',') AS
EndPos
The
recursive query will pull all subsequent comma positions, by incrementing the
End Position by 1 each time, until all comma positions have been found.
UNION
ALL
SELECT EndPos + 1
AS StartPos ,
CHARINDEX(',',@CSVList
+ ','
, EndPos + 1)
AS EndPos FROM
CSVCTE
WHERE CHARINDEX(',',
@CSVList + ',',
EndPos + 1)
<> 0)
The
final result of the CSVCTE CTE is the following start position/end
position pairs:
1 2
3 6
7 11
12 16
17 21
22 26
27 31
Finally,
we can use the SUBSTRING function to query each pair combination in the
CTE against the original string, to retrieve each substring based on each
start/end position:
INSERT
INTO @tPKList
SELECT SUBSTRING(@CSVList,
StartPos,EndPos-StartPos)
FROM CSVCte
Hopefully
this second example of recursive queries gives you some insight into their
power. Let's take a look at one final example that can help solve a common
reporting issue.
5
- Recursive Queries (third example)
Recently
I had to build a result set of weekly sales orders. Even if a week did not have
any orders, I still needed to include a row for each week-ending date,
regardless of whether any sales activity occured. Before SQL Server 2005,
I would build a temporary table with a row for each week ending date, and then
perform an outer join between that temporary table and my table of
orders. Let's take a look at a version of the code using recursive
queries.
Before
we start, let's take a look at our result set requirements. We want
to show a row for each week-ending date: our sales weeks are defined as Sunday
through Saturday. Since an order can occur any day of the week, we need
to summarize the orders by the week-ending date. So we need to take a
little detour first, by writing a function to receive a date, and return the
corresponding week-ending date (Saturday). The function below (GetEndOfWeek)
will take a date and use the SQL DatePart function to determine the date's day
of the week, and add the necessary number of additional days to make it a
Saturday date.
CREATE
FUNCTION dbo.GetEndOfWeek
(@dDate
DateTime)
-- Converts date to the Saturday date for the week
RETURNS
DateTime AS
BEGIN
DECLARE @dRetDate DateTime
SET @dRetDate =
@dDate + ( 7-DATEPART(WeekDay,
@dDate))
RETURN
@dRetDate
END
Next,
we need to declare some start and end dates, and create some sample data.
Note that we want to run the result set for the first quarter of 2007, but need
to adjust the dates accordingly:
DECLARE @StartDate
DATETIME, @EndDate
DATETIME
SET
@StartDate = dbo.GetEndOfWeek('1-1-2007')
-- First Sunday
SET
@EndDate = dbo.GetEndOfWeek('3-31-2007')
-- Last Saturday
DECLARE @Orders
TABLE (OrderDate
datetime, Amount
money)
INSERT
INTO @Orders VALUES
('20070201',
800)
INSERT
INTO @Orders VALUES
('20070202',
1100)
INSERT
INTO @Orders VALUES
('20070302',
2100)
INSERT
INTO @Orders VALUES
('20070301',
600)
INSERT
INTO @Orders VALUES
('20070303',
4100)
Now
we need to build a CTE that creates a date entry for each week ending date in
the date range. Our main/anchor query will SELECT the startdate, and the
recursive query will select each subsequent week-ending date plus 7 days, until
we reach the end date.
;WITH
DateCTE(WeekEnding)
AS
(SELECT @StartDate
AS WeekEnding
UNION ALL
SELECT WeekEnding +
7 AS WeekEnding FROM
DateCTE WHERE WeekEnding
<= @EndDate )
Finally,
we can query the DateCTE againist the Orders table, using an outer join.
SELECT WeekEnding,
SUM(COALESCE(Orders.Amount,0))
AS WeeklySales
FROM DateCTE
LEFT JOIN @Orders
Orders ON dbo.GetEndOfWeek(Orders.OrderDate)
= DateCte.WeekEnding
GROUP BY DateCte.WeekEnding
ORDER BY DateCte.WeekEnding
OPTION
(MAXRECURSION
1000)
Note
the OPTION MAXRECURSION statement: by default, SQL Server 2005 supports a
maximum of 100 recursion levels, but we can increase that.
About the author:
Kevin S. Goff has been a Microsoft C# .NET MVP for the last three years, from
2005 through 2007. He is a regular columnist for CoDe Magazine,
for the Baker’s Dozen Productivity Tips. He is also the author
of Pro VS 2005 Reporting using SQL Server and Crystal Reports,
published by Apress. He is the principal consultant for Common Ground
Solutions, and specializes in solutions in .NET, SQL Server, and Crystal
Reports. His main site is www.commongroundsolutions.net, and his blog is
www.TheBakersDozen.net
.
Email this Author!
Related Links
|