Creating CLR Functions for SQL Server, Page 2
Back to page 1
Loading a .NET Assembly into SQL Server
Before we can run our new functions, we must load the assembly into SQL Server and
define the correspondence between a TSQL function name and a CLR function we have
loaded. You will need to have administrative priviledge to set load and run CLR
code on SQL Server.
Configuring the Server
SQL Server 2005 is not configured to use CLR code by default. You will have to explicitly
configure the server to use the CLR. This is easily done in TSQL by calling the
system procedure sp_configure.
sp_configure 'clr enabled',
1
RECONFIGURE WITH OVERRIDE
(You are probably already aware that you can run sp_configure with only a partial
completion of the first argument, as long as what you have typed is sufficient to
uniquely identify the configuration property.)
Now we use the CREATE ASSEMBLY statement to tell SQL Server to copy the assembly
into its system tables. Be sure that Management Studio query window is pointing
to the database you want (e.g. Northwind) and run the following TSQL:
CREATE ASSEMBLY regX
FROM 'C:\KnowledgeBase\SqlStuff\SQL2005\CLR\UDF\TextFunctions\TextFunctions\bin\Debug\RegEx.dll'
It would be a remarkable coincidence indeed if your path were the same as this one;
you might wish to change it.
After the CREATE ASSEMBLY statement is run, the executable code is safely stored
in SQL Server's own system tables, away from access by any hacker. If you like,
you can confirm this by deleting the dll file. The code will run from SQL Server
anyway! If you wish to recompile the .NET assembly and reinstall it into SQL Server,
you must either drop it or change the CREATE to ALTER, just as with TSQL statements.
The final step for the installation of our CLR code is telling SQL Server how to
match up a Transact SQL request with a CLR function. We do this with a CREATE FUNCTION
statement (or CREATE PROCEDURE as appropriate). However, unlike the usual CREATE
FUNCTION statement, there is no TSQL code. There is only the EXTERNAL NAME reference
to the function. Note that the function name is fully qualified, that is, assemblyName.ClassName.FunctionName.
It is also important to be aware that the EXTERNAL NAME specification is case-sensitive!
-- The external name is case sensitive
CREATE Function RegExMatch(@Input NVARCHAR(512),@Pattern
NVARCHAR(127))
RETURNS BIT EXTERNAL NAME regX.RegularExpressions.RegExMatch
GO
CREATE Function RegExReplace(@Input NVARCHAR(512),@Pattern
NVARCHAR(127), @Replacement NVARCHAR(512))
RETURNS NVARCHAR(512) EXTERNAL NAME
regX.RegularExpressions.RegExReplace
Note that the assembly name in the CREATE FUNCTION statement is the name you gave
it when you loaded the assembly into SQL Server, not the name of the dll file, which
is no longer of any concern to SQL Server. The TSQL function name need not be the
same as the CLR function, but I believe it is less confusing if they are the same.
My choice of character size for the NVARCHAR declarations is arbitrary; you might
feel some other size is more appropriate. Note that the RegExMatch function returns
type BIT.
The time has come to test our creation. I suspect you are anxious to get on with
it, and have no patience for a regular expression tutorial. So we'll make testing
the functions our intro.
Here are few SELECT statements which operate on literal strings:
SELECT dbo.RegExMatch('Fred','.d')
SELECT dbo.RegExMatch('Fred','^.d')
SELECT dbo.RegExMatch('Fred','\d')
SELECT dbo.RegExMatch('Fr3d','\d')
In the first example, '.d' means the letter 'd' after any character. The period
matches any character. The caret in the second example matches the beginning of
a line of text. '^.d' therefore only matches 'd' if it is the second letter. In
the next two examples, the 'd' is preceded by a backslash. This makes it one of
the many shortcuts in regular expressions. '\d' matches any digit, i.e. 0-9. '\D'
matches anything which is not a digit.
Here are some query examples which use our RegExMatch() function in a WHERE clause
to extract matching row data from the Northwind Customers table:
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '^.{1,3}n'
)=1
-- finds CompanyName where the second, third, or fourth letter is
an 'n'
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sa'
)=1
-- find lower-case letter a after a whitespace character (space,
tab, etc)
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sA'
)=1
-- find upper-case letter A after a whitespace character
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '(cat|kat)'
)=1
-- find either 'cat' or kat'
SELECT * FROM Customers WHERE dbo.RegExMatch(Address, '\d'
)=0
-- find addresses which do not contain any numerals at all
Now let's have a look at RegExReplace. We will pick a slightly more complex example.
We would like to query phone numbers from the Customers table of the Northwind sample
database, but we do not want to see either parentheses or hyphens. Parentheses and
hyphens, however, are special charcters in regular expressions. We will have to
escape them with backslashes.
(\(|\)|\-)
The vertical pipe symbol represents OR in regular expression logic. This peculiar-looking
expression can be read \( OR \) OR \-. If you prefer, an open paren OR a close paren
OR a hyphen. The entire OR series is enclosed in parentheses. Now we can write our
query:
SELECT CompanyName, Phone, dbo.RegExReplace(Phone,'(\(|\)|\-)' ,'' )
FROM Customers
WHERE Phone IS NOT NULL
The WHERE clause in this query is necessary since the Regex object will choke on
a NULL. Throwing an exception for an improperly formed regular expression is OK,
since it is a problem the user will need to fix. But a well-designed function should
test for nulls rather than throw an exception. That's your new assignment!
Source Code
The source code for all of the examples above
can be found here.
About the author:
Dan Buskirk had been a research scientist for many years when he left the university
labs to participate in a startup venture to design and build databases for medical
science. Since that time he has managed a consulting practice specializing in database
design and development. Dan balances this with training on Microsoft SQL Server,
Microsoft Analysis Services, and .NET programming. His interests include mathematical
methods for data mining and computing methods for advanced data categorization on
Windows clusters.