Creating CLR Functions for SQL Server

Posted by Dan Buskirk on 05/10/07 | DotNet

One of the most exciting new features of MS SQL Server 2005 is its ability to host the .NET Common Language Runtime (CLR). This feature was not, however, designed merely to provide an alternative to Transact SQL (TSQL). In any development project, it is important to use the right tool for the right job. If you want to create a stored procedure that performs standard operations on relational data, then without doubt TSQL is the platform to choose. Since TSQL is designed solely for the purpose of manipulating relational data, it is superb at that job. However, there are many tasks which fall outside of the realm of relational data. It is for these tasks that CLR code might be a wise choice. Such tasks might include writing log information directly to disk or gathering data from a web service.

Requirements for Creating SQL Server Functions and Stored Procedures in a .NET Language.

You must use .NET 2.0 or later. Certain memory management features of the 2.0 CLR were specifically created for use by SQL Server. Previous versions simply won't work.

Starting Our CLR Function for SQL Server

Sorry, we are not going to create HelloWorld.dll. We want to create a function which exploits the power of the CLR to do something TSQL simply cannot, and if we can provide a practical tool in the process, all the better. For starters, we will create a pair of CLR functions which will make regular expressions available within TSQL. Regular expressions invaluable for string pattern matching, and prove useful in many database query tasks.

Regular Expressions in .NET

The primary .NET class of interest to us is the Regex class, which is defined in the System.Text.RegularExpressions namespace. We will use two important members, IsMatch and Replace. Ismatch returns True if the input string matches the regular expression pattern. Replace, as its name suggests, replaces the matching portion of a string with some other string. We will not discuss the regular expression patterns just yet; that will be more fun once our SQL Server function is running.

Creating the Visual Studio Project

If you are using Visual Studio 2005, life is good. Create a new C# class library project. Rename the project "TextFunctions" and the default class file "RegularExpressions". For our first functions, will will only be using two namespaces. System.Text.RegularExpressions is required for regular expressions, and Microsoft.SqlServer.Server is the basic namespace for creating SQL Server CLR code. Visual Studio creates a code file that should look like Figure 1.

    1 using Microsoft.SqlServer.Server;

    2 using System.Text.RegularExpressions;

    3 

    4 namespace TextFunctions

    5 {

    6     public class RegularExpressions

    7     {

    8 

    9     }

   10 }

Figure 1.

In my experience, the "TextFunctions" namespace declaration in the C# source code will cause problems later on when we try use then functions in our assembly within SQL Server. Delete this part of the code.

Now we will create our first function. To create functions for SQL Server, the C# compiler needs to know what we are up to. Functions destined to become SQL Server functions must therefore be marked with the code attribute

[Microsoft.SqlServer.Server.SqlFunction()]

The actual code for our first C# function, called RegExmatch(), is very simple. We instantiate the Regex class, using the pattern expression as an argument to the RegEx constructor. This is the regular expression that ultimately will be passed as an argument by the code calling our function. If the expression contains a syntax error, a runtime error will be thrown. At present, however, we are not including any error handling. After instantiating the Regex object, we call the IsMatch method providing the user's input string as an argument. The return value of IsMatch() is sent back to the calling code. Your code should look something like that in Figure 2.

    7         [Microsoft.SqlServer.Server.SqlFunction()]

    8         public static bool RegExMatch(string Input, string Pattern)

    9         {

   10             System.Text.RegularExpressions.

   11             Regex RegexInstance = new Regex(Pattern);

   12             return RegexInstance.IsMatch(Input);

   13         }

Figure 2.

The second function, RegExReplace(), does not introduce any new concepts. Here is the code:

   15 [Microsoft.SqlServer.Server.SqlFunction()]

   16         public static string RegExReplace(string Input, string Pattern, string Replacement)

   17         {

   18             Regex RegexInstance = new Regex(Pattern);

   19             return RegexInstance.Replace(Input, Replacement);

   20         }

Figure 3.

It is worth mentioning, however, that we got off easy this time. The arguments to both functions are strings, and the return types are bool and string. These data types map neatly onto SQL data types. Since our code will be called by TSQL code, we must be able to map a correspondence between the SQL type and the .NET type. Not all types are so cooperative as string.

The completed class should look like this:

    1 using Microsoft.SqlServer.Server;

    2 using System.Text.RegularExpressions;

    3 

    4 

    5     public class RegularExpressions

    6     {

    7         [Microsoft.SqlServer.Server.SqlFunction()]

    8         public static bool RegExMatch(string Input, string Pattern)

    9         {

   10             System.Text.RegularExpressions.

   11             Regex RegexInstance = new Regex(Pattern);

   12             return RegexInstance.IsMatch(Input);

   13         }

   14         [Microsoft.SqlServer.Server.SqlFunction()]

   15         public static string RegExReplace(string Input, string Pattern, string Replacement)

   16         {

   17             Regex RegexInstance = new Regex(Pattern);

   18             return RegexInstance.Replace(Input, Replacement);

   19         }

   20     }

   21 

Figure 4.

You should be able to compile the code with only the usual typographical errors to correct. The libraries required by this project included in the compilation by default, even if you choose to do a commandline compilation rather than use the Visual Studio.

Now comes the fun...

Loading a .NET Assembly into SQL Server

Before we can run our new functions, we must load the assembly into SQL Server and define the correspondence between a TSQL function name and a CLR function we have loaded. You will need to have administrative priviledge to set load and run CLR code on SQL Server.

Configuring the Server

SQL Server 2005 is not configured to use CLR code by default. You will have to explicitly configure the server to use the CLR. This is easily done in TSQL by calling the system procedure sp_configure.

sp_configure'clr enabled', 1
RECONFIGURE WITH OVERRIDE

(You are probably already aware that you can run sp_configure with only a partial completion of the first argument, as long as what you have typed is sufficient to uniquely identify the configuration property.)

Now we use the CREATE ASSEMBLY statement to tell SQL Server to copy the assembly into its system tables. Be sure that Management Studio query window is pointing to the database you want (e.g. Northwind) and run the following TSQL:

CREATE ASSEMBLY regX
FROM 'C:\KnowledgeBase\SqlStuff\SQL2005\CLR\UDF\TextFunctions\TextFunctions\bin\Debug\RegEx.dll'

It would be a remarkable coincidence indeed if your path were the same as this one; you might wish to change it.

After the CREATE ASSEMBLY statement is run, the executable code is safely stored in SQL Server's own system tables, away from access by any hacker. If you like, you can confirm this by deleting the dll file. The code will run from SQL Server anyway! If you wish to recompile the .NET assembly and reinstall it into SQL Server, you must either drop it or change the CREATE to ALTER, just as with TSQL statements.

The final step for the installation of our CLR code is telling SQL Server how to match up a Transact SQL request with a CLR function. We do this with a CREATE FUNCTION statement (or CREATE PROCEDURE as appropriate). However, unlike the usual CREATE FUNCTION statement, there is no TSQL code. There is only the EXTERNAL NAME reference to the function. Note that the function name is fully qualified, that is, assemblyName.ClassName.FunctionName. It is also important to be aware that the EXTERNAL NAME specification is case-sensitive!

-- The external name is case sensitive
CREATE Function RegExMatch(@Input NVARCHAR(512),@Pattern NVARCHAR(127))
RETURNS BIT EXTERNAL NAME regX.RegularExpressions.RegExMatch
GO
CREATE Function RegExReplace(@Input NVARCHAR(512),@Pattern NVARCHAR(127), @Replacement NVARCHAR(512))
RETURNS NVARCHAR(512) EXTERNAL NAME regX.RegularExpressions.RegExReplace

Note that the assembly name in the CREATE FUNCTION statement is the name you gave it when you loaded the assembly into SQL Server, not the name of the dll file, which is no longer of any concern to SQL Server. The TSQL function name need not be the same as the CLR function, but I believe it is less confusing if they are the same. My choice of character size for the NVARCHAR declarations is arbitrary; you might feel some other size is more appropriate. Note that the RegExMatch function returns type BIT.

The time has come to test our creation. I suspect you are anxious to get on with it, and have no patience for a regular expression tutorial. So we'll make testing the functions our intro.

Here are few SELECT statements which operate on literal strings:

SELECT dbo.RegExMatch('Fred','.d')
SELECT dbo.RegExMatch('Fred','^.d')
SELECT dbo.RegExMatch('Fred','\d')
SELECT dbo.RegExMatch('Fr3d','\d')

In the first example, '.d' means the letter 'd' after any character. The period matches any character. The caret in the second example matches the beginning of a line of text. '^.d' therefore only matches 'd' if it is the second letter. In the next two examples, the 'd' is preceded by a backslash. This makes it one of the many shortcuts in regular expressions. '\d' matches any digit, i.e. 0-9. '\D' matches anything which is not a digit.

Here are some query examples which use our RegExMatch() function in a WHERE clause to extract matching row data from the Northwind Customers table:

SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '^.{1,3}n' )=1
-- finds CompanyName where the second, third, or fourth letter is an 'n'

SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sa' )=1
-- find lower-case letter a after a whitespace character (space, tab, etc)

SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sA' )=1
-- find upper-case letter A after a whitespace character

SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '(cat|kat)' )=1
-- find either 'cat' or kat'

SELECT * FROM Customers WHERE dbo.RegExMatch(Address, '\d' )=0
-- find addresses which do not contain any numerals at all

Now let's have a look at RegExReplace. We will pick a slightly more complex example. We would like to query phone numbers from the Customers table of the Northwind sample database, but we do not want to see either parentheses or hyphens. Parentheses and hyphens, however, are special charcters in regular expressions. We will have to escape them with backslashes.

(\(|\)|\-)

The vertical pipe symbol represents OR in regular expression logic. This peculiar-looking expression can be read \( OR \) OR \-. If you prefer, an open paren OR a close paren OR a hyphen. The entire OR series is enclosed in parentheses. Now we can write our query:

SELECT CompanyName, Phone, dbo.RegExReplace(Phone,'(\(|\)|\-)' ,'' )
FROM Customers
WHERE Phone IS NOT NULL

The WHERE clause in this query is necessary since the Regex object will choke on a NULL. Throwing an exception for an improperly formed regular expression is OK, since it is a problem the user will need to fix. But a well-designed function should test for nulls rather than throw an exception. That's your new assignment!

Source Code

The source code for all of the examples above can be found here.

 

About Dan Buskirk:

 Dan Buskirk had been a research scientist for many years when he left the university labs to participate in a startup venture to design and build databases for medical science. Since that time he has managed a consulting practice specializing in database design and development. Dan balances this with training on Microsoft SQL Server, Microsoft Analysis Services, and .NET programming. His interests include mathematical methods for data mining and computing methods for advanced data categorization on Windows clusters.


Want to learn more about .NET Training?


0 Comments

COMMENTS

Name:
URL:
Comment:

Comments are disabled for this article.