140

Is there any regular expression library written in T-SQL (no CLR, no extended SP, pure T-SQL) for SQL Server, and that should work with shared hosting?

Edit:

  • Thanks, I know about PATINDEX, LIKE, xp_ sps and CLR solutions
  • I also know it is not the best place for regex, the question is theoretical :)
  • Reduced functionality is also accepted
CarenRose
  • 1,266
  • 1
  • 12
  • 24
xnagyg
  • 4,784
  • 2
  • 32
  • 24
  • 2
    I too have this question. I know that a database is not the best place to have this, but the reality is that other solutions require SQL admin permissions to reconfigure the server. Unfortunately, some of our clients will not choose to enable CLR, etc, and we are stuck to database-only solutions. – Paul Draper Mar 13 '13 at 05:29
  • @PaulDraper and xnagyg: why rule out SQLCLR? It is the most appropriate means of getting Regular Expressions in queries. And why would some of your clients choose to not enable CLR? I have yet to come across a _valid_ reason. Sure, I hear "security" and "performance", but those are bogus reasons that are a result of not understanding how SQLCLR works and how it can be restricted. – Solomon Rutzky Jun 04 '15 at 23:52
  • 3
    @srutzky: most shared hosting provider does not allow CLR. You should ask them about "security" and "performance":) – xnagyg Jun 05 '15 at 06:02
  • @xnagyg Sure, I can ask a few. However, pointing to the behavior of a group does not in any way address the question of "is there a _valid_ reason" for that behavior. It could just as easily be that all of those shared hosting providers set their policy based on the same misunderstanding. And, if nothing else, the simple fact that not _all_ of them disallow SQLCLR actually supports the idea of there _not_ being a problem more than the idea of there being a problem since if those problems did exist, the providers that allow SQLCLR would be experiencing those problems and would stop allowing it. – Solomon Rutzky Jun 05 '15 at 14:49
  • @xnagyg Also, I should clarify that I am speaking in terms of Assemblies marked as `SAFE` and not marked as either `EXTERNAL_ACCESS` or `UNSAFE` (as I do understand why those 2 latter Permission Sets would be problematic for a shared hosting environment). Microsoft Azure SQL Database V12 (i.e. the new version as of late 2014), which is a shared environment, allows for Assemblies marked as `SAFE` (and loaded via `FROM 0x...` instead of from a DLL since you can't upload a DLL). But `SAFE` is all that is needed for Regular Expressions and LOTS of other very useful functions. – Solomon Rutzky Jun 05 '15 at 15:43
  • @srutzky: unfortunatelly I did not find any shared hosting with CLR support at that time (in 2011). I managed to use full text search instead. – xnagyg Jun 05 '15 at 19:46
  • Just in case someone is still looking for a solution, here's another option: https://www.red-gate.com/simple-talk/sql/t-sql-programming/tsql-regular-expression-workbench, which uses the VBScript.RegExp procedure via OLE Automation from T-SQL – Reversed Engineer Oct 15 '19 at 11:11
  • (similar to [James Poulose's answer](https://stackoverflow.com/a/12903070/303101) below) – Reversed Engineer Oct 15 '19 at 11:26

6 Answers6

85

How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).
Kirk Woll
  • 76,112
  • 22
  • 180
  • 195
Eric Z Beard
  • 37,669
  • 27
  • 100
  • 145
21

If anybody is interested in using regex with CLR here is a solution. The function below (C# .net 4.5) returns a 1 if the pattern is matched and a 0 if the pattern is not matched. I use it to tag lines in sub queries. The SQLfunction attribute tells sql server that this method is the actual UDF that SQL server will use. Save the file as a dll in a place where you can access it from management studio.

// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

namespace CLR_Functions
{   
    public class myFunctions
    {
        [SqlFunction]
        public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
        {            
            SqlInt16 returnVal = 0;
            try
            {
                string myText = text.ToString();
                string myPattern = pattern.ToString();
                MatchCollection mc = Regex.Matches(myText, myPattern);
                if (mc.Count > 0)
                {
                    returnVal = 1;
                }
            }
            catch
            {
                returnVal = 0;
            }

            return returnVal;
        }
    }
}

In management studio import the dll file via programability -- assemblies -- new assembly

Then run this query:

CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint 
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain

Then you should have complete access to the function via the database you stored the assembly in.

Then use in queries like so:

SELECT * 
FROM 
(
    SELECT
        DailyLog.Date,
        DailyLog.Researcher,
        DailyLog.team,
        DailyLog.field,
        DailyLog.EntityID,
        DailyLog.[From],
        DailyLog.[To],
        dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
    FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0
rufanov
  • 3,266
  • 1
  • 23
  • 41
Matt Farguson
  • 325
  • 2
  • 6
16

There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.

Steven Murawski
  • 10,959
  • 41
  • 53
5

In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.

John Fisher
  • 22,355
  • 2
  • 39
  • 64
  • 4
    Once again, i'ts a CLR solution - not what the OP asked for – Reversed Engineer Jul 20 '15 at 15:29
  • 13
    @DaveBoltman: He asked the question in 2008. People search for this sometimes and run across this question without wanting to avoid CLR. This helped me and might help them. – John Fisher Jul 20 '15 at 17:16
  • Sure, I do agree with you @JohnFisher - it **is** a helpful answer for someone using CLR. But in 2015, we'd still like a SQL only solution in our SQL project (no CLR) for various reasons, just like the OP did in 2008. The year doesn't matter :) E.g. the battery in your car was [released in 1859](https://en.wikipedia.org/wiki/Lead–acid_battery). But you'd still like to avoid using more modern batteries such as NiMH batteries released more than 100 years later, for various reasons (such as being able to afford a car at all :) – Reversed Engineer Jul 28 '15 at 08:00
  • 2
    @DaveBoltman: You missed the part where "People search for this sometimes and run across this question without wanting to avoid CLR". It was the key point. – John Fisher Jul 28 '15 at 16:37
  • sure - you're right @JohnFisher, you did say that. Glad it helped you, and I'm sure it will help others too – Reversed Engineer Jul 29 '15 at 11:49
5

If you are using SQL Server 2016 or above, you can use sp_execute_external_script along with R. It has functions for Regular Expression searches, such as grep and grepl.

Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People] table in the [WideWorldImporters] sample database. They get passed to the R engine as a dataframe named InputDataSet. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.

EXEC sp_execute_external_script 
 @language = N'R',
 @script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
 @input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
 WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))

Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".

Closing Thoughts Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.

Dave Mason
  • 4,746
  • 2
  • 23
  • 24
2

You can use VBScript regular expression features using OLE Automation. This is way better than the overhead of creating and maintaining an assembly. Please make sure you go through the comments section to get a better modified version of the main one.

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

DECLARE @obj INT, @res INT, @match BIT;
DECLARE @pattern varchar(255) = '<your regex pattern goes here>';
DECLARE @matchstring varchar(8000) = '<string to search goes here>';
SET @match = 0;

-- Create a VB script component object
EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT;

-- Apply/set the pattern to the RegEx object
EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern;

-- Set any other settings/properties here
EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1;

-- Call the method 'Test' to find a match
EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring;

-- Don't forget to clean-up
EXEC @res = sp_OADestroy @obj;

If you get SQL Server blocked access to procedure 'sys.sp_OACreate'... error, use sp_reconfigure to enable Ole Automation Procedures. (Yes, unfortunately that is a server level change!)

More information about the Test method is available here

Happy coding

James Poulose
  • 3,569
  • 2
  • 34
  • 39
  • sry, i know this is old, BUT: Why is VBScript through OLE "way better" than CLR? If you ONLY think about maintainance, you COULD be right, BUT what about performance? – swe Jul 19 '16 at 15:51
  • 1
    @swe By 'way better', i was referring to the time saved due to the overhead of creating and maintaining a .NET assembly just for this purpose. – James Poulose Jul 19 '16 at 19:47