13

I am trying to port a query from MySQL to SQL SERVER 2012.

How do i write an equivalent for MySQL's substring_index()?

MySQL SUBSTRING_INDEX() returns the substring from the given string before a specified number of occurrences of a delimiter.

SUBSTRING_INDEX(str, delim, count)

SELECT SUBSTRING_INDEX('www.somewebsite.com','.',2);

Output: 'www.somewebsite'

Ankit
  • 2,448
  • 3
  • 20
  • 33
  • 2
    there isn't a direct equivalent . you will need to write your own. – Mitch Wheat May 25 '14 at 11:22
  • Found an implelemtation [here](http://www.sqlservercentral.com/Forums/Topic1445667-392-1.aspx). – Ankit May 25 '14 at 11:38
  • 1
    See https://stackoverflow.com/questions/23854724/sql-server-equivalent-of-substring-index-function-in-mysql for recent versions of SQL Server. – O. Jones Aug 03 '23 at 03:05
  • @O.Jones Help me out here, you linked to... _this question_? – Stuck at 1337 Aug 08 '23 at 13:30
  • Oh, sorry, I'm a dope, here's the link to the super-slick way of emulating SUBSTRING_INDEX with some JSON hackery. https://stackoverflow.com/questions/76824564/computed-column-on-split-value/76824639 – O. Jones Aug 08 '23 at 13:37

9 Answers9

8

Try this solution based on T-SQL and XQuery((root/row)[position() <= sql:variable("@count")]):

T-SQL Scalar function:

CREATE FUNCTION dbo.SUBSTRING_INDEX
(
    @str NVARCHAR(4000),
    @delim NVARCHAR(1),
    @count INT
)
RETURNS NVARCHAR(4000)
WITH SCHEMABINDING
BEGIN
    DECLARE @XmlSourceString XML;
    SET @XmlSourceString = (SELECT N'<root><row>' + REPLACE( (SELECT @str AS '*' FOR XML PATH('')) , @delim, N'</row><row>' ) + N'</row></root>');

    RETURN STUFF
    (
        ((
            SELECT  @delim + x.XmlCol.value(N'(text())[1]', N'NVARCHAR(4000)') AS '*'
            FROM    @XmlSourceString.nodes(N'(root/row)[position() <= sql:variable("@count")]') x(XmlCol)
            FOR XML PATH(N''), TYPE
        ).value(N'.', N'NVARCHAR(4000)')), 
        1, 1, N''
    );
END
GO

SELECT dbo.SUBSTRING_INDEX(N'www.somewebsite.com', N'.', 2) AS Result;

Output:

/*
Result
---------------
www.somewebsite
*/

or

TSQL Inline Table-Valued Function:

CREATE FUNCTION dbo.SUBSTRING_INDEX
(
    @str NVARCHAR(4000),
    @delim NVARCHAR(1),
    @count INT
)
RETURNS TABLE
AS 
RETURN
    WITH Base
    AS 
    (
        SELECT XmlSourceString = CONVERT(XML, (SELECT N'<root><row>' + REPLACE( (SELECT @str AS '*' FOR XML PATH('')) , @delim, N'</row><row>' ) + N'</row></root>'))
    )   
    SELECT STUFF
    (
        ((
            SELECT  @delim + x.XmlCol.value(N'(text())[1]', N'NVARCHAR(4000)') AS '*'
            FROM    Base b 
            CROSS APPLY b.XmlSourceString.nodes(N'(root/row)[position() <= sql:variable("@count")]') x(XmlCol)
            FOR XML PATH(N''), TYPE
        ).value(N'.', N'NVARCHAR(4000)')), 
        1, 1, N''
    ) AS Result;
GO

SELECT  *
FROM    (
    SELECT N'www.somewebsite.com' UNION ALL 
    SELECT N'www.yahoo.com' UNION ALL 
    SELECT N'www.outlook.com'
) a(Value)
CROSS APPLY dbo.SUBSTRING_INDEX(a.Value, N'.', 2) b;

Output:

/*
Value               Result
------------------- ---------------
www.somewebsite.com www.somewebsite
www.yahoo.com       www.yahoo
www.outlook.com     www.outlook
*/
Bogdan Sahlean
  • 19,233
  • 3
  • 42
  • 57
  • 2
    This solution is incomplete as it does not support the @count being negative (which searches from the end of the string instead of the beginning). – Scott Gartner Sep 02 '16 at 23:38
  • 1
    @ScottGartner: Use REVERSE twice. – Bogdan Sahlean Sep 03 '16 at 05:58
  • 1
    Actually, you would have to reverse the str, but you would also have to reverse the delim (though your implementation, unlike the MySql function, only allows the delim to be a single character). If you run the set of tests from my solution, you'll see that your function only passes five out of the 15 tests, and one of those is accidental (test number 9 happens to have the first letter match, finding the right substring). – Scott Gartner Sep 04 '16 at 19:40
  • SELECT reverse(dbo.SUBSTRING_INDEX(reverse('www.mysql.com'), '.', 2)); – farhang67 Jan 27 '17 at 14:42
7

I needed this recently, so I wrote the following stored function. At the end are a bunch of tests to make sure it operates exactly as the MySql function does (the expected results were copied from MySql after running the same tests there):

-- Function to reproduce the useful functionality of SUBSTRING_INDEX from MySql
CREATE FUNCTION dbo.SUBSTRING_INDEX(@InString  NVARCHAR(Max),
                                    @Delimiter NVARCHAR(Max),
                                    @Count     INT)
RETURNS NVARCHAR(200)
AS
BEGIN
    DECLARE @Pos INT;
    DECLARE @DelimiterOffsets TABLE
    (
         i      INT IDENTITY(1, 1) NOT NULL,
         offset INT NOT NULL
    );

    -- If @Count is zero, we return '' as per spec
    IF @Count = 0
    BEGIN
        RETURN '';
    END;

    DECLARE @OrigLength      INT = LEN(@InString);
    DECLARE @DelimiterLength INT = LEN(@Delimiter);

    -- Prime the pump.
    SET @Pos = Charindex(@Delimiter, @InString, 1);

    -- If the delimiter does not exist in @InString, return the whole string
    IF @Pos = 0
    BEGIN
        RETURN @InString;
    END;

    -- Put all delimiter offsets into @DelimiterOffsets, they get numbered automatically.
    DECLARE @CurrentOffset INT = 0;
    WHILE @Pos > 0
    BEGIN
        SET @CurrentOffset = @Pos;

        INSERT INTO @DelimiterOffsets
                    (offset)
             VALUES (@CurrentOffset);

        SET @Pos = Charindex(@Delimiter, @InString, @CurrentOffset + @DelimiterLength);
    END;

    -- This number is guaranteed to be > 0.
    DECLARE @DelimitersFound INT = (SELECT Count(*) FROM @DelimiterOffsets);

    -- If they requested more delimiters than were found, return the whole string, as per spec.
    IF Abs(@Count) > @DelimitersFound
    BEGIN
        RETURN @InString;
    END;

    DECLARE @StartSubstring INT = 0;
    DECLARE @EndSubstring   INT = @OrigLength;

    -- OK, now return the part they requested
    IF @Count > 0
    BEGIN
        SET @EndSubstring = (SELECT offset 
                               FROM @DelimiterOffsets 
                              WHERE i = @Count);
    END
    ELSE
    BEGIN
        SET @StartSubstring = (SELECT offset + @DelimiterLength 
                                 FROM @DelimiterOffsets 
                                WHERE i = (@DelimitersFound + @Count + 1));
    END;

    RETURN Substring(@InString, @StartSubstring, @EndSubstring);
END; 

Go 

GRANT EXECUTE ON [dbo].SUBSTRING_INDEX TO PUBLIC;

-- Tests
DECLARE @TestResults TABLE (i int, answer nVarChar(MAX), expected nVarChar(MAX));

insert into @TestResults
select * from  
(
    (SELECT  1 as i, [dbo].SUBSTRING_INDEX(N'www.somewebsite.com', N'.', 2)    as r, 'www.somewebsite'     as e) UNION
    (SELECT  2 as i, [dbo].SUBSTRING_INDEX(N'www.yahoo.com', N'.', 2)          as r, 'www.yahoo'           as e) UNION
    (SELECT  3 as i, [dbo].SUBSTRING_INDEX(N'www.outlook.com', N'.', 2)        as r, 'www.outlook'         as e) UNION
    (SELECT  4 as i, [dbo].SUBSTRING_INDEX(N'www.somewebsite.com', N'.', -2)   as r, 'somewebsite.com'     as e) UNION
    (SELECT  5 as i, [dbo].SUBSTRING_INDEX(N'www.yahoo.com', N'.', -2)         as r, 'yahoo.com'           as e) UNION
    (SELECT  6 as i, [dbo].SUBSTRING_INDEX(N'www.outlook.com', N'.', -2)       as r, 'outlook.com'         as e) UNION
    (select  7 as i, [dbo].SUBSTRING_INDEX('hi.you.com','.',2)                 as r, 'hi.you'              as e) UNION
    (select  8 as i, [dbo].SUBSTRING_INDEX('hi.you.com','.',-1)                as r, 'com'                 as e) UNION
    (select  9 as i, [dbo].SUBSTRING_INDEX(N'prueba','ue',1)                   as r, 'pr'                  as e) UNION
    (select 10 as i, [dbo].SUBSTRING_INDEX(N'prueba','ue',-1)                  as r, 'ba'                  as e) UNION
    (select 11 as i, [dbo].SUBSTRING_INDEX(N'prueba','ue',0)                   as r, ''                    as e) UNION
    (SELECT 12 as i, [dbo].SUBSTRING_INDEX(N'wwwxxxoutlookxxxcom', N'xxx', 2)  as r, 'wwwxxxoutlook'       as e) UNION
    (SELECT 13 as i, [dbo].SUBSTRING_INDEX(N'wwwxxxoutlookxxxcom', N'xxx', -2) as r, 'outlookxxxcom'       as e) UNION
    (SELECT 14 as i, [dbo].SUBSTRING_INDEX(N'wwwxxxoutlookxxxcom', N'xxx', 5)  as r, 'wwwxxxoutlookxxxcom' as e) UNION
    (SELECT 15 as i, [dbo].SUBSTRING_INDEX(N'wwwxxxoutlookxxxcom', N'xxx', -5) as r, 'wwwxxxoutlookxxxcom' as e)
) as results;

select tr.i,
       tr.answer,
       tr.expected,
       CASE WHEN tr.answer = tr.expected THEN 'Test Succeeded' ELSE 'Test Failed' END testState
  from @TestResults tr
 order by i;

Here's a version inspired by Bogdan Sahlean's answer using SQL Server's XML functionality to do the parsing and combining:

CREATE FUNCTION dbo.SUBSTRING_INDEX(@InString  NVARCHAR(Max),
                                    @Delimiter NVARCHAR(Max),
                                    @Count     INT)
RETURNS NVARCHAR(200)
AS
BEGIN
    -- If @Count is zero, we return '' as per spec
    IF @Count = 0
    BEGIN
        RETURN '';
    END;

    -- First we let the XML parser break up the string by @Delimiter.
    -- Each parsed value will be <piece>[text]</piece>.
    DECLARE @XmlSourceString XML = (select N'<piece>' + REPLACE( (SELECT @InString AS '*' FOR XML PATH('')) , @Delimiter, N'</piece><piece>' ) + N'</piece>');

    -- This will contain the final requested string.
    DECLARE @Results nVarChar(MAX);

    ;WITH Pieces(RowNumber, Piece) as
    (
        -- Take each node in @XmlSourceString, and return it with row numbers
        -- which will identify each piece and give us a handle to change the
        -- order, depending on the direction of search.
        SELECT  row_number() over(order by x.XmlCol) as RowNumber,
                @Delimiter + x.XmlCol.value(N'(text())[1]', N'NVARCHAR(MAX)') AS '*'
          FROM  @XmlSourceString.nodes(N'(piece)') x(XmlCol)
    ), orderedPieces(RowNumber, Piece) as
    (
        -- Order the pieces normally or reversed depending on whether they want
        -- the first @Count pieces or the last @Count pieces.
        select TOP (ABS(@Count)) 
               RowNumber, 
               Piece
          from Pieces
         ORDER BY CASE WHEN @Count < 0 THEN RowNumber END DESC ,
                  CASE WHEN @Count > 0 THEN RowNumber END ASC
    ), combinedPieces(result) as
    (
        -- Now combine the pieces back together, ordering them by
        -- the original order.  There will always
        -- be an extra @Delimiter on the front of the string.
        select CAST(Piece AS VARCHAR(100))
          from OrderedPieces
         order by RowNumber
           FOR XML PATH(N'')
    )
    -- Finally, strip off the extra delimiter using STUFF and store the string in @Results.
    select @Results = STUFF(result, 1, LEN(@Delimiter), '') from combinedPieces;

    return @Results;
END;

Running the tests produces this:

i  answer              expected             testState
1  www.somewebsite     www.somewebsite      Test Succeeded
2  www.yahoo           www.yahoo            Test Succeeded
3  www.outlook         www.outlook          Test Succeeded
4  somewebsite.com     somewebsite.com      Test Succeeded
5  yahoo.com           yahoo.com            Test Succeeded
6  outlook.com         outlook.com          Test Succeeded
7  hi.you              hi.you               Test Succeeded
8  com                 com                  Test Succeeded
9  pr                  pr                   Test Succeeded
10 ba                  ba                   Test Succeeded
11                                          Test Succeeded
12 wwwxxxoutlook       wwwxxxoutlook        Test Succeeded
13 outlookxxxcom       outlookxxxcom        Test Succeeded
14 wwwxxxoutlookxxxcom wwwxxxoutlookxxxcom  Test Succeeded
15 wwwxxxoutlookxxxcom wwwxxxoutlookxxxcom  Test Succeeded
Community
  • 1
  • 1
Scott Gartner
  • 862
  • 14
  • 22
3

My best bet is:

select substring(email,(charindex('@', email,1)+1),100) from yourtable;

assuming that TLD.EXT max 100 characters. You can increase as you like.

M--
  • 25,431
  • 8
  • 61
  • 93
1
    CREATE FUNCTION FN_SUBSTRING_INDEX
    (
       @TEXTO NVARCHAR(200),
      @SUBSTRING_INDEX NVARCHAR(10),
       @DESPLAZAMIENTO INT
    )
    RETURNS NVARCHAR(200)
    AS

    BEGIN


        DECLARE @indiceSubstring INT
        DECLARE @RESULTADO NVARCHAR(200)
        SELECT @indiceSubstring = CHARINDEX(@SUBSTRING_INDEX,@TEXTO)

        IF @DESPLAZAMIENTO > 0
        BEGIN
            SELECT @RESULTADO=SUBSTRING(@TEXTO,@indiceSubstring+@DESPLAZAMIENTO+1,LEN(@TEXTO))
        END 
        ELSE
        BEGIN
            SELECT @RESULTADO=SUBSTRING(@TEXTO,0,@indiceSubstring-@DESPLAZAMIENTO-1)
        END 

    RETURN @RESULTADO
    END
    GO

to right select dbo.FN_SUBSTRING_INDEX(N'prueba','ue',1);

to left select dbo.FN_SUBSTRING_INDEX(N'prueba','ue',-1);

Gorson
  • 11
  • 1
0

Try This One.....

CREATE FUNCTION SubString_Index
(
   @ExistingString NVARCHAR(200),
   @BreakPoint NVARCHAR(10),
   @number INT
)
RETURNS NVARCHAR(200)
AS
BEGIN
DECLARE @Count INT
DECLARE @Substring NVARCHAR(200)
DECLARE @ssubstring NVARCHAR(200)
SET @ssubstring=@ExistingString
DECLARE @scount INT
SET @scount=0
DECLARE @sscount INT
SET @sscount=0
WHILE(@number>@scount)
    BEGIN
            Select @Count=CHARINDEX(@BreakPoint,@ExistingString)
            Select @ExistingString=SUBSTRING(@ExistingString,@Count+1,LEN(@ExistingString))
            Select @scount=@scount+1 
            select @sscount=@sscount+@Count
    END

SELECT @Substring=SUBSTRING(@ssubstring,0,@sscount)

RETURN @Substring
END
GO


select dbo.SubStringIndex('hi.you.com','.',1)
0

This microsoft sql function Works exactly like substring_index function in mysql

/** This microsoft sql function Works exactly like substring_index function in mysql **/
  
CREATE FUNCTION SubString_Index
(
   @ExistingString NVARCHAR(MAX),
   @BreakPoint NVARCHAR(MAX),
   @number INT
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Count INT
DECLARE @SubstringLength INT
DECLARE @Substring NVARCHAR(MAX)
DECLARE @ssubstring NVARCHAR(MAX)
SET @ssubstring=@ExistingString
DECLARE @scount INT
SET @scount=0
DECLARE @sscount INT
SET @sscount=0
DECLARE @number2 INT
DECLARE @occurence INT
SET @occurence=LEN(@ExistingString) - LEN(REPLACE(@ExistingString, @BreakPoint, ''))
If @number<0
     BEGIN
        SET @number2= @occurence-(-1*@number)+1
     END
If @number>0
     BEGIN
        SET @number2=@number
     END
WHILE(@number2>@scount)
    BEGIN
            Select @Count=CHARINDEX(@BreakPoint,@ExistingString)
            Select @SubstringLength=@Count+LEN(@BreakPoint) 
            Select @ExistingString=SUBSTRING(@ExistingString,@SubstringLength,LEN(@ExistingString)-@Count)
            Select @scount=@scount+1 
            select @sscount=@sscount+@Count
    END
If @number<0
     BEGIN
        if (@number = -1) and (@sscount+LEN(@BreakPoint)) = (LEN(@ssubstring)+1)
            BEGIN
               SELECT @Substring=''
            END
        else if @occurence = 0
            BEGIN
               SELECT @Substring=''
            END
        else
            BEGIN
               SELECT @Substring=SUBSTRING(@ssubstring, @sscount+LEN(@BreakPoint), LEN(@ssubstring))
            END
     END
If @number>0
    if @occurence = 0
            BEGIN
               SELECT @Substring=''
            END
        else
            BEGIN
               SELECT @Substring=SUBSTRING(@ssubstring,0,@sscount)
            END

RETURN @Substring
END
GO

  • Welcome to Stack Overflow! While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Suraj Rao Nov 21 '20 at 09:18
0

GETTING INDEX -2 WITH PYTHON

use my_guitar_shop;
go  


EXEC sp_execute_external_script 
@language =N'Python',
@script=N'

import pandas as pd
import numpy as np
x = np.array(InputDataSet["email_address"], dtype = str)
broken = np.char.split(x, sep = ".")
OutputDataSet = pd.DataFrame([".".join(e[-2:]) if len(e)>2 else ".".join(e) for e in broken])
', 
@input_data_1 = N'SELECT email_address  from customers;'
WITH RESULT SETS(([indexed_email] nvarchar(250)));
reivaJ
  • 11
  • 2
0

This is my approach to the MySQL function, case-sensitive and passing the @Scott Gartner tests:

CREATE Function Substring_Index(
      @String nvarchar(MAX)
    , @Delimiter nvarchar(MAX)
    , @Number int
) returns nvarchar(MAX) as
begin
    declare @myString nvarchar(MAX) =
                case when @Number < 0
                    then reverse(@String)
                    else         @String
                end
          , @myDelim nvarchar(MAX) =
                case when @Number < 0
                    then reverse(@Delimiter)
                    else         @Delimiter
                end
          , @szDelim int =
                dataLength(cast(@Delimiter as varchar(MAX)))
          , @Count int = 1
          , @LastFound int = -1
    while @Count <= abs(@Number) and @LastFound <> 0
        select @Count += 1
            ,  @LastFound  =
                    charindex(@myDelim
                            , @myString collate Latin1_General_BIN
                            , @LastFound + @szDelim)
    select @LastFound =
                case when @LastFound = 0
                    then datalength(cast(@myString as varchar(MAX)))
                    else abs(@LastFound) - 1
                end
        ,  @myString = left(@myString, @LastFound)
    return
        case when @Number < 0
            then reverse(@myString)
            else         @myString
        end
end

DATALENGHT() is used because LEN() forgets trailing spaces, and CAST to varchar to avoid double lengths.

Testing with select dbo.Substring_Index(N'AB CD EF GH',N' ', -2) results EF GH as expected.

Xabi
  • 465
  • 5
  • 8
-2

Visit https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16!

SELECT value FROM STRING_SPLIT('www.somewebsite.com', '.'); 
N.Mun'gau
  • 1
  • 2
  • 2
    While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Yunnosch May 01 '23 at 16:46
  • 1
    But, this just returns an arbitrarily-ordered _set_ of all the values in the string. How do they identify and return just the 2nd element? – Stuck at 1337 Aug 08 '23 at 13:26