3

I have data like 'John is my name; Ram is my name; Adam is my name'.

My rule is such that every first letter that comes after ; should be capital letter.

How do I select all values that satisfy the rule?

Shnugo
  • 66,100
  • 9
  • 53
  • 114
bor
  • 2,211
  • 4
  • 22
  • 37
  • Which version of SQL Server? – Shnugo Jun 04 '17 at 10:46
  • @Shnugo Microsoft SQL Server 2012 - 11.0.5058.0 (X64) – bor Jun 04 '17 at 10:50
  • 1
    This is going to be an ugly problem, especially if the number of semicolon-separated terms is unknown. A better solution would be to normalize your data and get each name/sentence on a separate record. – Tim Biegeleisen Jun 04 '17 at 10:53
  • @TimBiegeleisen can't there be a regular expression solution? – bor Jun 04 '17 at 11:04
  • 2
    Sadly, SQL Server is known for having poor native regex support. But this would be easy to handle using something like C# or Java. Have you considered such an alternative? – Tim Biegeleisen Jun 04 '17 at 11:05

5 Answers5

2

You might split this with the XML trick like this

DECLARE @YourString VARCHAR(100)='John is my name; Ram is my name; Adam is my name';
WITH Splitted AS
(
    SELECT CAST('<x>' + REPLACE((SELECT REPLACE(@YourString,'; ','$$SplitHere$$') AS [*] FOR XML PATH('')),'$$SplitHere$$','</x><x>')+ '</x>' AS XML) AS Casted
)
,DerivedTable AS
(
    SELECT  ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNr
           ,x.value(N'text()[1]',N'nvarchar(max)') AS Part
    FROM Splitted
    CROSS APPLY Casted.nodes(N'/x') AS X(x)
)
SELECT PartNr
      ,Part
      ,CASE WHEN ASCII(LEFT(Part,1)) BETWEEN ASCII('A') AND ASCII('Z') THEN 1 ELSE 0 END AS FirstIsCapital
FROM DerivedTable;

The result

Nr  Part            FirstLetterIsCaptial
---------------------------------------- 
1   John is my name     1
2   Ram is my name      1
3   Adam is my name     1

I do not know what your final goal is... Find parts, where the first letter is not capital? Make sure, that your rule is fullfilled?

However:
Best was, to use this to correct your design and place these parts in a 1:n related side table.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Use following source string for testing purpose: `'John is my name; Ram is my name; Adam is my name'` – Bogdan Sahlean Jun 04 '17 at 12:08
  • @BogdanSahlean , Well I'd use `L/RTRIM()` ... The error is the storage format ... Any code to solve this will be a hack... – Shnugo Jun 04 '17 at 12:11
  • `TRIM` was introduced in SQL Server 2017. What if source string is `'!John is my name;!Ram is my name;!Adam is my name;¡Hola!'` ? – Bogdan Sahlean Jun 04 '17 at 12:13
2

The other answers show how to transform rows into something that matches your pattern.

If you just want to select the rows that match the pattern you are describing, you can use patindex() or like with a case sensitive collation (or use collate to apply one).

This assumes that in addition to the rule that every letter that follows a semicolon must be a capital letter, that the very first letter should also be capital. If that is not the case, just remove the first clause in the where.

select *
from t
where patindex('[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%', val collate latin1_general_cs_as) = 1
  and patindex('%; [^ABCDEFGHIJKLMNOPQRSTUVWXYZ]%', val collate latin1_general_cs_as) = 0

select *
from t
where val collate latin1_general_cs_as like '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'
  and val collate latin1_general_cs_as not like '%; [^ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'

test setup:

create table t (id int not null identity(1,1),val varchar(256))
insert into t values 
 ('John is my name; Ram is my name; Adam is my name')
,('john is my name; ram is my name; adam is my name')

rextester demo: http://rextester.com/DBGIS10645

Both of the above return:

+----+--------------------------------------------------+
| id |                       val                        |
+----+--------------------------------------------------+
|  1 | John is my name; Ram is my name; Adam is my name |
+----+--------------------------------------------------+
SqlZim
  • 37,248
  • 6
  • 41
  • 59
1

Little bit of ugly solution but you can give a try...

Declare @str nvarchar(max) = 'John is my name; Ram is my name; Adam is my name'

Declare @xml as xml
Set @xml = cast(('<X>'+replace(@str,';' ,'</X><X>')+'</X>') as xml)
Select * from (
    Select RowN = Row_Number() over (order by (SELECT NULL)), LTrim(RTrim(N.value('.', 'nvarchar(MAX)'))) as value FROM @xml.nodes('X') as T(N) -- this is to split if you are using sql server 2016 you can use string_Split
) a
Where unicode(substring(a.[value],1,1)) = unicode(upper(substring(a.[value],1,1)))

Idea is to split the string and check with unicode value to see whether it is upper or not

Kannan Kandasamy
  • 13,405
  • 3
  • 25
  • 38
1

you can create a function like this.

Create FUNCTION SPLITTER (    
    @textData NVARCHAR(MAX),
    @Delimeter NVARCHAR(MAX) ) RETURNS @RtnValue TABLE  (
    Data NVARCHAR(MAX) )  AS BEGIN 
    DECLARE @index INT  DECLARE @data nvarchar(1000)    DECLARE @firstCharacter char
    SET @index = CHARINDEX(@Delimeter,@textData)

    WHILE (@index>0)
    BEGIN
                set @data = LTRIM(RTRIM(SUBSTRING(@textData, 1, @index - 1)))       set @firstCharacter = SUBSTRING(@data,1,1);
                if UNICODE(@firstCharacter) = UNICODE(upper(@firstCharacter))       begin           INSERT INTO @RtnValue (data) SELECT @data       end;

        SET @textData = SUBSTRING(@textData, @index + DATALENGTH(@Delimeter) / 2, LEN(@textData))

        SET @index = CHARINDEX(@Delimeter, @textData)
    END
        set @data = @textData   set @firstCharacter = SUBSTRING(@data,1,1);
            if UNICODE(@firstCharacter) = UNICODE(upper(@firstCharacter))   begin       INSERT INTO @RtnValue (data) SELECT @data   end;

    RETURN END

use it like this

select * from SPLITTER ('John is my name; Ram is my name; Adam is my name', ';')

h.munawar
  • 56
  • 4
1

You can grab a copy of NGrams8K and do this:

-- note that I made the 3rd item start with lower-case
DECLARE @YourString VARCHAR(100)='John is my name; Ram is my name; adam is my name';

WITH D(n) AS
(
  SELECT 0 UNION ALL SELECT position 
  FROM dbo.NGrams8k(@yourstring,1) WHERE token = ';'
),
TOKEN(token) AS
(
  SELECT LTRIM(SUBSTRING(@YourString, N+1, 
           ISNULL(NULLIF(CHARINDEX(';', @YourString, N+1),0), 101)-(N+1)))
  FROM D
)
SELECT token,
       FirstLetterIsCaptial = IIF(ASCII(SUBSTRING(token,1,1)) BETWEEN 65 AND 90, 1, 0)
FROM TOKEN;

Results

token              FirstLetterIsCaptial
------------------ --------------------
John is my name    1
Ram is my name     1
adam is my name    0
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18