1

How do I extract only the string after CX_EduDegree=??????? and replace the %20 with spaces. The data is separated by spaces, and is never in the same position.

I have tried to use the patindex with substring and replace. But I have had no success.

select top 5 clientid, extFields
from tblSYS_Clients
where ExtFields like '%CX_Edu%'
and ClientID in ('1633496','1633692','1453977','1657410','1584563','1655341','1632686','1352611','1484271','1361354') 
clientid    extFields
1352611 CX_CurrentJobStartDate=01/20/2001  CX_CurrentJobHours=40  CX_SupervisorName=Rhonda%20Kaiser  CX_EduDegree=BS%20in%20Nursing  CX_SupervisorPhone=970-495-8100
1361354 CX_CurrentJobStartDate=06/20/1997  CX_CurrentJobHours=30  CX_SupervisorName=Georgia%20Chapin  CX_SupervisorPhone=702-616-5632  CX_EduDegree=MS/MA%20%20in%20Nursing
1453977 CX_CurrentJobStartDate=08/20/1990  CX_CurrentJobHours=40  CX_SupervisorName=Jan%20Rasco  CX_SupervisorPhone=281-631-8789  CX_EduDegree=Diploma
1484271 CX_CurrentJobStartDate=01/01/2011  CX_CurrentJobHours=40  CX_SupervisorName=Kay%20Hix  CX_SupervisorPhone=317-329-7209  CX_EduDegree=AD%20in%20Nursing
1584563 CX_CurrentJobStartDate=11/26/2006  CX_CurrentJobHours=40  CX_SupervisorName=PHILLIP%20MOISUK  CX_SupervisorPhone=916-453-4545  CX_EduDegree=BS%20in%20Nursing

Results I want to see:

1633496 BS in Nursing 
1633692 BS in Nursing 
1453977 Diploma 
1657410 AD in Nursing 
1584563 BS in Nursing 
1655341 AD in Nursing 
1632686 BS in Nursing 
1352611 BS in Nursing 
1484271 AD in Nursing 
1361354 MS/MA in Nursing 
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Lee
  • 11
  • 2
  • are you using sql server? – Vamsi Prabhala May 09 '16 at 23:05
  • will `CX_EduDegree=..` always be at the end of the string? – Vamsi Prabhala May 09 '16 at 23:35
  • I am using sql server. No, the string CX_EduDegree= is never in the same place and could be at the beginning, middle, or the end. Its storing data from user entry, and if they don't enter something it falls earlier or later, and sometimes isn't even included. – Lee May 10 '16 at 15:22

4 Answers4

0

String manipulation is a bit cumbersome in SQL Server. I like to do it using outer apply:

select c.clientid, c.extFields, x2.x2
from tblSYS_Clients c outer apply
     (select stuff(c.ExtFields, 1, charindex('CX_edu', c.ExtFields), '') as x1
     ) x outer apply
     (select replace(left(x.x1, charindex(' ', x1)), '%20', ' ') as x2
     ) x2
where c.ExtFields like '%CX_Edu%' and
      c.ClientID in ('1633496', '1633692', '1453977', '1657410', '1584563', '1655341', '1632686', '1352611', '1484271', '1361354') ;

Note: I think the above will work. SQL Server does not guarantee that the where is processed before the outer apply subqueries. In practice, I think it does, so the filtering is done guaranteeing that the substring is in ExtFields. If not, this can readily be fixed using case, but that complicates the answer a bit.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • received errors using this code on sql server... Msg 174, Level 15, State 1, Line 3 The stuff function requires 4 argument(s). Msg 102, Level 15, State 1, Line 6 Incorrect syntax near 'x2'. – Lee May 10 '16 at 16:07
0

Use a combination of charindex,reverse and substring. This will work if CX_EduDegree= always appears towards the end of the string.

Demo with sample data

select clientid,
substring(
          extfields
          ,charindex('CX_EduDegree=',extFields)+len('CX_EduDegree=')
          ,charindex(' ',extFields)
         ) as extfield
from tblSYS_Clients
where ExtFields like '%CX_Edu%'
and ClientID in ('1633496','1633692','1453977','1657410','1584563','1655341','1632686','1352611','1484271','1361354') 
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
0

Use this construction:

select clientid, replace(SUBSTRING(extFields,
   PATINDEX('%CX_EduDegree=%',extFields)+13,
      charindex(' ', extFields+' ',
          PATINDEX('%CX_EduDegree=%',extFields)) - 
             PATINDEX('%CX_EduDegree=%',extFields) - 13),'%20',' ') ex 
from  
(values 
(1352611, 'CX_CurrentJobStartDate=01/20/2001  CX_CurrentJobHours=40  CX_SupervisorName=Rhonda%20Kaiser  CX_EduDegree=BS%20in%20Nursing  CX_SupervisorPhone=970-495-8100'), 
(1361354, 'CX_CurrentJobStartDate=06/20/1997  CX_CurrentJobHours=30  CX_SupervisorName=Georgia%20Chapin  CX_SupervisorPhone=702-616-5632  CX_EduDegree=MS/MA%20%20in%20Nursing'), 
(1453977, 'CX_CurrentJobStartDate=08/20/1990  CX_CurrentJobHours=40  CX_SupervisorName=Jan%20Rasco  CX_SupervisorPhone=281-631-8789  CX_EduDegree=Diploma'), 
(1484271, 'CX_CurrentJobStartDate=01/01/2011  CX_CurrentJobHours=40  CX_SupervisorName=Kay%20Hix  CX_SupervisorPhone=317-329-7209  CX_EduDegree=AD%20in%20Nursing'), 
(1584563, 'CX_CurrentJobStartDate=11/26/2006  CX_CurrentJobHours=40  CX_SupervisorName=PHILLIP%20MOISUK  CX_SupervisorPhone=916-453-4545  CX_EduDegree=BS%20in%20Nursing') 
) t(clientid,    extFields)

Results:

clientid    ex
1352611 BS in Nursing
1361354 MS/MA  in Nursing
1453977 Diploma
1484271 AD in Nursing
1584563 BS in Nursing

Some comments inside the code for better understanding. As you see charindex works too. Results are the same.

select clientid, 
replace(
SUBSTRING(extFields,--string to work with
   charindex('CX_EduDegree=',extFields)+13, --start position; 13 is length of CX_EduDegree=
      charindex(' ', extFields+' ', --searching end position +' ' to make sure that space exists
          charindex('CX_EduDegree=',extFields) -- start searching ' ' after this position
          ) --end position found
             - charindex('CX_EduDegree=',extFields) - 13) -- calculate length
             ,'%20',' ') ex --final touch - remove ugly %20 
from  
(values 
(1352611, 'CX_CurrentJobStartDate=01/20/2001  CX_CurrentJobHours=40  CX_SupervisorName=Rhonda%20Kaiser  CX_EduDegree=BS%20in%20Nursing  CX_SupervisorPhone=970-495-8100'), 
(1361354, 'CX_CurrentJobStartDate=06/20/1997  CX_CurrentJobHours=30  CX_SupervisorName=Georgia%20Chapin  CX_SupervisorPhone=702-616-5632  CX_EduDegree=MS/MA%20%20in%20Nursing'), 
(1453977, 'CX_CurrentJobStartDate=08/20/1990  CX_CurrentJobHours=40  CX_SupervisorName=Jan%20Rasco  CX_SupervisorPhone=281-631-8789  CX_EduDegree=Diploma'), 
(1484271, 'CX_CurrentJobStartDate=01/01/2011  CX_CurrentJobHours=40  CX_SupervisorName=Kay%20Hix  CX_SupervisorPhone=317-329-7209  CX_EduDegree=AD%20in%20Nursing'), 
(1584563, 'CX_CurrentJobStartDate=11/26/2006  CX_CurrentJobHours=40  CX_SupervisorName=PHILLIP%20MOISUK  CX_SupervisorPhone=916-453-4545  CX_EduDegree=BS%20in%20Nursing') 
) t(clientid,    extFields)

Do you find this solution acceptable?

Alex Kudryashev
  • 9,120
  • 3
  • 27
  • 36
  • This is really close to what I need. I am still getting some results though that have more after the info I want.... 1352611 BS in Nursing CX_SupervisorPhone=970-495-8100 – Lee May 10 '16 at 16:03
  • The parts of data that occur after and that I want to strip will always begin with CX_[A-Z] – Lee May 10 '16 at 16:13
  • Look at my code. I specially addressed this issue. `PATINDEX('%CX_EduDegree=%',extFields)) - PATINDEX('%CX_EduDegree=%',extFields) - 13)` as `substring length. Results are real from the query. – Alex Kudryashev May 10 '16 at 16:13
  • 1339940 BS in Nursing 1352557 MS/MA in Nursing 1352566 MS/MA in Nursing 1352579 Diploma 1352582 BS in Nursing 1352590 Diploma 1352594 BS in Nursing 1352596 BS in Nursing 1352603 AD in Nursing 1352606 MS/MA in Nursing 1352611 BS in Nursing CX_SupervisorPhone=970-495-8100 1352618 AD in Nursing 1352627 AD in Nursing 1352630 BS/BA Other Field 1352631 BS in Nursing 1352638 BS in Nursing 1352640 AD in Nursing 1352647 Diploma 1352660 BS in Nursing 1352661 BS in Nursing 1352664 AD in Nursing 1352672 MS/MA in Nursing 1352676 MS/MA in Nursing 1352677 MS/MA in Nursing – Lee May 10 '16 at 16:30
  • The results are from the query i just ran. it only leaves trailing info when there is what looks like a space and another entry with CX_Supervisor or CX_VOL, or the next field being stored... – Lee May 10 '16 at 16:32
  • I am thinking i can make another pass at the update for those??? update the ones that work, and fix the rest? – Lee May 10 '16 at 16:32
  • Thank you so much for your help Alex, I am definitely much closer than i was. – Lee May 10 '16 at 16:33
0

One way to do it using XML datatype

SELECT  clientid,
        REPLACE(t.cx.value('(r/@CX_EduDegree)[1]', 'varchar(100)'),'%20',' ') AS degree
FROM    (
    SELECT  clientid, 
            CAST('<r ' + REPLACE(REPLACE(extFields,'=','="') , '  ','"/><r ') + '"></r>') AS XML) cx
    FROM    tblSYS_Clients
    WHERE   clientid in ('1633496','1633692','1453977','1657410','1584563','1655341','1632686','1352611','1484271','1361354') 
) t

xml is a pretty slow though compared to string manipulation queries.

another thing you can do is use a string splitter like one of the ones HERE

here is an example using Jeff Moden's DelimitedSplit8K function.

SELECT  t1.clientid,
        REPLACE(t2.Item,'CX_EduDegree=','')
FROM    tblSYS_Clients t1
        CROSS APPLY (
            SELECT REPLACE(t.Item, '%20', ' ') Item 
            FROM dbo.DelimitedSplit8K(t1.extFields, '  ') t 
            WHERE t.Item LIKE 'CX_EduDegree%' 
        ) t2
WHERE   clientid in ('1633496','1633692','1453977','1657410','1584563','1655341','1632686','1352611','1484271','1361354') 

still not as fast as some of the others here, but might be easier to understand or allow you to get other fields easier.

JamieD77
  • 13,796
  • 1
  • 17
  • 27
  • not able to run the first block of code in MS SQL Server... Msg 195, Level 15, State 10, Line 5 'CONCAT' is not a recognized built-in function name. – Lee May 10 '16 at 16:06
  • @Lee oh yeah.. that's sql 2012 stuff.. ill just replace with `+` – JamieD77 May 10 '16 at 18:56