2

I am attempting to extract parts of a string with carriage returns based on a keyword in SAS eg.

Str="started manually 23:56:22
Skipped: obtain name
Completed: verify dob
Skipped: ask again
Skipped: verify address 
Completed: verify country
Stopped manually 23:58:55"

I am trying to pull out parts of this based on Skipped or Completed. So if i ask for Skipped i would want end result to be, including carriage return:

Obtain name
Ask again
Verify address

I am thinking I would need to loop to find each instance of skipped. I have tried different variations of SCAN,FIND,INDEX,SUBSTR.

I have extracted substring before but only when the string was consistent in character/word position. This string will very greatly. Any number of skipped or completed could be present with different start and end wording. I am working in SASeg.

Joe
  • 62,789
  • 6
  • 49
  • 67
hadoo
  • 213
  • 1
  • 12
  • Do you actually have text with LINE BREAKS in a variable in a dataset? – Nazar Merza Jul 14 '16 at 14:49
  • yes, i would be importing an excel spreadsheet that has hard returns. i was breaking them out in excel but it takes way to long. i thought the question would be easier this way. – hadoo Jul 14 '16 at 14:56
  • Have you actually imported it into SAS, and checked that the variable does have line breaks in the text? Because it seems that you have not done so yet. – Nazar Merza Jul 14 '16 at 15:14
  • there are definitely line breaks – hadoo Jul 14 '16 at 15:21
  • You are avoiding to directly answer the question. There might be line breaks in the Excel sheet. But, 1) Have you imported the file into SAS? 2) Have you checked that once imported into SAS, line breaks still exist in the text? If you want to solve your question, you need to provide accurate information. – Nazar Merza Jul 14 '16 at 15:37

4 Answers4

1

The basic idea here would be to use CALL SCAN to iterate over the string's words, then do whatever you want with its results.

I use '|' here as word delimiter because it's hard to get carriage return into the editor, but if it's really CRs you use '0A'x as the word delimiter instead.

data have;
  Str="started manually 23:56:22|Skipped: obtain name|Completed: verify dob|Skipped: ask again|Skipped: verify address|Completed: verify country|Stopped manually 23:58:55";
  length str_els1-str_els10  $40;
  array str_els[10] $;

  _pos=0;
  _count=1;
  _length=0;

  do _count = 1 by 1 until (_pos le 0);
    call scan(str,_count,_pos,_length,'|');
    if _pos>0 then str_els[_count] = substr(str,_pos,_length);
  end;

run;

You presumably want to do more than just put this in an array, but I think from here you can figure out the rest. Another SCAN with : is probably the starting point.

Joe
  • 62,789
  • 6
  • 49
  • 67
0

I do not know how to do this with SAS, but one suggestion might be:

  • Replace line-brakes with "/><Data type="
  • Replace :_ (the _ is a blank!) with " value="

This would create something like

started manually 23" value="56" value="22"/>
<Data type="Skipped" value="obtain name"/>
<Data type="Completed" value="verify dob"/>
<Data type="Skipped" value="ask again"/>
<Data type="Skipped" value="verify address "/>
<Data type="Completed" value="verify country"/>
<Data type="Stopped manually 23" value="58" value="55

Replace the first line with <root> and the last line with </root> and you get

<root>
  <Data type="Skipped" value="obtain name"/>
  <Data type="Completed" value="verify dob"/>
  <Data type="Skipped" value="ask again"/>
  <Data type="Skipped" value="verify address "/>
  <Data type="Completed" value="verify country"/>
</root>

Now it should be easy to retrieve your data with XPath like

/root/Data[@type="Skipped"]

EDIT

As I have no idea how to read XML with XPath in SAS I was searching and found this: POSITION()-function in PATH-element of XML-map works in XML Mapper, but not in SAS-code

Hope this helps...

Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • You certainly could do what you list here, though I think CALL SCAN is easier. SAS has a native XML reader using XML Maps (which could use XPath) as you link to; it works fairly well and is pretty easy to use. Just overkill for this I suspect. – Joe Jul 14 '16 at 19:19
0

Well, not sure what language you are in, but in C# I would either use LINQ or break split the string into an array like

var myarray=str.Split(' ');

Then you could go through the array and find the various words individually and return the next 2 indexes after finding the word. The code below outputs:

obtain name
ask again
verify address


public class Program
{
    public static void Main()
    {
        //Console.WriteLine("Hello World");
        Program.GetReturn("started manually 23:56:22 Skipped: obtain name Completed: verify dob Skipped: ask again Skipped: verify address  Completed: verify country Stopped manually 23:58:55","Skipped:");
    }

    public static string GetReturn(string str,string word)
    {
        string mystr="";
        var myarray=str.Split(' ');

        for(int i = 0; i< myarray.Length-1; i++)
        {

            if(myarray[i]==word)
            {
                mystr=myarray[i+1] + " " +myarray[i+2];
                Console.WriteLine(String.Format("{0} {1}", myarray[i+1],myarray[i+2]));
            }
        }       

        return mystr;
    }

I am sure there is a Linq answer that will be much more elegant tho...

In SQL, not sure...

MattE
  • 1,044
  • 1
  • 14
  • 34
0

you did not mention whether or not you want those extracts as records so here is my simple approach:

You can use byte(10) to simulate your line returns.

data a(keep=record str);
    length str $200.;
    str="started manually 23:56:22" || byte(10) ||
    "Skipped: obtain name"  || byte(10) ||
    "Completed: verify dob"  || byte(10) ||
    "Skipped: ask again"    || byte(10) ||
    "Skipped: verify address"   || byte(10) ||
    "Completed: verify country"  || byte(10) ||
    "Stopped manually 23:58:55";        
    do i=1 by 1 while(scan(str,i,byte(10)) ^=' ');
    new=scan(str,i,byte(10));
    if index(new,"Skipped")>=1 or index(new,"Completed")>=1 then
        record = scan(new,-1,":");
        else
        record = '';
    if length(record)>1 then output;
    end;
run;
Altons
  • 1,422
  • 3
  • 12
  • 23