-1

I am trying to create a SAS Data Set from a text file. The text file shows data in a format exactly like this:

-HEADER HEADER HEADER

-HEADER HEADER HEADER

April SpringRace Male

$$$$$$$$$$$$$$$$$$$$

Name Age State /these are titles in the text file/

$$$$$$$$$$$$$$$$$$$$

John Smith 30 CA

Mark Doe 49 TX

May SpringRace2 Female

$$$$$$$$$$$$$$

Name Age State

$$$$$$$$$$$$$$

Betty White 50 ME

Jane Smith 37 NY


The issue I am having going through the data step is: by-passing varying header rows and then collecting the "event" data before the ****** titles ******* as variables then skipping over the titles and assigning variables for the actual people. It is a similar format throughout the huge text file. Please can anyone point me in the right direction?

I have been experimenting: Data work.test; infile c:\tester dlm=' , $' missover; input / / / Month $15. EventName $15. Gender $6. (This is where I get stuck as I do not know how to skip the "Name Age State" in the text file and just assign variables to "John Smith 30 CA" etc.) run;

I also think there must be a better way to get passed the headers as there is no certainty that they will always only be 2 rows long.

Thanks

Jax
  • 27
  • 6
  • Try reformatting your example so that it appears exactly how it is in the file, and then showing what you would like the output to look like. Ideally you'd also show code that you've tried. Here's the formatting guide - http://stackoverflow.com/help/formatting – Robert Penridge Feb 18 '15 at 23:58
  • I can't make sense of this. Sometimes the easiest thing to do is to read everything in, then get rid of extraneous stuff. – DWal Feb 19 '15 at 00:24
  • Sorry, the actual text file is a lot longer and a little more complicated. I was just trying to get some ideas on how to get passed the first hurdle. The first hurdle being how to read the data into sas while skipping the event labels(i.e Name Age State) in the text file and how to retain the event variables(i.e April SpringRace Male as Month EventName Gender) with each event having varying numbers of participants(i.e John Smith 30 CA as FName LName Age State). – Jax Feb 19 '15 at 02:32

2 Answers2

0

I think that using @'my_char_string' column pointer in an INPUT statement would help you, if the titles that separate data values always repeat and you know what they are. For example:

INFILE mydatafile FLOWOVER FIRSTOBS=2;

INPUT month $ race $ sex $ @'State' first_name $ last_name $ address $;

The FIRSTOBS=2 option in INFILE statement skips the HEADER HEADER... row, and the FLOWOVER option tells SAS to keep looking for data on next line, in particular for @'State'. You may need to specify additional options and formatting, depending on your input file format, delimiters etc.

Per your edits, you could use the month value to determine that you are reading the start of an event, and then, using trailing @, retain and some conditional logic, read in your participants on separate lines and retain the event info across the participants, like this (just add all the remaining month names in the first if clause):

data test1;
    length test $20 month $20 event $20 gender $20 firstname $20 lastname $20 state $2;
    infile "test1.txt" DLM=' $' FIRSTOBS=5; 

    retain month event gender;   * Keep these values from last readin;

    input test $ @;    /* Read in the first word in the data line being 
                         read into test var, and stay on this line for 
                         now (with @)*/

    if strip(test) in('April', 'May') then do; /* If test var contains month,
                                                 then read in all of the variables, 
                                                 and skip the name/age/state titles row*/
        input @1 month $ event $ gender $ @'State' firstname $ lastname $ age state $ ;
    end;
    else do;    /* Otherwise, the data line being read in should contain 
                  only names, age and state, so read in those values only.
                  The month, event and gender values will be kept the same 
                  by the retain statement above.*/ 
        input @1 firstname $ lastname $ age state $ ;
    end;
    drop test; /* Comment out this drop statement to see whats in test var*/
run;

This code will work with varying numbers of participants per event. But the month cannot be missing in order for this code to work.

Helpful tip: To see what is in the current data line being read in by SAS, try adding

 put _INFILE_;

after the INFILE statement. It will print the data lines to your log the way SAS sees them.

3eyes
  • 71
  • 1
  • 3
  • Thank you for your help. I will try to work with this tomorrow. Could you explain the logic behind the 'test' temporary variable please? The actual text file is a lot longer than this and different events have varying amounts of participants. – Jax Feb 19 '15 at 02:24
  • Sure. I edited/added SAS comments in the code above to explain the logic. Hope it helps. For more details, this page is very helpful: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146292.htm . The example 5 at the bottom of that page also uses conditional input logic, and it is much simpler there. – 3eyes Feb 19 '15 at 03:47
  • Aaand I just realized that April can be a name. So you'll need to make the if condition a bit stricter. One way to do it is to test the first word and the third 'word', similar to above. If the first word is a month and the third word is gender (male/female/other..), then read in as the if condition in code above. – 3eyes Feb 19 '15 at 16:22
0

Hopefully you solved your problem a long time ago, but here is another suggestion. Using the trailing @ on the input statement lets you apply a second input statement and would be the preferred solution. This solution does not really use the trailing @ but I left it in for you to consider in the future.

DATA test;
INFILE 'stacktest.txt' lrecl=200 missover;
length n1 n2 n3 n4 $20. ;
input @1 c1 $1. @1 c2 $2. @1 c5 $5. @1 lne & $75. @ ;
keep month event gender fname lname age state;
if c1 = ' ' then return;
if c1 = '-' then return;
if c1 = '$' then return;
if c5 = 'Name' then return;

n1 = scan(lne, 1);
n2 = scan(lne, 2);
n3 = scan(lne, 3);
n4 = scan(lne, -1);

if ( n3 eq 'Male' or n3 eq 'Female')  then do;
   month = n1 ;
   event = n2;
   gender = n3  ;
     return;
     end;
 else do ;
*  input  fname $ lname $ age  state $ ;
fname = n1 ;
lname = n2 ;
age   = n3 ;
state = n4 ;
     output;
      end;
 retain month event gender;
run;
dain
  • 63
  • 5