I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries using the simplest self-contained examples I could devise. (And the site is making me boost my text-to-code ratio in order for this to post, so please ignore this sentence).

*EDIT - my example was too simplistic. I have spaces in my variables, so I do need to specify positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.

data clinical;
input
name $ 1-13
visit_date $ 14-23  
group $ 25
;
datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
pumphandle
  • See this question: https://stackoverflow.com/questions/34605218/sas-import-txt-file-using-infile/34624104#34624104 for explanation of how to use informat with fixed column data. – Tom Oct 13 '21 at 15:24

3 Answers

No need to specify column positions. List input already treats blanks as delimiters between values. A simple way to specify an informat in list input is to put a : between the variable name and its informat.

data clinical;
    input ID$ visit_date:mmddyy10. group$;
    format visit_date mmddyy10.; * Make the date look human-readable;
    datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;

Output:

ID  visit_date  group
01  03/12/1998  D
02  04/15/2008  P
03  11/30/2009  J
Stu Sztukowski
  • Actually, my question was oversimplified. In reality my variable values themselves contain spaces, so it is necessary to specify positions. Trying to figure out a way to express that. – pumphandle Oct 13 '21 at 15:59

A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.

data clinical; 
input
name $ 1-12
@13 visit_date MMDDYY10.
group $ 25 ;
datalines;
John Turner 03/12/1998 D
Mary Jones  04/15/2008 P
Joe Sims    11/30/2009 J
;
run;
pumphandle
  • Your code is slightly off (24 not 25), but you don't *have* to switch... – Joe Oct 13 '21 at 18:53
  • If you don't want to switch then read them all using FORMATTED mode. `input name $12. visit_date mmddyy10. +1 group $1.;` – Tom Oct 13 '21 at 19:05
  • If I begin in column input, I am forced to abandon it later, yes? That seems like a good argument against using it. – pumphandle Oct 13 '21 at 20:00
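
For reference, Tom's all-formatted suggestion can be written out as a full step. This is a minimal sketch that assumes the layout of the data lines in this answer (name in columns 1-12, date in columns 13-22, group in column 24, which is the correction Joe points out):

```sas
data clinical;
    /* Formatted input throughout: each informat advances the pointer by its width */
    input name       $12.        /* columns 1-12 */
          visit_date mmddyy10.   /* columns 13-22 */
          +1                     /* skip the blank in column 23 */
          group      $1.;        /* column 24 */
    format visit_date mmddyy10.; /* display the numeric date as mm/dd/yyyy */
    datalines;
John Turner 03/12/1998 D
Mary Jones  04/15/2008 P
Joe Sims    11/30/2009 J
;
run;
```

Because every variable is read with an informat, there is no switch between column and formatted style partway through the INPUT statement.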

SAS provides several different ways to read input data; which one to use depends on what your data looks like.

Column input, which is what you started with, is appropriate when the following holds:

To read with column input, data values must have these attributes:

  • appear in the same columns in all the input data records
  • consist of standard numeric form or character form

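Your visit_date column does not meet the second condition: mm/dd/yyyy is not a standard numeric form, so column input can only read it as character. To get a real date, you need to use something else.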
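
If you did want to keep the INPUT statement in pure column style, one possible workaround (not the approach recommended here) is to read the field as character and convert it afterward with the INPUT function. A minimal sketch, assuming the column layout from the question; visit_date_c is a hypothetical helper variable:

```sas
data clinical;
    /* Pure column input: the date field can only be read as character text */
    input name $ 1-13
          visit_date_c $ 14-23   /* hypothetical helper variable */
          group $ 25;
    /* Convert the text to a numeric SAS date with the INPUT function */
    visit_date = input(visit_date_c, mmddyy10.);
    format visit_date mmddyy10.;
    drop visit_date_c;
    datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
```

This keeps one input style at the cost of an extra assignment and a temporary variable.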

Formatted input is appropriate to use when you want these features:

With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.

Your visit_date column matches this requirement, as you have a specific informat (mmddyy10.) that you would like to use to read the data in as a SAS date.
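
As an illustration, the question's data could be read entirely with formatted input by pairing each informat with an absolute column pointer. This is a sketch that assumes the column layout from the question (name in 1-13, date in 14-23, group in 25):

```sas
data clinical;
    /* @n moves the pointer to column n; the informat sets type and width */
    input @1  name       $13.
          @14 visit_date mmddyy10.
          @25 group      $1.;
    format visit_date mmddyy10.;   /* display the numeric date as mm/dd/yyyy */
    datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
```

Here visit_date is stored as a numeric SAS date, and the FORMAT statement only controls how it is displayed.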

List input, particularly modified list input, also works in some cases, though not straightforwardly in your example because the names contain spaces. Here's when you might want to use it:

List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.

(For completeness, there is also Named input, though that's more rare to see, and not helpful here.)

You can mix column and formatted input, but you generally don't want to mix in list input: it doesn't handle pointer control in quite the same way, so it is easy to end up with something you didn't intend. In general, use the input style that fits your data: column input if everything is plain text or standard numerics, formatted input if some fields need a particular informat.

Joe
  • With the posted example you could use the `&` modifier to read the first variable in list mode, since the longest embedded space is one character and there are at least two spaces after the name and before the date on every row. – Tom Oct 14 '21 at 15:26
  • Sure, but I'm trying to keep this at a level appropriate for the person reading the question. :) – Joe Oct 14 '21 at 15:44
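
For readers who want to try the `&` approach from Tom's comment, it might look like the sketch below. It assumes the data lines from the question, where every name is followed by at least two blanks; the LENGTH statement is an addition so that name is not limited to the default 8 characters of list input.

```sas
data clinical;
    length name $ 13;            /* list input would otherwise default to 8 characters */
    input name $ &               /* & keeps reading through single embedded blanks */
          visit_date :mmddyy10.  /* : applies the informat in list mode */
          group $;
    format visit_date mmddyy10.; /* display the numeric date as mm/dd/yyyy */
    datalines;
John Turner  03/12/1998 D
Mary Jones   04/15/2008 P
Joe Sims     11/30/2009 J
;
run;
```

This would not work for data lines where a name is followed by only a single blank (as in "John Turner 03/12/1998 D"), because `&` needs two consecutive blanks to mark the end of the value.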