0

SAS EG

I have a Tab delimited text file that has to be imported every month. So I wrote an import procedure via data step.

data lib.txtimp;
%let _EFIERR_=0;
infile "file/path/tabdlm.txt" lrecl=256
dlm='09'x missover firstobs=2 DSD;
informat <vars>;
format <vars>;
input <vars>;
if _ERROR_ then call symput('_EFIERR_',1);
run;

A recent problem I had is that there are times when some datalines have two tabs by mistake. The text file is huge, in order of 500MB. So I tried to automate the process by writing the above, but it doesn't take the problem in to consideration. It writes a blank in that place. When I use the in-built 'Import Data', it first cleanses the raw file and then runs its intermediate data step to give the desired output. This doesn't give me a blank in that column. It ignores the extra tab.

An example of the problem with my text file.

col1   col2   col3
1   a   b
2   foo   bar
3      wayout   data
4   another      example

sample file

Is there a way that I can automate the cleansing process with my data step import? Or, do you guys know any steps that I should add to get something of that sort?

samkart
  • 6,007
  • 2
  • 14
  • 29
  • What do you mean by 'in-built' Import Data? I doubt anything is 'cleanses' the data first, its likely just using a different option. – Reeza Aug 14 '17 at 19:33
  • No, they have it. File -> Import data. If you closely keep looking at the status/job bar, it clearly states the steps. It cleanses, then runs the intermediate data step. – samkart Aug 14 '17 at 22:15
  • Are you referring to SAS Studio, DI, EG or something else? The post doesn't identify the SAS product you're using. – Reeza Aug 14 '17 at 22:27
  • apologies! I'm using SAS EG. – samkart Aug 14 '17 at 22:45
  • Does removing DSD not work for you? – Reeza Aug 14 '17 at 22:45
  • It doesn't. I tried, and got the same results. – samkart Aug 14 '17 at 22:51
  • You need to attach a sample file that we can play with, in this case a dataset here isn't going to work. AFAIK there's no way to replicate the GUI import via code but I've seen DSD work so there may be another reason it's not. Without sample data though I can't say. – Reeza Aug 14 '17 at 23:42
  • @reeza the sample dataset is the sample of the file. It's a text file. That has headers and rows following. Just the way I have stated. The GUI import used DSD too. It submits a data step and the wizard generates a code that you can refer. The problem is that when using GUI import, there are no blanks. With my code, it generates missing data. But you can create a txt file with my dataset. Tab delimited text file. I'm not sure, what happened in the GUI step that is happening in my data step. – samkart Aug 15 '17 at 13:23
  • @Reeza I have attached a sample file. similar to the dataset. Do post your findings. Thanks! – samkart Aug 15 '17 at 13:29

1 Answers1

1

Remove the DSD option. The DSD option tells SAS to treat two consecutive delimiters as missing, whereas the standard approach will treat two consecutive delimiters as a single delimiter.

http://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n1rill4udj0tfun1fvce3j401plo.htm

I cannot replicate your issue. This code works fine for me on SAS 9.4 TS1M3.

data txtimp;
infile "C:\_LOCALdata\temp\sample_tab_dlm_file.txt"  missover lrecl=256
dlm='09'x  firstobs=2 ;
input col1 $ col2 $ col3 $;
run;
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • Found a workaround in my case. limiting the bytes to 256 was an issue. I generalized it to 32767. `lrecl=32767` – samkart Aug 16 '17 at 06:46
  • That was shown in your code already. If the code you posted didn't match what you were running... – Reeza Aug 16 '17 at 11:26
  • I changed `lrecl=256` to `lrecl=32767`. It works now. Previously it was 256 bytes. – samkart Aug 16 '17 at 16:51