3

I have a sample data set like below.

data d01;
   infile datalines dlm='#';
   input Name & $15. IdNumber & $4. Salary & $5. Site & $3.;
   datalines;
アイ# 2355# 21163# BR1
アイウエオ# 5889# 20976# BR1
カキクケ# 3878# 19571# BR2
;

data _null_ ;
 set d01 ;
 file "/folders/myfolders/test.csv" lrecl=1000 ;
 length filler $3;
 filler = '   ';
 w_out = ksubstr(Name, 1, 5) || IdNumber || Salary || Site || filler;
 put w_out;
run ;

I want to export this data set to csv (fixed-width format) and every line will has the length of 20 byte (20 1-byte-character).

enter image description here

But SAS auto remove my trailing spaces. So the result would be 17 byte for each line. (the filler is truncated)

I know I can insert the filler like this.

put w_out filler $3.;

But this won't work in case the `site' column is empty, SAS will truncate its column and the result also not be 20 byte for each line.

EagerToLearn
  • 675
  • 7
  • 24
  • That output format is NOT a CSV file. It does not have comma separated values. It is not even any type of delimited file, it is a fixed format file. You probably do NOT want to use multi-byte character set with a fixed format file for exactly the reason you are having trouble. – Tom Aug 15 '19 at 12:57
  • @Tom Thanks for the answer. In fact the file extension isn't really related to the file format. But the requirement is to export a fixed-width format file to a csv-extension-file. I know the multi-byte character set is causing problem, but I have no other choice, using SAS 9.4 University Edition, I can't change the session encoding but to stick with the default UTF-8 encoding. – EagerToLearn Aug 19 '19 at 01:57
  • I was just warning you that whatever programs that are going to try to read this file are going to have the exact same problems reading the file as your SAS program is having with writing the file. Also that you are using an extension for your filename that will cause confusion since it means a specific type of file and the file you are generating is NOT in that format. – Tom Aug 19 '19 at 13:33
  • Hi Tom, I understand what you are trying to warn. First the file name extension won't make any confusion since it's a requirement on our running system (> 15 years). Secondly, we have no trouble writing in this format with SAS 9.3 (commercial version which allows us to set the default Encoding to what we like, but now we can't use that version anymore, hence the 9.4 University Edition) or C# version app. – EagerToLearn Aug 20 '19 at 00:38

3 Answers3

2

I didn't quite understand what you are trying to do with ksubstr, but if you want to add padding to get the total length to 20 characters, you may have to write some extra logic:

data _null_ ;
 set d01 ;
 file "/folders/myfolders/test.csv" lrecl=1000 ;
 length filler $20;
 w_out = ksubstr(Name,1,5) || IdNumber || Salary || Site;

 len = 20 - klength(w_out) - 1;
 put w_out @;
 if len > 0 then do;
   filler = repeat(" ", len);
   put filler $varying20. len;
 end;
 else put;
run ;
SAS2Python
  • 1,277
  • 5
  • 16
  • Worked perfectly. Thanks alot. The reason for the ksubstr is because japanese character is 3 byte in SAS, so I can not use the standard substr function, all to ensure that the output string's length is 17 character (byte) – EagerToLearn Aug 15 '19 at 07:31
  • The reason to use the K... series of functions is because they count by the number of characters INSTEAD of the normal functions which just count number of bytes. – Tom Aug 15 '19 at 12:19
  • I get what the k functions do, but not sure why you are using subst here. – SAS2Python Aug 15 '19 at 13:16
  • @PythonRSAS I want to ensure that the output value of Name column will be 5 characters (including spaces if any). – EagerToLearn Aug 17 '19 at 02:02
  • `アイウエオ`'s length is actually 5 bytes, but inside SAS, it is 15-byte-string. – EagerToLearn Aug 17 '19 at 02:04
  • In case the name is `アイ` (6 bytes), if I don't use `ksubstr`, the output value will be `アイ(9 spaces)` (15 bytes = Name columns's length) – EagerToLearn Aug 17 '19 at 02:05
  • Got it, so you are essentially converting the ascii spaces to '3 byte' spaces. Thanks for clarifying – SAS2Python Aug 17 '19 at 02:32
2

You probably do not want to write a fixed column file using a multi-byte character set. Instead look into seeing if your can adjust your process to use a delimited file instead. Like you did in your example input data.

If you want the PUT function to write a specific number of bytes just use formatted PUT statement. To have the number of bytes written vary based on the strings value you can use the $VARYING format. The syntax when using $VARYING is slightly different than when using normal formats. You add a second variable reference after the format specification that contains the actual number of bytes to write.

You can use the LENGTH() function to calculate how many bytes your name values take. Since it normally ignores the trailing space just add another character to the end and subtract one from the overall length.

To pad the end with three blanks you could just add three to the width used in the format for the last variable.

data d01;
  infile datalines dlm='#';
  length Name $15 IdNumber $4 Salary $5 Site $3 ;
  input Name -- Site;
datalines;
アイ# 2355# 21163# BR1
アイウエオ# 5889# 20976# BR1
カキクケ# 3878# 19571# BR2
Sam#1#2#3
;

filename out temp;
data _null_;
  set d01;
  file out;
  nbytes=length(ksubstr(name,1,5)||'#')-1;
  put name $varying15. nbytes IdNumber $4. Salary $5. Site $6. ;
run;

Results:

67    data _null_ ;
68      infile out;
69      input ;
70      list;
71    run;

NOTE: The infile OUT is:
      Filename=...\#LN00059,
      RECFM=V,LRECL=32767,File Size (bytes)=110,
      Last Modified=15Aug2019:09:01:44,
      Create Time=15Aug2019:09:01:44

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         アイ   235521163BR1    24
2         アイウエオ588920976BR1    30
3         カキクケ 387819571BR2    28
4         Sam  1   2    3      20
NOTE: 4 records were read from the infile OUT.
      The minimum record length was 20.
      The maximum record length was 30.
Tom
  • 47,574
  • 2
  • 16
  • 29
1

By default SAS sets an option of NOPAD on a FILE statement, it also sets each line to 'variable format', which means lengths of lines can vary according to the data written. To explicitly ask SAS to pad your records out with spaces, don't use a filler variable, just:

  • Set the LRECL to the width of file you need (20)
  • Set the PAD option, or set RECFM=F

Sample code:

data _null_ ;
 set d01 ;
 file "/folders/myfolders/test.csv" lrecl=20 PAD;
 w_out = Name || IdNumber || Salary || Site;
 put w_out;
run ;

More info here: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000171874.htm#a000220987

mjsqu
  • 5,151
  • 1
  • 17
  • 21
  • Thanks for the answer. But I can not specify a width beforehand. As SAS processes Japanese character as 3-byte-per-character inside its session. For example "アイウエオ" would be 15 bytes inside the session ... – EagerToLearn Aug 15 '19 at 01:35
  • Sorry I edited my question. That's why I have to use `ksubstr(Name, 1, 5) ` when exporting the data – EagerToLearn Aug 15 '19 at 01:42