
I have a dataset that I manipulate in PROC IML, and then I create a new dataset from some of the manipulated values. When I read character values in, their length changes from 7 to 9.

This doesn't really cause a problem, apart from the minor annoyance that when I later merge the new dataset, I get a warning that the variable's length differs between the two datasets.

Is there a way to keep the length of the original variable?

Sample code

data data1;
infile datalines delimiter=',';

input classif :$9. time :$7.;
datalines;
05, 2021_11
051, 2021_11
;
run;

proc iml;
    use work.data1;
    read all var {classif time} into _temp_1;
    classif = _temp_1[,1];
    time    = _temp_1[,2];
    close;
    create work.data2 var {classif time};
    append;
quit;

Observe how the length of time is 7 in data1, but 9 in data2.

Leksa99

3 Answers


As @Richard explained, this happens when you read two character variables of different lengths into the columns of a single matrix. I can think of at least three workarounds. Depending on your application, one of these methods might be more convenient than the others.

proc iml;
/* Option 1: Read variables into vectors, not a matrix */
use work.data1;
read all var {classif time };
close;
print (nleng(time))[L="nleng(time)"];

/* Option 2: Allocate time to have LENGTH=7 and copy the data in */
use work.data1;
read all var {classif time } into _temp_1;
close;
time = j(nrow(_temp_1), 1, BlankStr(7));  /* allocate char vector */
time[,]   = _temp_1[,2];                  /* copy the data */
print (nleng(time))[L="nleng(time)"];

/* Option 3: Read into a table instead of a matrix. */
tbl = TableCreateFromDataset("work", "data1") ;
classif = TableGetVarData(tbl, {"Classif"});
time = TableGetVarData(tbl, {"time"});
print (nleng(time))[L="nleng(time)"];
quit;
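
A minimal sketch of how Option 1 might slot into the original workflow. It assumes the manipulation step (shown only as a placeholder comment) does not itself change the element length of the vectors. The PROC CONTENTS step is an addition for checking the stored lengths and is not part of the workarounds above.

```sas
proc iml;
    use work.data1;
    read all var {classif time};      /* each variable becomes its own vector */
    close work.data1;

    /* ... manipulate classif and time here (placeholder) ... */

    create work.data2 var {classif time};
    append;
    close work.data2;
quit;

/* Check the stored lengths of classif and time in the new data set */
proc contents data=work.data2;
run;
```

If the lengths were preserved, the later merge should no longer report a length mismatch for these variables.
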
Rick

If you want the variables from DATA1 to be defined the same in DATA2 you could just add a data step after your PROC IML code.

data data2;
  set data1(obs=0) data2;
run;

It works because SAS defines each variable the first time it is seen. In this case the variables take their definitions from DATA1, even though the OBS=0 dataset option prevents any observations from actually being read from DATA1.
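
One caveat, stated as an assumption and not something tested here: because TIME is longer in DATA2 than in DATA1, this step may itself log a one-time warning that multiple lengths were specified for the variable. As far as I know, the VARLENCHK= system option controls that message, so a hedged variant of the step could look like this:

```sas
/* Assumption: VARLENCHK= governs the "multiple lengths" warning */
options varlenchk=nowarn;

data data2;
  set data1(obs=0) data2;   /* definitions from data1, rows from data2 */
run;

options varlenchk=warn;     /* restore the default behaviour */

/* Confirm that time is back to its original length */
proc contents data=work.data2;
run;
```

Values longer than the DATA1 length would be truncated by this step. That should not affect the sample data, since the values originally came from DATA1.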

Tom

From Understanding the SAS/IML Language

Defining a Matrix

A matrix is the fundamental structure in the SAS/IML language. A matrix is a two-dimensional array of numeric or character values. Matrices are useful for working with data and have the following properties:

  • Matrices can be either numeric or character. Elements of a numeric matrix are double-precision values. Elements of a character matrix are character strings of equal length.

The INTO clause places the character values into a single matrix, _temp_1, that must hold all the original values, so the element width is the length of the widest data set variable that was read (here 9, from classif).

The element length of _temp_1 is then propagated through the assignment statements, so both classif and time end up with length 9.
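
A small sketch to see this for yourself, assuming work.data1 from the question exists. It reads the same two variables once into a common matrix and once into separate vectors, then prints the element length of each with NLENG.

```sas
proc iml;
    use work.data1;
    read all var {classif time} into _temp_1;  /* one matrix, one element length */
    read all var {classif time};               /* separate vectors keep their own lengths */
    close work.data1;

    time_from_matrix = _temp_1[,2];            /* inherits the matrix element length */

    print (nleng(_temp_1))[L="nleng(_temp_1)"]
          (nleng(time_from_matrix))[L="nleng(time_from_matrix)"]
          (nleng(time))[L="nleng(time)"];
quit;
```

Based on the behaviour described in the question, the first two values should match the wider variable and the last should match the original length of time.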

Richard
  • In other words, to solve my problem, the 'classif' variable would have to be of length 7, right? @Richard – Leksa99 Feb 07 '22 at 11:07
    Yes. The constraint of this approach is that all variables in the output data set will have the same length, and that length will be the widest among the IML structures that are combined into a matrix for output. – Richard Feb 07 '22 at 11:24