-1

I want to split a table with many (454) columns in a PROC SQL (perhaps using a macro?) by column names.

For example: a column is starting with "Column21....T", "Column22....T", etc.

  • I want to write all those columns starting with "Column21...T" to a data set called First, and all the columns starting with "Column22....T" to a data set called Second, etc.

  • I want to retain the first and second column of the transposed table because they contain the descriptional rows.

  • I cannot use select column1, column2.... Column454 because of the large number of columns.

How do I do this?

(@Mod: thanks for the clean-up :))

Edit, one picture to say it all:

This is my transposed table with 454 columns

PROC SORT
 DATA=work.stat(KEEP=A_ B_ C_ D_ Vx G Vy)
 OUT=work.stat2;
   BY G;
RUN;

PROC TRANSPOSE DATA=work.stat2
 OUT=WORK.x(LABEL="Transposed")
 PREFIX=Column
 LET
 NAME=Source
 LABEL=Label;

   BY G;
   ID Vx;
   IDLABEL Vy;
   VAR A_ B_ C_ D_;
 RUN;
SuppaDuppa
  • 13
  • 4
  • Why not just generate the tables you want to start with instead of making a giant table and then splitting it? – Tom Feb 25 '18 at 02:07
  • Can you show the `proc transpose` step ? How many rows does the table have ? Regarding `, etc.` how many target data sets are you expecting ? Will `first` contain column21*T, column210*T, ... column219*T ? – Richard Feb 25 '18 at 10:03
  • Thanks guys for your help so far. @Tom: need the giant table because it is concatenated from 2 big tables. If I don't do this, I can't transpose in one step. So I really need 1 big giant table. [at]Richard: please see my attached pic in OP. – SuppaDuppa Feb 25 '18 at 11:40
  • I can understand the need to mask. Can you show code for making an equivalent pre-transpose data table, and the the proc transpose statements used to create the output table with 454 columns. If the transposed table has very many rows you may not want to chop the table into vertical slices, you might get by with writing a macro that generates a `keep=` option clause for each of the pieces that might be further analyzed. – Richard Feb 25 '18 at 16:08
  • If you want an algorithm then you need much clearer description of your input and output data structures. Also it a fuller description of the larger problem might allow for a much simpler solution. – Tom Feb 25 '18 at 17:18
  • You cannot do this easily in SQL, you're better off using a data step and a macro. And since you need a macro you may as well create each table individually rather than doing it all at once. Not seeing why you need to split it anyways, and if you do, why bother combining it in the first place. – Reeza Feb 25 '18 at 17:50
  • @Richard: My result table has 25 rows and 454 columns because I must transpose the original merged table, as said. Please see OP for the transpose SAS code. – SuppaDuppa Feb 25 '18 at 18:17
  • @Tom: basically what I need is an algorithm looping throug all the Columns, saving all the names, sorting these names, and then creating the data sets automatically. atRichard: it doesn;t matter by the way how to call the datasets. It doesn't have to be first, second etc. It can also be x1-xN .. no problem with that. – SuppaDuppa Feb 25 '18 at 18:25
  • @SuppaDuppa: Can you show a proc freq listing of Vx ? Since transposed column names are `column – Richard Feb 25 '18 at 19:08
  • @Richard: thanks for the ideas. the proc freq is like this: `proc freq noprint data=r; tables Vx*G*Datum/list missing; run;` – SuppaDuppa Feb 25 '18 at 19:17
  • @SuppaDuppa: Can you add a code section to the question that lists the Vx values that are in concordance with one or two groups ? – Richard Feb 25 '18 at 22:42

2 Answers2

1

You can use : suffix to make a variable list.

data first ;
  set have ;
  keep id1 id2 column21: ;
run;

data second ;
  set have ;
  keep id1 id2 column22: ;
run;

Updated given more details in question.

So why not just transpose each group separately? Make a macro to transpose one value of Vx.

%macro split(value);
PROC TRANSPOSE DATA=work.stat2
  OUT=WORK.Column&value (LABEL="Transposed Vx=&value")
  PREFIX=Column
  LET
  NAME=Source
  LABEL=Label
;
  BY G;
  WHERE Vx=&value ;
  ID Vx;
  IDLABEL Vy;
  VAR A_ B_ C_ D_;
RUN;
%mend split ;

Then call it once for each value of Vx.

proc sort data=work.stat2(keep=Vx) nodupkey out=Vx_list ;
  by Vx ;
run;
data _null_;
  set Vx_list;
  call execute(cats('%nrstr(%split)(',Vx,')'));
run;
Tom
  • 47,574
  • 2
  • 16
  • 29
  • Your solution actually only works for those situations when you know the groups of columns on beforehand. So only if one know every column group names you want to group by. It will work but it will cost alot time when columns differ too much. Say there are 100 groups of: Column21xxxxT, Column22xxxT, Column23xxxT ... column49xxxT. How would you do it then? Output must be on group by column ... so if one had 100 groups, i will need 100 data sets :( ... ? data first; till data hundred; or so... – SuppaDuppa Feb 25 '18 at 11:44
  • If you want to automate this process, you can write a macro and call it once for each set of columns based on metadata from `sashelp.vcolumn` or the proc sql variant, `dictionary.columns`. – user667489 Feb 25 '18 at 12:05
  • Suppa: Once you get above a personal pain threshold of a N data sets (mine is about 7) wouldn't it be better to work with the data in its original categorical form ? If you go with data having 100 column-sets splitting into 100 data-sets it would make sense to not name each of the pieces differently (first, second, third, ... ninetyninth, hundredth) and simply name the data sets piece1 through piece100. Then you will see the piece basis is simply the columns kept, so maybe you don't need 100 data-sets and instead 100 operations on `have` limited to columns belonging the i-th column-set. – Richard Feb 25 '18 at 16:14
  • Suppa: and repeated operations of the same type, wouldn't be easier if the column names were the same ? so you need more macro to rename to common-basis, if not you get a &-filled solution that might work but be harder to understand. You will probably get more useful answer showing more of the problem situation. – Richard Feb 25 '18 at 16:18
  • @SuppaDuppa If you want an algorithmic solution you need to provide more information. Where are the prefixes `Column21` etc coming from? Do you already have them in a simple dataset? If so use that dataset to generate the data step above. – Tom Feb 25 '18 at 17:20
  • @Tom. the prefix had to be prefixed to the colums after the transpose (see code in OP), as the columns where beginning with a numeric value. Which is not allowed if transposed to variables (columns). So i set the prefix myself to Column it can be anything else. or just _ . So the columns were always starting with 2 digits: 2 an 0, 1, 2, 3, 4, 5, 6 7, 8, 9 after the first 2. After that there is no consequence in the name :-( So now i get: Column21xxx ... Column22xxxxxx etc. – SuppaDuppa Feb 25 '18 at 18:32
1

Macro program:

%macro split(data,outdata);
    %do i=21 %to 121;
        data &outdata.&i;
            set &data;
               keep id1 id2 col&i:;
            run;
     %end;
%mend;
Shenglin Chen
  • 4,504
  • 11
  • 11
  • This is not going to work, as my columns are not consequently. So I mean it is not always Column22001T, Column22002T etc. It can also be: Column22GH2T. Or something not making sense at all. :) – SuppaDuppa Feb 25 '18 at 18:19
  • From your description, you want to keep columns started with the same number to the same data, such as col21ghT, col21asT to the same data. If not, it is help to provide dummy data to explain what you want to do. – Shenglin Chen Feb 25 '18 at 18:36