
In SAS, I have a variable V containing the following value:

V=1996199619961996200120012001

I'd like to create these two variables:

V1=19962001 (the distinct modalities)

V2=42 (the first modality appears 4 times and the second one appears 2 times)

Any ideas?

Thanks for your help.

Luc

Reeza
user2129506

2 Answers


For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:

a = substr(variable, 1, 4);

b = substrn(variable,max(1,length(variable)-3),4);

You could then concatenate the two.

c = cats(a, b);
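Put together, the three assignments above can be tried in a small DATA step. This is only a sketch: the variable name `variable` and the declared lengths are assumptions, and it relies on the value containing exactly two modalities, with the first one at the start and the second one at the end.

```sas
data _null_;
    /* Lengths are assumptions chosen to fit the sample value */
    length variable $28 a b $4 c $8;
    variable = '1996199619961996200120012001';

    a = substr(variable, 1, 4);                              /* first four characters */
    b = substrn(variable, max(1, length(variable) - 3), 4);  /* last four characters  */
    c = cats(a, b);                                          /* concatenate, blanks stripped */

    put a= b= c=;   /* a=1996 b=2001 c=19962001 */
run;
```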

For the second, the COUNT function can be used to count occurrences of a string within a string:

http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
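As a rough illustration of the COUNT approach, the sketch below counts each modality in the sample value and glues the two counts together. The names `n_first`, `n_second` and `v2` are assumptions for illustration. Note that COUNT simply searches for the substring, so it is only reliable when the search string cannot also match across the boundary between two adjacent four-character groups (which is the case for this sample).

```sas
data _null_;
    length v $28 v2 $2;
    v = '1996199619961996200120012001';

    n_first  = count(v, '1996');   /* occurrences of the first modality  */
    n_second = count(v, '2001');   /* occurrences of the second modality */

    /* CATS converts the numeric counts to text and strips the blanks */
    v2 = cats(n_first, n_second);

    put n_first= n_second= v2=;    /* n_first=4 n_second=3 v2=43 */
run;
```

With the value of V exactly as posted in the question, 2001 occurs three times, so this gives 43 rather than the 42 mentioned there.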

Hope this helps :)

ckruse

Make it a bit more general;

%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;

Where does a certain occurrence start?;

%macro occurStart(occurNo);
    &modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
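The macro does not compute anything itself; it only generates the text of an expression (here `4*vNo-3`) that the DATA step then evaluates. A minimal sketch to check the start positions it yields, repeating the `%let` and the macro definition so that it runs on its own:

```sas
%let modeLength = 4;

%macro occurStart(occurNo);
    &modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;

data _null_;
    do vNo = 1 to 3;
        /* expands to: start = 4*vNo-3; */
        start = %occurStart(vNo);
        put vNo= start=;   /* start is 1, 5 and 9 for vNo 1, 2 and 3 */
    end;
run;
```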

Read the input;

data simplified ;
    infile datalines truncover;
    input v $%eval(&modeLength.*&maxOccur.).;

Declare output and work variables;

    format what $&modeLength.. 
           v1   $%eval(&modeLength.*&maxModes.). 
           v2   $&maxModes..;

    array w {&maxModes.} $&modeLength.; ** what (character, so values are not converted to numeric) **;
    array c {&maxModes.}; ** count **;

Discover unique modes and count them;

    countW = 0;
    do vNo = 1 to length(v)/&modeLength.;
        what = substr(v, %occurStart(vNo), &modeLength.);
        do wNo = 1 to countW;
            if what eq w(wNo) then do;
                c(wNo) = c(wNo) + 1;
                goto foundIt;
            end;
        end;
        countW = countW + 1;
        w(countW) = what;
        c(countW) = 1;

        foundIt: ; ** label on a null statement **;
    end;

Report results in v1 and v2;

    do wNo = 1 to countW;
        substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
        substr(v2, wNo, 1) = put(c(wNo),1.);
        put _N_= v1= v2=;
    end;
    keep v1 v2;

The data I tested with;

    datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;
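For comparison, here is a shorter variant of the same idea without macros or `goto`. It is a sketch, not a drop-in replacement: the data set name `modes_sketch`, the fixed group length of 4, the limit of 10 distinct modalities and the variable lengths are all assumptions. It uses WHICHC to look up the current group among the modalities found so far.

```sas
data modes_sketch;
    length v $400 what $4 v1 $40 v2 $10;
    array w {10} $4;   /* distinct modalities, in order of first appearance */
    array c {10};      /* number of occurrences of each modality */

    input v;

    countW = 0;
    do vNo = 1 to length(v) / 4;
        what = substr(v, 4*vNo - 3, 4);
        wNo = whichc(what, of w{*});   /* 0 when this modality is new */
        if wNo = 0 then do;
            countW = countW + 1;
            wNo = countW;
            w{wNo} = what;
            c{wNo} = 0;
        end;
        c{wNo} = c{wNo} + 1;
    end;

    /* Build the results from the modalities that were actually found */
    do wNo = 1 to countW;
        v1 = cats(v1, w{wNo});
        v2 = cats(v2, c{wNo});
    end;

    put _N_= v1= v2=;
    keep v1 v2;
    datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;
```

For the first input line this gives v1=19962001 and v2=43. A count above 9 takes two digits in v2, so the result becomes ambiguous in that case.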
Dirk Horsten