-2

I want to replace all strings in a column with following:

strings  new_strings
ABC_MNO_S3  S1
ABC_S1      S2
ABC_S2      S3
ABC_PQR_S3  S4
XYZ_MNO_S3  S5
XYZ_S1      S6
XYZ_S2      S7
XYZ_PQR_S3  S8

So whenever any of the above 'string' appear in my column 'states' I want to replace it with 'new_string' dynamically. I tried to put these list in an array and use TRANWRD to search through the column and replace, but not working.

My column states and the desired output looks like following:

states                 states_result
TR_ABC_MNO_S3_ABC_S2    TR_S1_S3
TR_ABC_S1_ABC_S2        TR_S2_S3
Segment                 Segment
ABC_PQR_S3              S4
TR_XYZ_MNO_S3_XYZ_S2    TR_S5_S7
Year                    Year
St_XYZ_S2               St_S7

Could you please help? Thanks!

souravsarkar59
  • 127
  • 1
  • 1
  • 8
  • Your question is answerable given how basic it is, but it would be better if you had included an example of the code you specifically tried here. – Joe Dec 19 '17 at 17:01
  • Post the code you tried. – Reeza Dec 19 '17 at 17:03

1 Answers1

1

Sourav:

Because you mentioned TRANWRD I will presume the strings values can be found embedded within a states value. The key to effective use of TRANWRD is to TRIM the values when variables are used for the target and replacement arguments.

Replacement concerns:

  • A single use of TRANWRD will work if there is only one target value embedded.
  • A loop over all targets would be needed if there can be more than one target value embedded.

There is a possibility that an earlier replacement could make rise a valid replacement that was not previously obvious. Consider the following state value:

ABC_ABC_MNO_S3

A first loop over all targets would replace ABC_MNO_S3 with S1 and yield

ABC_S1

A second loop over all targets would replace ABC_S1 with S2 and yield

S2

Tested sample:

data have;
infile cards dlm="," dsd;
length states segment year $100;
input states segment year;
datalines;
"TR_ABC_MNO_S3_ABC_S2 TR_ABC_S1_ABC_S2", "ABC_PQR_S3 TR_XYZ_MNO_S3_XYZ_S2", "St_XYZ_S2"
run;

data mappings;
length string $30 new_string $2;
input string new_string;
datalines;
ABC_MNO_S3  S1
ABC_S1      S2
ABC_S2      S3
ABC_PQR_S3  S4
XYZ_MNO_S3  S5
XYZ_S1      S6
XYZ_S2      S7
XYZ_PQR_S3  S8
run;

data want;
  array maps(100,2) $50 _temporary_; * first dimension must be larger than number of mappings;
  do _i_ = 1 by 1 until (lastmap);
    set mappings(rename=(string=_map_from new_string=_map_to)) end=lastmap;
    maps(_i_,1) = _map_from;
    maps(_i_,2) = _map_to;
  end;

  length status $12 _result $200;

  do until (lastdata);

    set have end=lastdata;
    array targets states segment year;

status = 'ORIGINAL'; 
output;

    do _i_ = 1  to dim(targets);

      _result = targets[_i_];
      _guard = 1;
      do until (_noreplacement or _guard >= 10);
        _noreplacement = 1;
        do _j_ = 1 to dim(maps,1) while(maps(_j_,1) ne '');
          if index(_result,trim(maps(_j_,1))) then do;

* put _result ': ' maps(_j_,1) '-> ' maps(_j_,2);

            _result = tranwrd(_result, trim(maps(_j_,1)), trim(maps(_j_,2)));
            _noreplacement = 0;
          end;
        end;
      end;

      if (_guard > 10) then do;
        put 'WARNING: Guard limit 10 reached, mappings may be cycling.' _result;
      end;

      targets[_i_] = _result;
    end;

    status = 'MAPS APPLIED';
    output;
  end;

  stop;
  drop _:;
run;
Richard
  • 25,390
  • 3
  • 25
  • 38
  • your code is giving me blank results. My data is following: `states` `TR_ABC_MNO_S3_ABC_S2` `TR_ABC_S1_ABC_S2` `Segment` `ABC_PQR_S3` `TR_XYZ_MNO_S3_XYZ_S2` `Year` `St_XYZ_S2` – souravsarkar59 Dec 20 '17 at 15:49
  • Can you edit the original question and add the data information in a code block ? The comment makes it look like you have three columns. – Richard Dec 20 '17 at 16:21
  • there's just one column. Edited the question to add the sample column and the desired output – souravsarkar59 Dec 20 '17 at 18:41
  • The answer has been updated with tested code to handle an array of columns, so just change the variables listed for targets to just `state`. The above code has two outputs for each input row, one for original and one for transformed. You should be able to go forward on your own making a new column from _result instead of a new row. – Richard Dec 20 '17 at 18:52