2

I have the following dataset

DATA EXAMPLE1;
  INPUT Names $char30.;
DATALINES;
AARON RAY, MD INC
AARON,RAY MD (1371927)
RAY,AARON,MD
;
run;

I want to delete all characters after 'MD'. Expecting below output

Names                      Want_names
AARON RAY, MD INC          AARON RAY, MD
AARON,RAY MD (1371927)     AARON,RAY MD
RAY,AARON,MD               RAY,AARON,MD

Any suggestions?

Thanks

PriyamK
  • 141
  • 10

2 Answers2

1

Probably overkill, but you certainly could use a regular expression for this.

data want;
  set example1;
  rx_name = prxparse('s~(.*)([ ,]+MD)(.*)~$1$2~ios');
  new_name = prxchange(rx_name,1,names);
run;

I grab three groups: the part before MD, then at least one of space or comma and MD, then all the rest of the characters; then replace with just the first two groups.

Joe
  • 62,789
  • 6
  • 49
  • 67
0

In this case, just keep the characters your want instead of deleting what you don't want.

You could use the find function (with the c) to determine where your target string (MD) starts.

spot = findc(names,'MD');

Now you have a variable that determines where your target string begins. Taking advantage of the substr function you can now control the exact spot that you want the string to end (essentially cutting out the MD characters), and leaving you with what's left after your remove your target (from the point of spot and on).

Given that you want the entire string prior to your target character MD, you'll control your substring argument by setting your end position from your spot, minus 1 position:

want = substr(names,1,spot-1);

To verify your position of spot, control your data step to keep your pointer variable spot for reference.

want will re-create your string with your target MD removed.

In an effort to clean the string (if necessary), you could use some other functions to remove the now trailing special characters (like the , at the end of your string for records 1 and 3).

data search;
set example1;
spot = findc(names,'MD');
want = substr(names,1,spot-1);
/*drop spot;*/
run;
SMW
  • 470
  • 1
  • 4
  • 19
  • 2
    It might be better to use `findw(name, 'MD', ', ')` instead. That way if a person's name contains `MD` the rest of the text will not be cut off. – Stu Sztukowski Feb 27 '21 at 15:38
  • 2
    The FINDC() is not right for this problem. That will return the first location that is either an M or a D. – Tom Feb 27 '21 at 18:01
  • @SMW So I want to remove characters only after 'MD' or keep characters until 'MD' including it. I do not want to remove 'MD'. Is there a way to do it? – PriyamK Feb 28 '21 at 08:23
  • If you want to now keep the 'MD' you could adjust the pointer. But given that both Stu and Tom bring up valid points, you may want to seek alternatives to my original answer. I think the `INDEX` function will locate the actual string 'MD' instead of just identifying a single character as Tom is pointing out. You can always fall back to the documentation to enhance the INDEX functionality to serve your need. – SMW Feb 28 '21 at 19:48