
I have comments that contain one or more IDs, and I need to extract them. Each ID needs to end up in its own row.

The input data has 2 columns: comment_id and Comment (which contains 1 or more IDs).

The desired output should have 2 columns: comment_id and ID.
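To make the question reproducible, here is a minimal sketch of a hypothetical work.comments table. The comment text is invented. Only the ID formats (abc1234567, abd1234, ab12345678, abce123456) come from the comments further down.

```sas
/* Hypothetical sample input: the text is invented, the ID formats
   follow the examples given in the comments on this question */
data work.comments;
    length comment_id 8 Comment $ 200;
    /* '|' replaces blank as the delimiter, so spaces stay inside Comment */
    infile datalines dlm='|' truncover;
    input comment_id Comment $;
datalines;
1|Refers to abc1234567 and abd1234
2|See ab12345678
3|Duplicate of abce123456
;
run;
```

With this input, the desired output would have four rows: two for comment_id 1 and one each for comment_id 2 and 3.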

I am using the following code.

For parsing:

data work.comments_parsed;
set work.comments;
if _N_ = 1 then do;
    pasre_id=prxparse("/ab[c|d]?e?\d+/");
end;
retain pasre_id;
start = 1;
stops = length(Comment);
run;

For output generation:

data work.desired_output;
set work.comments_parsed;
length ID $ 500;
call prxnext(pasre_id, start, stops, Comment, pos, len);
do while (pos >0);
    ID = substr(Comment,pos,len);
    output;
    call prxnext(pasre_id, start, stops, Comment, pos, len);
end;
run;

ERROR: Argument 1 to the function PRXNEXT must be a positive integer returned by PRXPARSE for a valid pattern.
ERROR: Internal error detected in function PRXNEXT. The DATA step is terminating during the EXECUTION phase.

I believe the error is caused by incorrect parsing. However, when I pass the regular expression directly to the PRXMATCH function, I get the correct matches. Can someone suggest how I can make this code work?

This code works fine

data pattern_testing;
set work.comments_parsed;
pos = prxmatch("/ab[c|d]?e?\d+?/", Comment);
run;

But this code also gives same error:

data pattern_testing;
set work.comments_parsed;
pos = prxmatch(pasre_id,Comment);
run;
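A likely explanation for the difference: PRXMATCH accepts either the pattern string itself, which it compiles within the current step (so the literal version works), or an id returned by PRXPARSE. That id only refers to a compiled pattern while the DATA step that created it is running. In work.comments_parsed it is stored as a plain number, so the later step has no compiled pattern behind it. A minimal sketch with the id created and used in one step, using hypothetical strings and a hypothetical variable name re_id:

```sas
/* Hypothetical strings, so the sketch runs on its own */
data pattern_testing;
    length Comment $ 100;
    /* Compile once; re_id only refers to a pattern inside this DATA step */
    re_id = prxparse("/ab[cd]?e?\d+/");
    Comment = "See ab12345678";    pos = prxmatch(re_id, Comment); output;
    Comment = "Refers to abd1234"; pos = prxmatch(re_id, Comment); output;
    Comment = "No ID here";        pos = prxmatch(re_id, Comment); output;
    call prxfree(re_id);           /* release the compiled pattern */
    drop re_id;
run;
```

Here pos is 5, 11 and 0: the starting position of the first ID, or 0 when there is none.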
Aman
  • Hi, welcome to SO! When asking a question here, please share your code and the attempts you have made to solve your issue. If you have already tried to fix it, please edit those attempts into your question. Cheers! – TMA Jun 16 '20 at 07:20
  • Thanks for useful tips. @aananddham. I have now provided all details in my post. Can you review it now? – Aman Jun 16 '20 at 07:30
  • Can you please provide some sample inputs and expected outputs? –  Jun 16 '20 at 07:40
  • Here is the format of the IDs in the Comment column: abc1234567, abd1234, ab12345678, abce123456. The expected output should have 2 columns, Comment_ID and ID (a one-to-many relation). – Aman Jun 16 '20 at 07:42
  • Aman, I don't think the PRXPARSE carries across a step boundary. In this case, the DATA step. Per the SAS documentation: "Successive calls to PRXPARSE do not cause a recompile, but returns the regular-expression-id for the regular expression that was already compiled." Hence, the dataset work.comments_parsed contains a numeric that is the ID of the compiled regex. I don't think your subsequent prxmatch sees that compiled code once past the first data step boundary. Carry the string in a pattern var, not a prxparse id. Happy to be corrected if that is not the case. – AlanC Jun 16 '20 at 08:15
  • @AlanC Can you please suggest where I should make changes? I am a little unclear about your suggestion. Since only a single regular expression is used, I used retain pasre_id; to retain the value of the parse ID. I tried storing the regular expression in a pattern variable and passing it to call prxnext, however this call routine only accepts an ID returned by prxparse. – Aman Jun 16 '20 at 09:02
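A minimal sketch of AlanC's suggestion, assuming the goal is to keep two steps: the first step stores the pattern text in a character variable (hypothetically named pattern), and the second step passes that text to PRXPARSE before calling CALL PRXNEXT. This way PRXNEXT still receives an id created in its own step. The inline sample rows and the names pattern and re_id are illustrative assumptions.

```sas
/* Step 1 (hypothetical input): store the pattern TEXT, which survives
   the step boundary, instead of the numeric id from PRXPARSE */
data work.comments_parsed;
    length comment_id 8 Comment $ 200 pattern $ 40;
    pattern = "/ab[cd]?e?\d+/";
    comment_id = 1; Comment = "Refers to abc1234567 and abd1234"; output;
    comment_id = 2; Comment = "See ab12345678";                   output;
run;

/* Step 2: compile the stored text in the step that uses the id */
data work.desired_output;
    set work.comments_parsed;
    length ID $ 500;
    retain re_id;
    if _N_ = 1 then re_id = prxparse(trim(pattern));
    start = 1;                    /* search each comment from the start */
    stop = length(Comment);
    call prxnext(re_id, start, stop, Comment, pos, len);
    do while (pos > 0);
        ID = substr(Comment, pos, len);
        output;                   /* one row per ID found */
        call prxnext(re_id, start, stop, Comment, pos, len);
    end;
    keep comment_id ID;
run;
```

This produces three rows: abc1234567 and abd1234 for comment_id 1, and ab12345678 for comment_id 2.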

1 Answer


The code works when PRXPARSE and CALL PRXNEXT are in the same DATA step.

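data work.comments_parsed;
set work.comments;
/* Compile the pattern once; the id is only valid within this DATA step */
if _N_ = 1 then pasre_id = prxparse("/ab[cd]?e?\d+/");
retain pasre_id;
length ID $ 500;
/* Search the whole comment, starting at position 1 for every row */
start = 1;
stops = length(Comment);
call prxnext(pasre_id, start, stops, Comment, pos, len);
do while (pos > 0);
    ID = substr(Comment, pos, len);
    output;   /* one row per ID found */
    call prxnext(pasre_id, start, stops, Comment, pos, len);
end;
keep comment_id ID;
run;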
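A side note on the pattern itself: inside square brackets, | is an ordinary character rather than alternation, so [c|d] also accepts a literal pipe, and [cd] is probably what was intended. A small check on a made-up string:

```sas
/* Made-up test string containing a pipe after "ab" */
data work.charclass_check;
    text = "ab|123";
    old = prxmatch("/ab[c|d]?e?\d+/", text);  /* class includes '|': matches */
    new = prxmatch("/ab[cd]?e?\d+/",  text);  /* only c or d allowed: no match */
run;
```

None of the listed ID formats contain a pipe, so both patterns find the same IDs in normal comments. The difference only shows up on stray text like the string above.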
Aman
  • That is as it should be. A step is a DATA step or a PROC step. Think of them like Lego pieces with inputs and outputs: they are standalone. A function call in one DATA step should not have an effect on another DATA step. In your original example, because prxparse compiles the pattern, you were trying to carry a future operation inside a dataset rather than a result. You never used the compiled pattern in that step, so there was nothing to carry. In this example, you call prxparse and use its result in the same step, so it is available to the other functions in that step. I hope that sheds some light. – AlanC Jun 16 '20 at 12:55