I have a dataframe that contains some cells with error messages as string. The strings come in the following forms:
ERROR-100_Data not found for ID "xxx"
ERROR-100_Data not found for id "xxx"
ERROR-101_Data not found for SUBID "yyy"
Data not found for ID "xxx"
Data not found for id "xxx"
I need to extract the number of the error (if it has one) and the GENERAL description, avoiding the specificity of the ID or SUBID. I have a function where I use the following regex expression:
sub(".*?ERROR-(.*?)for ID.*","\\1",df[,col1],sep="-")
This works only for the first case. Is there a way to obtain the following results using only one expression?
100_Data not found
100_Data not found
101_Data not found
Data not found
Data not found