Removal of Phrase using wildcards

Question

I'm looking for a way to use wildcard characters as part of the removal criteria for a section of a corpus. I was unable to find anything on SO or Google related to this issue.

Purpose: analyzing a large dataset of standardized notes in which employee input is broken into sections of text.

Example data:

***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe

Portion to extract for analysis:

Detail:xxxxx Requested Action:xxxxxx

There may be more items before Detail. Also, Assigned to: may not appear.

score 0 · Accepted Answer · answered Oct 08 '15 at 19:49

It's hard to tell without more examples and details, but you're probably going to want to use regular expressions with positive lookahead and optional items:

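library(stringr)

text <- c("***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe")

# One pattern per section; each lazy match stops just before the next label (positive lookahead).
# The trailing group in the second pattern is optional, so the label still matches when
# "Assigned to:" is absent (though only the label itself is then returned).
str_extract_all(text, c("Detail(.*?)(?=Requested Action:)", "Requested Action:((.*?)(?=Assigned to:))?"))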

# [[1]]
# [1] "Detail: xxxxx "
# 
# [[2]]
# [1] "Requested Action: xxxxxx "
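
Since the question says Assigned to: may not appear, here is a minimal sketch of one way to pull the whole Detail/Requested Action portion in a single pattern, stopping at either the next label or the end of the string. The names `notes`, `pattern`, and `extracted` are illustrative, and the second note is a made-up variant of the example with the trailing field removed; real notes may need more labels in the lookahead.

```r
library(stringr)

# Two sample notes: the original example, plus a hypothetical variant
# without the trailing "Assigned to:" field.
notes <- c(
  "***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe",
  "***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx"
)

# Lazily match from "Detail:" until the lookahead succeeds, i.e. until
# (optional whitespace and) "Assigned to:" follows, or the string ends.
pattern <- "Detail:.*?(?=\\s*Assigned to:|\\s*$)"

extracted <- str_extract(notes, pattern)
print(extracted)
# [1] "Detail: xxxxx Requested Action: xxxxxx"
# [2] "Detail: xxxxx Requested Action: xxxxxx"
```

Anything before Detail: is ignored because the match only starts at that label, so extra leading items should not matter. Note that `.` does not match newlines by default, so multi-line notes would need a different approach.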

Thanks, I think this is exactly what I need. – user2344226, Oct 08 '15 at 20:00
