My goal is to pull a specific section out of a set of Word documents based on key words, but I'm having trouble parsing those sections out of a larger data set of text files. The data set originally looked like this, where "title one" and "title two" mark the start and end of the text I am interested in, and "unimportant words" stands for the part of each text file I am not interested in:
| Text              | Text File   |
|-------------------|-------------|
| title one         | Text file 1 |
| sentence one      | Text file 1 |
| sentence two      | Text file 1 |
| title two         | Text file 1 |
| unimportant words | Text file 1 |
| title one         | Text file 2 |
| sentence one      | Text file 2 |
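For reference, here is a minimal way to rebuild that data set in R (the column names Text and Text.File are assumptions on my part; a name like "Text File" is converted to Text.File by data.frame()):

df <- data.frame(
  Text = c("title one", "sentence one", "sentence two", "title two",
           "unimportant words", "title one", "sentence one"),
  Text.File = c("Text file 1", "Text file 1", "Text file 1", "Text file 1",
                "Text file 1", "Text file 2", "Text file 2"),
  stringsAsFactors = TRUE  # factor columns, as in my original import, hence the as.character step below
)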
Then I used as.character to convert the columns to character vectors and unnest_tokens to tidy the data:
library(dplyr)
library(tidytext)

df <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
tidy_df <- df %>% unnest_tokens(word, Text, token = "words")
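Based on how unnest_tokens works (one row per word, the other columns carried along, text lowercased by default), I'd expect tidy_df to look roughly like this:

head(tidy_df)
#   Text.File     word
# 1 Text file 1   title
# 2 Text file 1   one
# 3 Text file 1   sentence
# 4 Text file 1   one
# 5 Text file 1   sentence
# 6 Text file 1   two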
I would now like to look only at the sentences in my data set and exclude the unimportant words. "title one" and "title two" are the same in every text file, but the sentences between them are different. I've tried the code below, but it does not work.
filtered_resume <- lapply(tidy_resume, (tidy_resume %>% select(Name) %>% filter(title:two)))
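One approach I've been sketching (I'm not sure it is the idiomatic way) is to do the filtering on the original line-per-row data frame before tokenizing, by flagging the rows that fall between the two markers in each file. The column names Text and Text.File are again assumptions, and this assumes each title appears exactly once per file:

library(dplyr)
library(tidytext)

sections <- df %>%
  group_by(Text.File) %>%
  mutate(
    after_start = cumsum(Text == "title one"),  # becomes 1 from "title one" onward
    after_end   = cumsum(Text == "title two")   # becomes 1 from "title two" onward
  ) %>%
  filter(after_start == 1, after_end == 0, Text != "title one") %>%
  ungroup() %>%
  select(Text, Text.File)

# tokenize only the kept sentences
tidy_sections <- sections %>% unnest_tokens(word, Text, token = "words")

The idea is that after_start flips to 1 once "title one" has been seen in a file and after_end flips to 1 at "title two", so keeping rows with after_start == 1 and after_end == 0 (and dropping the "title one" row itself) should leave only the sentences between the two titles. Is there a better way to do this, ideally on tidy_df directly?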