I have a tidy data frame built from a text corpus. I want to add a binary variable that flags whether each text contains any string from a vector of strings. My current for loop works, but it is far too slow for 600k observations, even though most texts are only about 5 words long.
Structure of the tidy data frame: 600k observations of 8 variables, where the 8th holds the text to be searched. The 9th variable should be 1 or 0 depending on whether the text mentions a pharmaceutical with abuse potential.
abusepharma <- c('xanax', 'diazepam', 'alprazolam', 'adderall', 'oxycodone', 'viagra', 'oxycontin', 'valium', 'fentanyl', 'cialis', 'tramadol', 'amphetamine', 'hydromorphone', 'hydromorphon')
name.clean_tidy$AbusePharma <- NA
for (i in 1:nrow(name.clean_tidy)) {
  if (grepl(paste(abusepharma, collapse = "|"), name.clean_tidy[i, 8])) {
    name.clean_tidy[i, 9] <- 1
  } else {
    name.clean_tidy[i, 9] <- 0
  }
}
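For comparison, here is a minimal sketch of a vectorized version of the loop above. It relies on `grepl()` accepting a whole character vector, so the pattern is built once and every row is tested in a single call. The toy data frame and the column name `text` are assumptions for illustration only; on the real data the 8th column would be passed instead, for example as `name.clean_tidy[[8]]`.

```r
# Toy stand-in for the real data; the column names here are hypothetical.
name.clean_tidy <- data.frame(
  id = 1:4,
  text = c("cheap xanax online", "garden tools", "oxycodone 10mg", NA),
  stringsAsFactors = FALSE
)

# Shortened version of the drug vector from the question.
abusepharma <- c("xanax", "diazepam", "oxycodone", "hydromorphone")

# Build the alternation pattern once instead of on every iteration.
pattern <- paste(abusepharma, collapse = "|")

# grepl() is vectorized over the text, so one call covers all rows.
# as.integer() turns TRUE/FALSE into 1/0; NA texts come back as 0.
name.clean_tidy$AbusePharma <- as.integer(grepl(pattern, name.clean_tidy$text))
```

As in the loop, matching is case-sensitive and works on substrings, so a text containing "hydromorphone" is already caught by the shorter entry 'hydromorphon'. If case should be ignored, `ignore.case = TRUE` can be passed to `grepl()`.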