
I want to extract specific emails (@enron.com) from the 'To' column of my data frame. Some rows contain more than one email. For example, one row contains: mark.guzman@enron.com, creightonca@hotmail.com, brendanf@gfsloans.com, seastape@teleport.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com.

How can I extract just the Enron-domain (@enron.com) emails from this column and save them in a new column? I can extract them, but each email ends up in its own row, which is not what I want: if a row contains 10 Enron emails out of 20, I want all 10 Enron emails in that one row, not spread across 10 rows.

I ran the code from "How to extract expression matching an email address in a text file using R or Command Line?":

emails = regmatches(df, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron.com)", df))

but I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 2, 0, 5

Alex Ramires

1 Answer


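We can use grepl to keep only the rows whose 'To' field contains an Enron address (the dot is escaped so that it matches a literal "."):

subset(df, grepl("@enron\\.com", To))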

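If there are multiple emails in a single row, use str_extract_all from stringr to pull out every Enron address, then paste each row's matches back into a single string: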

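library(stringr)
# [^\\s,]+ stops at commas as well as whitespace, so addresses separated by "," without a space are not merged into one match
data.frame(To = sapply(str_extract_all(df$To, "[^\\s,]+@enron\\.com"), paste, collapse = ","))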
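The same idea can be sketched in base R, which also shows a likely reason for the error in the question: regmatches returns a list with one character vector per input element, and those vectors have different lengths, so they cannot be turned into columns or rows directly. This is a minimal sketch under assumptions: the toy data frame, its second row, and the new column name Enron are made up for illustration, and the pattern only matches lower-case addresses.

```r
# Toy data: the addresses in row 1 come from the question, row 2 is made up.
df <- data.frame(
  To = c(
    "mark.guzman@enron.com, creightonca@hotmail.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com",
    "seastape@teleport.com"
  ),
  stringsAsFactors = FALSE
)

# Match on the column (a character vector), not on the whole data frame.
pattern <- "[_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron\\.com"
matches <- regmatches(df$To, gregexpr(pattern, df$To))

# matches is a list with one character vector per row; collapsing each
# vector into a single string yields exactly one value per row.
# Rows without any Enron address become an empty string.
df$Enron <- vapply(matches, paste, character(1), collapse = ", ")
```

Using vapply with character(1) rather than sapply guarantees a character vector of the same length as the data frame, so the assignment to the new column cannot fail with a row-count mismatch.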
akrun