Extract specific emails from a column with multiple emails per row - R

Question

I want to extract specific emails (@enron.com) from the 'To' column in my data frame. Some rows contain more than one email. For example, one row contains: mark.guzman@enron.com, creightonca@hotmail.com, brendanf@gfsloans.com, seastape@teleport.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com

How can I extract just the Enron-domain (@enron.com) emails from this column and save them in a new column? I can extract them, but each email ends up in its own row. That is not what I want: if a row contains 10 Enron emails out of 20, I want all 10 Enron emails in that one row, not spread across 10 rows.

I ran the code from "How to extract expression matching an email address in a text file using R or Command Line?":

emails = regmatches(df, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron.com)", df))

but I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 2, 0, 5

Can you share a sample of your input data and desired output? — Psidom, Dec 25 '16 at 00:19
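For reference, the question's own regmatches()/gregexpr() idea can work in base R if it is applied to the To column rather than to the whole data frame, and if each row's matches are collapsed into one string. The error most likely comes from turning a list of unequal-length character vectors (one per row) into a data frame. This is a minimal sketch, assuming a hypothetical two-row df (the first row is the example from the question, the second is made up) and a hypothetical new column name Enron; it also escapes the dot in enron\\.com so it only matches a literal ".".

```r
# Hypothetical sample data: row 1 is the example from the question,
# row 2 is made up for illustration.
df <- data.frame(
  To = c(
    "mark.guzman@enron.com, creightonca@hotmail.com, brendanf@gfsloans.com, seastape@teleport.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com",
    "someone@example.com"
  ),
  stringsAsFactors = FALSE
)

# Same pattern as in the question, with the dot before "com" escaped
pattern <- "[_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron\\.com"

# A list with one character vector per row; the lengths differ between rows
matches <- regmatches(df$To, gregexpr(pattern, df$To))
lengths(matches)  # 3 0

# Collapse each row's matches into a single string: one value per row,
# so it fits in a new column ("" when a row has no Enron address)
df$Enron <- vapply(matches, paste, character(1), collapse = ", ")
df$Enron
```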

score 1 · Accepted Answer · answered Dec 25 '16 at 00:30

We can use grepl to keep only the rows whose To field contains an Enron address:

subset(df, grepl("@enron\\.com", To))
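A quick check of what this row filter does, on a small hypothetical df (the addresses are made up): matching rows are kept whole, so any non-Enron addresses in those rows stay in the result. The sketch uses fixed = TRUE, which is one way to make grepl treat "@enron.com" literally instead of letting "." match any character.

```r
# Hypothetical sample data (not from the question)
df <- data.frame(
  To = c("a.b@enron.com, x@hotmail.com", "y@example.com", "c.d@enron.com"),
  stringsAsFactors = FALSE
)

# Keep rows whose To field contains at least one @enron.com address;
# fixed = TRUE means the pattern is matched literally, not as a regex
enron_rows <- subset(df, grepl("@enron.com", To, fixed = TRUE))
nrow(enron_rows)  # 2
enron_rows$To[1]  # the whole field, including x@hotmail.com
```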

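This keeps the whole row, including any non-Enron addresses. If a row contains multiple emails and you want only the Enron ones, use str_extract_all from stringr and paste each row's matches back into a single string: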

library(stringr)
# Extract the @enron.com addresses in each row and join them into one string per row
df$Enron <- sapply(str_extract_all(df$To, "[^\\s,]+@enron\\.com\\b"), paste, collapse = ",")
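Why the part before the @ should exclude commas: in the question's example row there is no space after "k12.or.us,", so a pattern like \\S+@enron.com lets the match run across the comma and returns "penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com" as a single "address". A minimal check, assuming stringr is installed, that a character class excluding whitespace and commas avoids this:

```r
library(stringr)

# The example row from the question (note: no space after "k12.or.us,")
to <- "mark.guzman@enron.com, creightonca@hotmail.com, brendanf@gfsloans.com, seastape@teleport.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com"

# \\S+ also matches the comma, so the preceding non-Enron address is swallowed
loose <- str_extract_all(to, "\\S+@enron.com")[[1]]

# Excluding whitespace and commas stops each match at the separator;
# \\b prevents matching a longer domain such as "@enron.community"
tight <- str_extract_all(to, "[^\\s,]+@enron\\.com\\b")[[1]]

loose
tight
paste(tight, collapse = ",")
```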

akrun