I want to extract specific emails (@enron.com) from the 'To' column in my data frame. Some rows contain more than one email. For example, one row contains: mark.guzman@enron.com, creightonca@hotmail.com, brendanf@gfsloans.com, seastape@teleport.com, penn_eric@salkeiz.k12.or.us,joe.stepenovitch@enron.com, jan.king@enron.com
My question is: how can I extract just the Enron-domain (@enron.com) emails from this column and save them in a new column? I can extract them, but each email ends up in its own row, which is not what I want. If a row contains 10 Enron emails out of 20, I want all 10 Enron emails in one row, not spread across 10 rows. I ran the code from "How to extract expression matching an email address in a text file using R or Command Line?":
emails = regmatches(df, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@enron.com)", df))
but I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 2, 0, 5
Alex Ramires
Can you share a sample of your input data and desired output? – Psidom Dec 25 '16 at 00:19
1 Answer
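We can use grepl to keep only the rows that contain at least one Enron address:
subset(df, grepl("@enron\\.com", To))
If there are multiple emails in a single row, use str_extract_all from stringr and paste the matches for each row back into a single string:
library(stringr)
data.frame(To = sapply(str_extract_all(df$To, "[^\\s,]+@enron\\.com"), paste, collapse = ","))
The character class [^\\s,] stops at commas as well as whitespace, so addresses separated by a comma with no space after it are not merged into one match.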
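For reference, the same idea can be sketched in base R without stringr. The error in the question is consistent with regmatches returning a list whose elements have different lengths (1, 2, 0, 5 matches per row), which cannot be turned into data frame columns directly. Collapsing each element to one string first avoids that. The sample data and the new column name enron below are assumptions made up for illustration, not taken from the asker's data.

```r
# Hypothetical sample data; the real df and its To column are assumed to look like this
df <- data.frame(
  To = c(
    "mark.guzman@enron.com, creightonca@hotmail.com,joe.stepenovitch@enron.com, jan.king@enron.com",
    "brendanf@gfsloans.com, seastape@teleport.com",
    "jan.king@enron.com"
  ),
  stringsAsFactors = FALSE
)

# Apply gregexpr() to the character column, not to the whole data frame
pattern <- "[^[:space:],]+@enron\\.com"
matches <- regmatches(df$To, gregexpr(pattern, df$To))

# matches is a list with one character vector per row (possibly empty);
# collapse each vector to a single string so it fits in one cell
df$enron <- vapply(matches, paste, character(1), collapse = ",")

print(df$enron)
```

Rows with no Enron address end up with an empty string in the new column, so the data frame keeps its original number of rows.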

akrun