
I have a data frame containing several malformed email addresses, for example:

1) abc@gmailcom 2) def@yahoo.commm 3) fgh@yahoo.coin 4) xyz@gmail

I want to use `gsub` to clean these emails, driven by a second data frame that contains patterns and replacements such as:

  • if the pattern found is 'comm', then replace it with 'com'
  • if the pattern found is '.coin', then replace it with '.co.in'
  • if the pattern found is 'gmail', then replace it with 'gmail.com' (as in case 4 above, but addresses that are already of the form abc@gmail.com must not be changed)

Can someone please suggest a `gsub` regex for this?

  • `gsub("comm", "com", gsub(".coin", ".co.in", gsub("gmail(?!\\.com)", "gmail.com", theString, perl=TRUE), fixed=TRUE), fixed=TRUE)`. – Wiktor Stribiżew Oct 18 '16 at 06:54
  • Thanks, this should work for one domain. What if I have multiple domains, like gmail and yahoo, and then want to replace @yahoo with yahoo.com? Will I have to nest multiple calls? Or is there a way to keep a reference data frame of patterns and replacements and use that in gsub? – Saurabh Wardhane Oct 18 '16 at 06:59
  • `emails <- c('abc@gmailcom', 'def@yahoo.commm', 'fgh@yahoo.coin', 'xyz@gmail') ; stringr::str_replace_all(emails, c('com+' = 'com', '.coin' = '.co.in', 'gmail\\.?(com)?' = 'gmail.com'))` – alistaire Oct 18 '16 at 07:00
  • Thanks alistaire. That should work. Just one more thing, in str_replace_all, can I pass a dataframe as second argument for taking care of multiple cases? – Saurabh Wardhane Oct 18 '16 at 07:03
  • See `?stringr::str_replace_all`, but the short answer is that it has to be a named list where the pattern is the name and the replacement is the element. You can easily construct a suitable list from a data.frame with `setNames`, though. – alistaire Oct 18 '16 at 07:09
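
The data-frame-driven approach asked about in these comments could be sketched in base R roughly as follows. The `rules` data frame, its column names, the end-anchored patterns, and the helper `clean_emails` are all illustrative assumptions rather than anything from the thread. Rules are applied in row order, so the order of rows can matter.

```r
# Sample data from the question
emails <- c("abc@gmailcom", "def@yahoo.commm", "fgh@yahoo.coin", "xyz@gmail")

# Hypothetical lookup table: one regex pattern and one replacement per row
rules <- data.frame(
  pattern     = c("com+$", "\\.coin$", "@gmail(\\.?com)?$"),
  replacement = c("com",   ".co.in",   "@gmail.com"),
  stringsAsFactors = FALSE
)

# Apply every rule in turn with gsub(); each pass works on the previous result
clean_emails <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    x <- gsub(rules$pattern[i], rules$replacement[i], x, perl = TRUE)
  }
  x
}

cleaned <- clean_emails(emails, rules)
print(cleaned)
```

The last pattern also matches an already correct `@gmail.com` and rewrites it to itself, so valid addresses pass through unchanged without needing a lookahead.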

1 Answer


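Create a named list that maps each malformed domain to its corrected form, then pass it to `gsubfn`:

library(gsubfn)
# names are the malformed domains (the text after "@"); values are the replacements
lst <- list(gmailcom = "@gmail.com", yahoo.commm = "@yahoo.com",
            yahoo.coin = "@yahoo.co.in", gmail = "@gmail.com")
# the captured domain is looked up by name in lst; str1 is defined under "data" below
gsubfn("@(.*)", lst, str1)
#[1] "abc@gmail.com"   "def@yahoo.com"   "fgh@yahoo.co.in" "xyz@gmail.com"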

data

str1 <- c("abc@gmailcom", "def@yahoo.commm", "fgh@yahoo.coin", "xyz@gmail")
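
If the mapping lives in a data frame, as the question asks, the same exact-match lookup could be sketched in base R without `gsubfn`. This is not part of the original answer: the `lookup` data frame and its column names `domain` and `fixed` are assumed for illustration, and only domains listed in it are changed.

```r
str1 <- c("abc@gmailcom", "def@yahoo.commm", "fgh@yahoo.coin", "xyz@gmail")

# Hypothetical reference data frame: malformed domain -> corrected domain
lookup <- data.frame(
  domain = c("gmailcom", "yahoo.commm", "yahoo.coin", "gmail"),
  fixed  = c("gmail.com", "yahoo.com", "yahoo.co.in", "gmail.com"),
  stringsAsFactors = FALSE
)

# Named character vector built from the data frame with setNames()
domain_map <- setNames(lookup$fixed, lookup$domain)

# Split each address into the part before "@" and the part after it
local_part <- sub("@.*$", "", str1)
domain     <- sub("^[^@]*@", "", str1)

# Replace only the domains that appear in the lookup; leave the rest untouched
hit <- domain %in% names(domain_map)
domain[hit] <- unname(domain_map[domain[hit]])

fixed_emails <- paste0(local_part, "@", domain)
print(fixed_emails)
```

Because this matches whole domains rather than regex fragments, it cannot accidentally rewrite part of a valid address, but every malformed variant has to be listed explicitly.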
akrun