-1

I want to transform a vector of addresses that look like Firstname.Lastname@bla.com into a data frame that contains three variables: firstname, lastname and address.

This is the vector I want to transform:

emails <- c("fname1.lname1@bla.com", "fname2.lname2@bla.com", "fname3.lname3@bla.com")

What function/functions should I use?

EDIT: I'm supposed to create a function that takes an email as input and returns a data frame.

  • 1
    Are we assuming the format of the email address is always the same, i.e. name.surname@bla.com? What happens with emails like "myNameNosurname1@bla.at"? – zx8754 Feb 11 '21 at 16:48
  • We are assuming the format to be the same. –  Feb 11 '21 at 16:50
  • Related useful post: https://stackoverflow.com/a/40039278/680068 – zx8754 Feb 11 '21 at 16:50
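If the fixed-format assumption ever needs checking before parsing, one option is a quick `grepl` test. This is only a sketch: the pattern and the variable names are my own assumptions (some text without `.` or `@`, a literal dot, more text, `@`, then a domain) and not something taken from the linked post.

```r
# Hypothetical check: does each address look like first.last@domain?
candidates <- c("fname1.lname1@bla.com", "myNameNosurname1@bla.at")

# Text without "." or "@", a literal ".", text without "@", then "@" and a domain
pattern <- "^[^.@]+\\.[^@]+@.+$"

ok <- grepl(pattern, candidates)
ok
```

Addresses flagged `FALSE` have no dot before the `@`, so they would need separate handling.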

2 Answers

3

We can use `extract` from tidyr:

library(tibble)
library(tidyr)
tibble(emails) %>%
    extract(emails, into = c("firstname", "lastname", "address"), 
      "^(\\w+)\\.([^@]+)@(.*)")

-output

# A tibble: 3 x 3
#  firstname lastname address
#  <chr>     <chr>    <chr>  
#1 fname1    lname1   bla.com
#2 fname2    lname2   bla.com
#3 fname3    lname3   bla.com

If we need a base R option, either use `strsplit` or `read.csv`:

f1 <- function(vec) {
  # Rewrite each address as "first,last,domain", then let read.csv split on ","
  read.csv(text = sub("^(\\w+)\\.([^@]+)@(.*)", "\\1,\\2,\\3", vec),
           header = FALSE, col.names = c("firstname", "lastname", "address"))
}

f1(emails)
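
The `strsplit` route mentioned above is not shown, so here is a minimal sketch of what it might look like. The function name `split_emails` is hypothetical, and the code assumes every address has exactly one `@` and exactly one `.` before it; with more dots in the name part, only the first two pieces would be kept.

```r
# Hypothetical helper; assumes every element looks like first.last@domain
split_emails <- function(vec) {
  # Separate the part before "@" from the domain
  halves <- strsplit(vec, "@", fixed = TRUE)
  local_part <- vapply(halves, `[`, character(1), 1)
  domain <- vapply(halves, `[`, character(1), 2)

  # Split the part before "@" at the literal "."
  name_parts <- strsplit(local_part, ".", fixed = TRUE)

  data.frame(
    firstname = vapply(name_parts, `[`, character(1), 1),
    lastname = vapply(name_parts, `[`, character(1), 2),
    address = domain,
    stringsAsFactors = FALSE
  )
}

emails <- c("fname1.lname1@bla.com", "fname2.lname2@bla.com", "fname3.lname3@bla.com")
res <- split_emails(emails)
res
```

Using `fixed = TRUE` avoids having to escape the dot, since the split string is then treated literally rather than as a regular expression.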
akrun
  • I'm supposed to create a function and not use libraries with additional pre-defined functions. The function takes the email address as an input and returns a data frame. –  Feb 11 '21 at 16:44
  • Thanks akrun. Can you give me some direction for understanding this part of your code: `"^(\\w+)\\.([^@]+)@(.*)"`? – TarJae Feb 11 '21 at 16:59
  • 1
    @TarJae It captures as a group (`(...)`) the first word (`\\w+`), followed by a `.` (escaped because an unescaped `.` matches any character), then captures one or more characters that are not `@` (`[^@]+`), followed by `@`, and captures the rest of the characters (`.*`). In the replacement, we specify the backreferences to those captured groups, separated by `,`, so that `read.csv` can split on `,` – akrun Feb 11 '21 at 17:02
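
To see the three capture groups from that explanation individually, a small sketch with base R's `regexec` and `regmatches` may help. The variable names are only for illustration; the pattern and replacement are the ones from the answer.

```r
email <- "fname1.lname1@bla.com"
pattern <- "^(\\w+)\\.([^@]+)@(.*)"

# regexec() locates the full match and each capture group;
# regmatches() extracts the corresponding substrings
m <- regmatches(email, regexec(pattern, email))[[1]]
m[1]  # full match
m[2]  # group 1: the word before the first "."
m[3]  # group 2: everything between that "." and "@"
m[4]  # group 3: everything after "@"

# sub() with backreferences rewrites the address as a comma-separated line
csv_line <- sub(pattern, "\\1,\\2,\\3", email)
csv_line
```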
1

Using base R code:

str<-c("Firstname.Lastname@bla.com","Firstname2.Lastname2@bla2.com")
 
out_df<-NULL
out_df$first<-unlist(lapply(strsplit(sub("\\@.*", "", str),"[.]"), `[[`, 1))
out_df$last<-unlist(lapply(strsplit(sub("\\@.*", "", str),"[.]"), `[[`, 2))
out_df$domain<-sub('.*@', '', str)
data.frame(out_df)
       first      last   domain
1  Firstname  Lastname  bla.com
2 Firstname2 Lastname2 bla2.com

Here it is in function form:

f <- function(str)
{
  first <- unlist(lapply(strsplit(sub("\\@.*", "", str), "[.]"), `[[`, 1))
  last <- unlist(lapply(strsplit(sub("\\@.*", "", str), "[.]"), `[[`, 2))
  domain <- sub('.*@', '', str)
  return(data.frame(cbind(first, last, domain)))
}
Terru_theTerror