2

I am trying to extract certain rows based on year from my dataset, furthermore I want to substring those rows matching the following conditions, for year 2017 I want to substring the the portion before the second '-' in the statment for eg: "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email" I would want only "All-SOV-OutR-Sov_Score-18Dec.Email" and for 2018 I want to remove the portion after the '.' for eg: "IVP Program Template.IVP Email Template" I want "IVP Program Template"

I have tried using

data$col <- sub(".*:", "", data$`Email Name`)
data$col2 <- substring(data$`Email Name`, regexpr(".", data$`Email Name`) + 1)

but none of it is working and returns the statements as is, also for filtering based on year I tried using the filter function filter(data, as.Date(data$First Activity (EDT)) = "2017") but it gives me syntax error

My dataset is like this: enter image description here

  • 4
    Please provide data with `dput(head(df,n))` not images or links. – NelsonGon Aug 18 '19 at 15:44
  • And please provide the expected output. – Christoph Aug 18 '19 at 17:41
  • here is my dataset: structure(list(`Email Name` = c("**IVP Program Template**.IVP Email Template", "17Q4-EMEA-All-SOV-Global Borrowing_5March18.Email", "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email", "17Q4-EMEA-All-SOV-Social Housing_8March1.Email", "17Q4-EMEA-Insurance-Personalised.PersonalisedFI&INS" ), `First Activity (EDT)` = structure(c(12/19/2017,3/5/2018,12/18/2017, 3/9/2018, 12/14/2017), class = c("POSIXct", "POSIXt" ), tzone = "UTC")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame")) – Pooja Gangurde Aug 19 '19 at 06:44
  • Please [edit] your question to add the data, rather than putting it into a comment. In addition to being very hard to read, the comment box interprets and formats the text you put in, which changes it and makes it impossible to reproduce – divibisan Aug 19 '19 at 18:48

1 Answers1

0

Here is the regex that should give you the desired result for 2017 values:

sub(".*?-.*?-", "", "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email")
# "All-SOV-OutR-Sov_Score-18Dec.Email"

The one for 2018 values:

sub("\\..*", "", "IVP Program Template.IVP Email Template")
# IVP Program Template

You can then apply the regex functions with ifelse:

library(lubridate)

data$email_adj <- NA

data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2017", sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)

data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2018", sub("\\..*", "", data$`Email Name`), data$email_adj)


If you want to filter by month instead of year use the month instaed of the year function (in the example I only selected months from April until July):

library(lubridate)

data$email_adj <- NA

data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)

data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub("\\..*", "", data$`Email Name`), data$email_adj)


Lisardo Erman
  • 148
  • 1
  • 8
  • hi, this works fine for year, but actually I have to do it for certain number of months, like april to july. so I am doing something like this data$newCol <- ifelse(data[months(data$`First Activity (EDT)`) %in% month.name[4:7], ], sub(".*?-.*?-", "", data$`Email Name`), data$`Email Name`) however I get an error, can you please help me correct this syntax – Pooja Gangurde Aug 19 '19 at 09:33
  • I adjusted the answer to account for this issue – Lisardo Erman Aug 19 '19 at 13:45