I want to remove everything from the first "?" character onward in each URL.
Three of the six rows in my sample data contain a "?"; the other three are
fine as they are.
Here is my sample data:
df1 <- structure(list(URL = c("/2015/08/10/five-great-fantasy-books-most-fans-dont-know-exist/",
"/2015/09/25/animated-dune-matt-rhodes-concept-art/", "/2015/09/09/the-dogs-of-athens-kendare-blake/?et_cid=34295599&et_rid=1476556397&linkid=http",
"/2015/06/16/spin-the-wheel-1-the-wheel-of-time-companion/comment-page-4/",
"/2015/06/29/excerpt-brandon-sanderson-shadows-of-self-prologue/?et_cid=34326143&et_rid=1724499137&linkid=http",
"/2015/08/12/milagroso-isabel-yap/?et_cid=34174778&et_rid=559408553&linkid=http"
), Pageviews = c(100L, 200L, 113L, 100L, 50L, 13L)), .Names = c("URL",
"Pageviews"), row.names = c(NA, -6L), class = "data.frame")
I tried:
df1$URL <- sub("?:.*$", "", df1$URL)
but this seems to have no effect.
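One likely reason: in a regular expression "?" is a metacharacter (it means "zero or one of the preceding item"), so "?:.*$" does not describe a literal question mark. Even read literally, it would also require a colon right after the "?", which these URLs do not have. A minimal sketch, assuming the goal is to drop the first "?" and everything after it, escapes the question mark in the sub() pattern (the two-element `urls` vector is just an illustrative subset of the sample data):

```r
urls <- c(
  "/2015/09/25/animated-dune-matt-rhodes-concept-art/",
  "/2015/08/12/milagroso-isabel-yap/?et_cid=34174778&et_rid=559408553&linkid=http"
)
# "\\?" matches a literal question mark; ".*$" then consumes the rest of the string.
# Strings that contain no "?" have no match, so sub() returns them unchanged.
cleaned <- sub("\\?.*$", "", urls)
print(cleaned)
```

On the data frame the same call would be df1$URL <- sub("\\?.*$", "", df1$URL).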
I also tried stringr's str_split():
df1$URL <- sapply(str_split(df1$URL, "?"), "[", 1)
but this generated an error message.
My third attempt, with base R's strsplit():
df1$URL <- sapply(strsplit(df1$URL, "?"), "[", 1)
reduced every URL to a single forward slash.
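The strsplit() idea can work if the separator is treated as a plain string rather than a regular expression, which is what fixed = TRUE does. A sketch under that assumption, using an illustrative two-element `urls` vector taken from the sample data; vapply() is used instead of sapply() so the result is guaranteed to be a character vector:

```r
urls <- c(
  "/2015/06/16/spin-the-wheel-1-the-wheel-of-time-companion/comment-page-4/",
  "/2015/09/09/the-dogs-of-athens-kendare-blake/?et_cid=34295599&et_rid=1476556397&linkid=http"
)
# fixed = TRUE makes "?" a literal separator instead of a regex quantifier.
parts <- strsplit(urls, "?", fixed = TRUE)
# Keep the piece before the first "?"; URLs without a "?" come back whole.
cleaned <- vapply(parts, `[`, character(1), 1)
print(cleaned)
```

With stringr, the analogous change would be to wrap the pattern in fixed(), e.g. str_split(df1$URL, fixed("?")).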