
I have a vector like below:

vector 
jdjss-jdhs--abc-bec-ndj
kdjska-kvjd-jfj-nej-ndjk
eknd-nend-neekd-nemd-nemdkd-nedke

How do I extract the last 3 values, split on the - delimiter, so that my result looks like below:

vector                              Col1     Col2    Col3
jdjss-jdhs--abc-bec-ndj              abc      bec     ndj   
kdjska-kvjd-jfj-nej-ndjk             jfj      nej    ndjk
eknd-nend-neekd-nemd-nemdkd-nedke   nemd   nemdkd   nedke

I've attempted to use sub and the qdap package, but with no luck.

sub( "(^[^-]+[-][^-]+)(.+$)", "\\2", df$vector)
qdap::char2end(df$vector, "-", 3)

Not sure how to go about doing this.

pogibas
nak5120

4 Answers


You may use tidyr::extract:

library(tidyr)
vector <- c("jdjss-jdhs--abc-bec-ndj", "kdjska-kvjd-jfj-nej-ndjk", "eknd-nend-neekd-nemd-nemdkd-nedke")
df <- data.frame(vector)
tidyr::extract(df, vector, into = c("Col1", "Col2", "Col3"), "([^-]*)-([^-]*)-([^-]*)$", remove=FALSE)

                             vector Col1   Col2  Col3
1           jdjss-jdhs--abc-bec-ndj  abc    bec   ndj
2          kdjska-kvjd-jfj-nej-ndjk  jfj    nej  ndjk
3 eknd-nend-neekd-nemd-nemdkd-nedke nemd nemdkd nedke

The ([^-]*)-([^-]*)-([^-]*)$ pattern matches:

  • ([^-]*) - Group 1 ('Col1'): 0+ chars other than -
  • - - a hyphen
  • ([^-]*) - Group 2 ('Col2'): 0+ chars other than -
  • - - a hyphen
  • ([^-]*) - Group 3 ('Col3'): 0+ chars other than -
  • $ - end of string

Set remove=FALSE in order to keep the original column.
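As a rough illustration of how the groups behave, here is a minimal sketch that applies the same pattern with base R's regexec() and regmatches(). The second input, "a--b", is a made-up string (not from the question), included only to show that [^-]* can match an empty field.

```r
# Same pattern as in the answer above
pattern <- "([^-]*)-([^-]*)-([^-]*)$"

# First string is from the question; "a--b" is a hypothetical input
# with an empty middle field
x <- c("jdjss-jdhs--abc-bec-ndj", "a--b")

# regmatches() returns, per string, the full match followed by the groups
m <- regmatches(x, regexec(pattern, x))

# Drop the full match and keep only the three captured groups
groups <- t(sapply(m, function(parts) parts[-1]))
colnames(groups) <- c("Col1", "Col2", "Col3")
groups
```

For "a--b" the middle group comes back as an empty string, because * allows zero characters; replacing * with + would require every field to be non-empty.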

Wiktor Stribiżew

You can use strsplit from base.


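    x <- "eknd-nend-neekd-nemd-nemdkd-nedke"

    # Return the last `last` hyphen-separated pieces of a single string
    lastElements <- function(x, last = 3){
      parts <- strsplit(x, "-", fixed = TRUE)[[1]]  # split once and reuse the result
      start <- max(1, length(parts) - (last - 1))   # guard against fewer than `last` pieces
      parts[start:length(parts)]
    }

    > lastElements(x)
    [1] "nemd"   "nemdkd" "nedke" 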

Chris Karnes

You can simply split each string on - using strsplit and take the last n elements with tail:

df <- data.frame(vector = c(
    "jdjss-jdhs--abc-bec-ndj",
    "kdjska-kvjd-jfj-nej-ndjk",
    "eknd-nend-neekd-nemd-nemdkd-nedke"),
    stringsAsFactors = FALSE
)

cbind(df, t(sapply(strsplit(df$vector, "-"), tail, 3)))

                             vector    1      2     3
1           jdjss-jdhs--abc-bec-ndj  abc    bec   ndj
2          kdjska-kvjd-jfj-nej-ndjk  jfj    nej  ndjk
3 eknd-nend-neekd-nemd-nemdkd-nedke nemd nemdkd nedke
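The result above gets the default column names 1, 2, 3. A minimal sketch of one way to get Col1 to Col3 instead is to name the matrix columns before binding. The names parts and result are introduced here for illustration, and the sketch assumes every string has at least three pieces (otherwise sapply returns a list rather than a matrix).

```r
df <- data.frame(vector = c(
    "jdjss-jdhs--abc-bec-ndj",
    "kdjska-kvjd-jfj-nej-ndjk",
    "eknd-nend-neekd-nemd-nemdkd-nedke"),
    stringsAsFactors = FALSE
)

# Split on a literal hyphen and keep the last 3 pieces of each string
parts <- t(sapply(strsplit(df$vector, "-", fixed = TRUE), tail, 3))

# Name the new columns before binding them to the original data frame
colnames(parts) <- c("Col1", "Col2", "Col3")
result <- cbind(df, parts)
result
```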
pogibas

strcapture is a base R counterpart to the tidyr::extract answer from Wiktor:

strcapture("([^-]*)-([^-]*)-([^-]*)$", df$vector, proto=list(Col1="",Col2="",Col3=""))
#  Col1   Col2  Col3
#1  abc    bec   ndj
#2  jfj    nej  ndjk
#3 nemd nemdkd nedke
thelatemail