I think I have a simple question, but I can't figure it out. I have data like this:

df <- data.frame(
  identifier = c("9562231945200505501901190109-5405303\n", "190109-8731478", "1901098260031",
                 " \n.9..43675190109-3690341", "-1103214010200000190109-8841419",
                 "-190109-5232506-.08001234-111", "190109-2018362-",
                 "51770217835901218103304190109-9339765\n"),
  true_values = c("190109-5405303", "190109-8731478", "190109-8260031", "190109-3690341",
                  "190109-8841419", "190109-5232506", "190109-2018362", "190109-9339765"))

I used the following function and it almost worked, but I do not know how to avoid the trailing dash.

I tried str_replace and a few other things, but they did not work.


You can try substr and substring with paste0 after removing the unwanted parts with gsub.

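# Drop everything from a "-." onwards (e.g. "-.08001234-111" in the 6th entry)
tt <- gsub("-\\..*", "", df$identifier)
# Keep only the digits (removes dashes, dots, spaces and newlines)
tt <- gsub("[^0-9]", "", tt)
# Keep the last 13 digits: the 6-digit prefix plus the 7-digit number
tt <- substring(tt, nchar(tt)-12)
# Reinsert a single dash after the sixth digit
paste0(substr(tt, 1, 6), "-", substring(tt, 7))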
#[1] "190109-5405303" "190109-8731478" "190109-8260031" "190109-3690341"
#[5] "190109-8841419" "190109-5232506" "190109-2018362" "190109-9339765"
GKi
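For comparison, here is a regex-based sketch that handles the trailing dash by extracting only the matching part of each string instead of cleaning and truncating it. It assumes, based only on the sample data, that every ID contains the fixed prefix "190109", optionally followed by a dash, then exactly seven digits. extract_id is a hypothetical helper name; regexpr() and regmatches() are base R functions.

```r
# Hypothetical helper (not from the answer above). ASSUMPTION: every ID contains
# the fixed prefix "190109", an optional "-", then exactly 7 digits.
extract_id <- function(x) {
  # regexpr() finds the first match in each string; regmatches() returns the
  # matched text, so anything after the 7 digits (like a trailing "-") is ignored.
  # Note: strings with no match are silently dropped from the result.
  m <- regmatches(x, regexpr("190109-?[0-9]{7}", x))
  # Normalise so exactly one dash follows the prefix ("1901098260031" -> "190109-8260031")
  sub("^190109-?", "190109-", m)
}

ids <- c("9562231945200505501901190109-5405303\n", "190109-8731478",
         "1901098260031", " \n.9..43675190109-3690341",
         "-1103214010200000190109-8841419", "-190109-5232506-.08001234-111",
         "190109-2018362-", "51770217835901218103304190109-9339765\n")
expected <- c("190109-5405303", "190109-8731478", "190109-8260031",
              "190109-3690341", "190109-8841419", "190109-5232506",
              "190109-2018362", "190109-9339765")

res <- extract_id(ids)
print(res)
```

Unlike the last-13-digits approach, this sketch breaks if the prefix is not always "190109", and because regmatches() drops non-matching strings, the result can be shorter than the input. Check length(res) == length(ids) before assigning it back to a data frame column.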