I have a character vector in R that contains data like the following (the real vector has many more than the two elements shown here):

df <- c("04 IRB/IEC and other Approvals\04.01 IRB/IEC Trial Approvals\04.01.02 IRB/IEC Approval",
 "01 Trial Management\01.01 Trial Oversight\01.01.02 Trial Management Plan")

All observations have the same structure, with two backslashes. I want to extract the 8 characters immediately following the last backslash, that is, the numeric values including the periods. Here is an example of the result I want in R (I've been trying to use stringr):

df2 <- c("04.01.02", "01.01.02")

If anyone is familiar with the DIA TMF reference model, I want the zone/section/artifact number from the DF.

Thank you!

Mr. Biggums
  • Sorry, that was an error, just edited it! – Mr. Biggums Nov 10 '21 at 23:29
  • First, the backslash in `"\04"` means that `"\04"` is just 1 character, so you cannot separate the "04" from the backslash. Therefore, unless some unheard-of magic is used, you cannot get what you are looking for – Onyambu Nov 10 '21 at 23:30
  • Why not extract the pattern `"\\d\\d\\.\\d\\d\\.\\d\\d"` with `str_extract`? – Allan Cameron Nov 10 '21 at 23:33
  • another variation `sub(".*\\\\(\\d.*\\d).*", "\\1", df)` – user20650 Nov 10 '21 at 23:39
  • @user20650 That code cannot work, as it assumes that `"\04"` is a 3-character string. It's just one character – Onyambu Nov 10 '21 at 23:40
  • @Onyambu I suppose I was assuming this was a transcription error and added the double backslashes, "\\04 ... ". – user20650 Nov 10 '21 at 23:41
  • @user20650 Why would you assume it to be an error? The vector given above runs without any error. That's a valid character, just like `\n` or even `\a`. Anyway, OP will need to clarify that – Onyambu Nov 10 '21 at 23:46
  • @user20650 I also assumed that. It certainly makes it harder, but not impossible (i.e. char -> raw -> numeric -> add 48 to values under 10 -> raw -> character -> extract pattern) – Allan Cameron Nov 10 '21 at 23:49
  • So the `str_extract` option that Allan gave worked, because once I loaded the 7000-row data set using readr, it added an extra backslash to all the backslashes – Mr. Biggums Nov 10 '21 at 23:59

2 Answers

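Since `"\04"` and `"\01"` in the posted vector are parsed as single control characters, we may need to restore the digits before extracting the last match:

library(stringi)
library(stringr)
# Map the control characters "\004" and "\001" back to " 04" and " 01",
# then take the last dd.dd.dd match in each string
stri_extract_last_regex(str_replace_all(df, setNames(c(" 04", " 01"),
      c("\004", "\001"))), "\\d{2}\\.\\d{2}\\.\\d{2}")
[1] "04.01.02" "01.01.02"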
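
The same idea can be sketched in base R without listing each control character by hand. This is only an illustration, not part of the original answer: `restore_digits` is a hypothetical helper, and it assumes the only damaged prefixes are `\01` to `\07` (bytes 1 to 7), where the octal and decimal values coincide.

```r
# The question's literal strings: "\04" and "\01" are parsed as the single
# control characters 0x04 and 0x01, so the digits are no longer in the data.
df <- c("04 IRB/IEC and other Approvals\04.01 IRB/IEC Trial Approvals\04.01.02 IRB/IEC Approval",
        "01 Trial Management\01.01 Trial Oversight\01.01.02 Trial Management Plan")

# Hypothetical helper (assumption: only bytes 1..7 need repairing).
# Each such byte is rewritten as a space followed by two digits.
restore_digits <- function(x) {
  raw_bytes <- charToRaw(x)
  bytes <- as.integer(raw_bytes)
  chars <- rawToChar(raw_bytes, multiple = TRUE)
  is_ctrl <- bytes < 8
  chars[is_ctrl] <- sprintf(" %02d", bytes[is_ctrl])
  paste(chars, collapse = "")
}

repaired <- vapply(df, restore_digits, character(1), USE.NAMES = FALSE)

# Keep the last dotted triple of two-digit numbers in each string.
pattern <- "[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}"
hits <- regmatches(repaired, gregexpr(pattern, repaired))
df2 <- vapply(hits, function(h) h[length(h)], character(1))
df2
```

If the data actually contains real backslashes (for example, after being read from a file), this repair step is unnecessary and the pattern can be applied directly.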
akrun

Instead of splitting on the backslash, if you only want the numbers separated by periods, you could do something like:

stringr::str_extract(df, "\\d\\d\\.\\d\\d\\.\\d\\d")
#> [1] "04.01.02" "01.01.02"

Data used

df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
 "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")
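
For the wording in the question ("the 8 characters immediately following the last backslash"), a base R alternative might look like the sketch below. It assumes the strings contain real backslashes, as in the escaped data above, and the name `tail_part` is only illustrative.

```r
df <- c("04 IRB/IEC and other Approvals\\04.01 IRB/IEC Trial Approvals\\04.01.02 IRB/IEC Approval",
        "01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan")

# Greedy match: drop everything up to and including the last backslash.
tail_part <- sub(".*\\\\", "", df)

# Keep the first 8 characters of what remains.
df2 <- substr(tail_part, 1, 8)
df2
```

Unlike the pattern-based extraction, this relies on the number always sitting directly after the final backslash and always being 8 characters long.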
Allan Cameron
  • In this case you added the backslash – Onyambu Nov 10 '21 at 23:38
  • How would I add the backslash to all that data? I have about 7000 of these – Mr. Biggums Nov 10 '21 at 23:39
  • @Mr.Biggums I thought that you had made a transcription error in your example. So it really is a literal `"\04"` octal character? What happens when you do `cat(df)`? – Allan Cameron Nov 10 '21 at 23:42
  • This happens, Allan: 04 IRB/IEC and other Approvals.01 IRB/IEC Trial Approvals.01.02 IRB/IEC Approval 01 Trial Management.01 Trial Oversight.01.02 Trial Management Plan – Mr. Biggums Nov 10 '21 at 23:43
  • @Allan Why would you assume "\04" to be an octal character? On my computer, it's a diamond, while "\03" is hearts – Onyambu Nov 10 '21 at 23:53
  • @Onyambu I mean octal as in the octal representation of a byte within a string - see for example https://stackoverflow.com/questions/11815076/octal-representation-inside-a-string-in-c . – Allan Cameron Nov 10 '21 at 23:57
  • Anyway, from the comments it appears that `readr` has read the strings in with real backslashes (which R prints as escaped `\\`), so the above solution should work. – Allan Cameron Nov 11 '21 at 00:00
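
To check which situation applies to data read from a file, something like the following sketch may help. It uses base `readLines` on a temporary file as a stand-in for the `readr` import mentioned in the comments (an assumption, since the actual file and import call are not shown).

```r
# Write one line containing real backslashes to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines("01 Trial Management\\01.01 Trial Oversight\\01.01.02 Trial Management Plan", path)

x <- readLines(path)

print(x)       # print() shows each backslash escaped, i.e. doubled
writeLines(x)  # writeLines() shows the actual characters

# A real backslash is present, and no control characters were created.
has_backslash <- grepl("\\", x, fixed = TRUE)
has_control <- grepl("[[:cntrl:]]", x)
```

If `has_backslash` is `TRUE` and `has_control` is `FALSE`, the doubled backslashes seen at the console are only how R prints the string, and the extraction pattern can be applied as is.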