Parsing brackets and quotes

Question

I have a vector of strings that I would like to parse. However, the brackets in combination with the quotes make this quite complicated. I would prefer to solve this with stringr, but that is not a requirement.

x = c("[\"DER001_A375_96H:TRCN0000052583:-666\"]", "[\"TRCN0000052583\"]", "[\"AAK1\",\"AARS\"]", "[\"A375\"]", "-6.7389873 ... 4.6063291") 

> x
[1] "[\"DER001_A375_96H:TRCN0000052583:-666\"]" "[\"TRCN0000052583\"]"                     
[3] "[\"AAK1\",\"AARS\"]"                       "[\"A375\"]"                               
[5] "-6.7389873 ... 4.6063291"

Expected result:

DER001_A375_96H:TRCN0000052583:-666
TRCN0000052583
AAK1
AARS
A375
6.7389873
4.6063291

Why does the data look like this? Did it start off as JSON data at one point? Are you sure you can't generate cleaner data further upstream in your pipeline? This seems like a real mess. — MrFlick, Jun 07 '19 at 14:59
It is an output from Shiny which I cannot change. See here: https://stackoverflow.com/questions/52858889/extract-filters-from-r-shiny-datatable/ — MrNetherlands, Jun 07 '19 at 15:13

G. Grothendieck · Accepted Answer · 2019-06-08T15:33:09.383

Replace each occurrence of " ... " with a comma and remove all square brackets. (Note that [...] defines a character class; if the first character in the class is ] then it is treated as a member of the class rather than as the terminating ].) Finally, read the result in using scan, which also strips the double quotes because they are among its default quote characters. No packages are used.

scan(text = gsub('[][]', '', gsub(" ... ", ",", x, fixed = TRUE)), 
  sep = ",", what = "", quiet = TRUE)

giving:

[1] "DER001_A375_96H:TRCN0000052583:-666" "TRCN0000052583"                     
[3] "AAK1"                                "AARS"                               
[5] "A375"                                "-6.7389873"                         
[7] "4.6063291"
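
The note about the character class can be checked on a single element. The following is a small sketch rather than part of the answer; the names `s`, `no_brackets`, `same` and `fields` are made up for illustration.

```r
s <- "[\"AAK1\",\"AARS\"]"

# "[][]" is a character class holding "]" and "[":
# a "]" placed first in the class is taken literally
no_brackets <- gsub("[][]", "", s)

# The escaped alternation removes the same two characters
same <- identical(no_brackets, gsub("\\[|\\]", "", s))

# scan() treats '"' as a quote character by default,
# so the surrounding double quotes are dropped on reading
fields <- scan(text = no_brackets, sep = ",", what = "", quiet = TRUE)
print(fields)
```

If the escaped form reads more clearly, it can be swapped into the one-liner above without changing the result.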

score 1 · Answer 2 · answered Jun 07 '19 at 15:23

With the help of SO (for parsing the string) and the stringr cheat sheet at http://edrub.in/CheatSheets/cheatSheetStringr.pdf :

x = c("[\"DER001_A375_96H:TRCN0000052583:-666\"]", 
      "[\"TRCN0000052583\"]", "[\"AAK1\",\"AARS\"]", 
      "[\"A375\"]", "-6.7389873 ... 4.6063291") 
library("dplyr", quietly = TRUE, warn.conflicts = FALSE)
x1 <- x %>% 
        stringr::str_remove_all(pattern = "\"" ) %>% 
        stringr::str_remove_all(pattern = "\\[" ) %>% 
        stringr::str_remove_all(pattern = "\\]" )

x2 <- unlist(strsplit(x1, split = ","))
# Split on the literal "..." and trim the spaces left on either side of it
x3 <- trimws(unlist(strsplit(x2, split = "...", fixed = TRUE)))
x3
#> [1] "DER001_A375_96H:TRCN0000052583:-666"
#> [2] "TRCN0000052583"                     
#> [3] "AAK1"                               
#> [4] "AARS"                               
#> [5] "A375"                               
#> [6] "-6.7389873"                         
#> [7] "4.6063291"

Created on 2019-06-07 by the reprex package (v0.2.1)
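
Since the question asks for stringr, the same steps can be collapsed into three stringr calls. This is a minimal sketch and not part of either answer: the names `cleaned` and `parts` are made up here, and the split pattern is written so that it absorbs any spaces around the `...` separator.

```r
library(stringr)

x <- c("[\"DER001_A375_96H:TRCN0000052583:-666\"]",
       "[\"TRCN0000052583\"]", "[\"AAK1\",\"AARS\"]",
       "[\"A375\"]", "-6.7389873 ... 4.6063291")

# Remove "[", "]" and double quotes in a single pass
cleaned <- str_remove_all(x, "[\\[\\]\"]")

# Split on a comma, or on a literal "..." together with surrounding whitespace
parts <- unlist(str_split(cleaned, ",|\\s*\\.\\.\\.\\s*"))
print(parts)
```

The result is a character vector; wrapping the last two elements in `as.numeric()` would be one way to get numbers if they are needed.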
