2

I have a line that look like this:

INSERT INTO `table_name` VALUES (1,'some','body','+'), (2,'once','told me','+'), (3,'the world','is gonna roll me','+'))

And I'm trying to parse it as a dataframe such as:

tibble::tribble(
    ~col1, ~col2, ~col3, ~col4,
    1, "some", "body", "+",
    2, "once", "told me", "+",
    3, "the world", "is gonna roll me", "+",
)
#> # A tibble: 3 × 4
#>    col1 col2      col3             col4 
#>   <dbl> <chr>     <chr>            <chr>
#> 1     1 some      body             +    
#> 2     2 once      told me          +    
#> 3     3 the world is gonna roll me +

In short, take the content between each parentheses set as a line and separate it by ,.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
Alberson Miranda
  • 1,248
  • 7
  • 25

1 Answers1

1

Here's a quick and simple version - assumes that all lines always have 3 commas and that there aren't commas in the text itself but should get you started:

library(stringr)
library(tidyr)
v <- "INSERT INTO `table_name` VALUES (1,'some','body','+'), (2,'once','told me','+'), (3,'the world','is gonna roll me','+'))"
w <- data.frame(val=str_extract_all(v, "(?<=\\().*?(?=\\))")[[1]])
separate(w, val, into = paste0("col", 1:4), sep = ",")

which produces

  col1        col2               col3 col4
1    1      'some'             'body'  '+'
2    2      'once'          'told me'  '+'
3    3 'the world' 'is gonna roll me'  '+'
Dubukay
  • 1,764
  • 1
  • 8
  • 13