I have a .txt file with this structure:

section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"},...etc...}]
section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"},...etc...}]
...
section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"},...etc...}]

I am trying to read it in R with these commands:

library(jsonlite)
data <- fromJSON("myfile.txt")

But I get this error:

Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) : 
  lexical error: invalid char in json text.
                                       section2#[{"p": "0.99
                     (right here) ------^

How can I read it, even if that means splitting it by section?

pachadotdev

2 Answers

Remove the prefix and bind the flattened JSON arrays together into a data frame:

raw_dat <- readLines(textConnection('section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]
section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]
section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]'))

library(stringi)
library(purrr)
library(jsonlite)

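# Strip the "sectionN#" prefix so each line is a valid JSON array,
# then parse every line and row-bind the results into one data frame
stri_replace_first_regex(raw_dat, "^section[[:digit:]]+#", "") %>%
  map_df(fromJSON)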
##          p tag
## 1 0.999834  MA
## 2        1  MO
## 3   0.9995  NC
## 4        1  FL
## 5   0.9995  NC
## 6        1  FL
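
Since the question also asks about splitting by section, here is a minimal sketch of one way to keep the section label next to each row. It is an extension of the approach above, not part of it: the `section` column and the names `parsed` and `by_section` are made up for illustration, and it assumes every line has the form `label#[...]` with no `#` inside the label.

```r
library(jsonlite)

# Sample lines copied from the question; a real file would be read with
# readLines("myfile.txt") instead
raw_dat <- c(
  'section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]',
  'section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]',
  'section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]'
)

# Split each line at the first "#": label before it, JSON array after it
section <- sub("#.*$", "", raw_dat)
json    <- sub("^[^#]*#", "", raw_dat)

# Parse each line into a data frame and tag its rows with the section label
parsed <- Map(function(s, j) {
  df <- fromJSON(j)
  df$section <- s   # hypothetical column name, chosen for this sketch
  df
}, section, json)

by_section <- do.call(rbind, unname(parsed))
rownames(by_section) <- NULL

# "p" is stored as a string in the file, so convert it if numbers are needed
by_section$p <- as.numeric(by_section$p)

# One data frame per section, if a split is wanted
per_section <- split(by_section, by_section$section)
```

`per_section$section1` then holds the rows from both `section1` lines, and `per_section$section2` the rows from the `section2` line.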
hrbrmstr
  • Thank you so much. I didn't create that file; it comes from public data. It was exactly what I was afraid of: the JSON structure was not valid. – pachadotdev Sep 09 '16 at 16:37
Remove the section# prefix from each line. Each line of your .txt is then a JSON array of objects, so the file as a whole can be treated as a 2D array: foo[0][0] is the first object on the first line, and foo[m][n] is the last object on the last line, where m is the number of lines minus 1 and n is the number of objects on that line minus 1.
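
In R, which indexes from 1 rather than 0, that idea could look roughly like the sketch below. It assumes the `jsonlite` package and the sample lines from the question; `foo`, `first_obj` and `last_obj` are just illustrative names.

```r
library(jsonlite)

# Sample lines copied from the question; a real file would be read with
# readLines("myfile.txt") instead
raw_dat <- c(
  'section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]',
  'section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]',
  'section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]'
)

# Drop the "sectionN#" prefix so each line is a JSON array on its own
json <- sub("^section[0-9]+#", "", raw_dat)

# simplifyVector = FALSE keeps every line as a plain list of objects,
# giving a list of lists: foo[[line]][[object]]
foo <- lapply(json, fromJSON, simplifyVector = FALSE)

# R counts from 1, so foo[0][0] in the description is foo[[1]][[1]] here
first_obj <- foo[[1]][[1]]

# Last object on the last line
last_line <- foo[[length(foo)]]
last_obj  <- last_line[[length(last_line)]]
```

Each object is a named list, so its fields are reached with `first_obj$p` and `first_obj$tag`.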

Prateek Gupta
  • Actually, it still won't be a valid JSON file if you remove the prefix. A JSON file needs to have a single root element. You'd need a wrapping `[ ]` and a `,` at the end of each line (except the last). It's more of a [JSON Lines](http://jsonlines.org/) format. – MrFlick Sep 08 '16 at 21:46
  • @MrFlick It won't be a JSON file, but it would be a 2D array. – Prateek Gupta Sep 08 '16 at 21:48
  • Well, it's not a JSON array. Are you talking about reading in the data with `readLines()` or something? How are you parsing this in R to make that a 2D array?
  • @MrFlick I am looking into it, will update my answer soon. – Prateek Gupta Sep 08 '16 at 22:00
  • @Prateek Gupta. Thank you so much. I didn't create that file; it comes from public data. It was exactly what I was afraid of: the JSON structure was not valid. – pachadotdev Sep 09 '16 at 16:37
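
The single-root point raised in the comments can be illustrated with a short R sketch: strip the prefixes, join the lines with commas, and wrap the result in one outer `[ ]`. This assumes the `jsonlite` package and the sample lines from the question; `one_doc` and `nested` are illustrative names.

```r
library(jsonlite)

# Sample lines copied from the question; a real file would be read with
# readLines("myfile.txt") instead
raw_dat <- c(
  'section1#[{"p": "0.999834", "tag": "MA"},{"p": "1", "tag": "MO"}]',
  'section1#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]',
  'section2#[{"p": "0.9995", "tag": "NC"},{"p": "1", "tag": "FL"}]'
)

# Drop the "sectionN#" prefix from every line
json_lines <- sub("^section[0-9]+#", "", raw_dat)

# Without a single root element the joined text is not valid JSON
is_valid_before <- isTRUE(validate(paste(json_lines, collapse = "\n")))

# Join the per-line arrays with commas and wrap them in one outer [ ]
one_doc <- paste0("[", paste(json_lines, collapse = ","), "]")
is_valid_after <- isTRUE(validate(one_doc))

# The whole file now parses in a single call as an array of arrays
nested <- fromJSON(one_doc, simplifyVector = FALSE)
```

Here `is_valid_before` is `FALSE` and `is_valid_after` is `TRUE`, which matches the error in the question: the parser rejects the file because it is a sequence of prefixed arrays rather than one JSON document.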