
The NZ Companies Register offers a JSON file containing all publicly available business information. This file comes in at a whopping 40 GB, but there is also a smaller JSON file (~250 MB) containing data on unincorporated entities (sole traders etc.). As a warm-up exercise I thought I'd have a go at importing it into R to get an idea of size, scalability and computational requirements.

I'm having a lot of trouble importing the smaller JSON file into R. I've tried jsonlite, RJSONIO and rjson, but it appears that the file is written in an 'unorthodox' JSON format, so the standard 'fromJSON' commands are falling over. Below is a portion of the file (2 entities) which I've been trying to import into R: test.json

library(jsonlite)
json <- fromJSON("test.json", flatten=TRUE)

Error in parse_con(txt, bigint_as_char) : 
   parse error: invalid object key (must be a string)
      zbn": [{          "entity": [{            {               "australianBusinessNumbe
                 (right here) ------^

NB: JSONLint doesn't seem to think the file is valid JSON

My thought is that I may need to use stream_in() or readLines(), but I am not very proficient with these functions. Any help or insight is greatly appreciated. Cheers

  • If you remove the extra set of braces (the one the error points to), jsonlite::fromJSON works fine. I wonder why it is formatted that way... I guess you could use paste(readLines('test.json'), collapse = "") and then a regular expression to strip that particular {}? – Kim Oct 13 '17 at 03:02
  • I hate to say this (b/c XML is evil) but they offer XML versions, too, correct? – hrbrmstr Oct 13 '17 at 11:52
  • Yes, they offer XML, but the XML file is 9 GB so I'm finding it hard to work with. I thought I'd start with the small JSON file – Oliver Mills Oct 18 '17 at 00:09
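
A minimal sketch of the approach Kim describes, for anyone who wants to try it: read the text, strip the doubled braces with a regular expression, then hand the result to jsonlite::fromJSON. The helper name strip_doubled_braces and the inline sample are made up for illustration, and the closing pattern "} }]" is an assumption, since only the opening "[{ {" is visible in the error message. Check both patterns against the real file before relying on this.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): collapses "[{ {" to "[{" and
# "} }]" to "}]". The closing pattern is a guess; only the opening one is
# visible in the error message.
strip_doubled_braces <- function(txt) {
  txt <- gsub("\\[\\s*\\{\\s*\\{", "[{", txt, perl = TRUE)
  gsub("\\}\\s*\\}\\s*\\]", "}]", txt, perl = TRUE)
}

# Made-up sample that mimics the structure shown in the error message
sample_txt <- '{"nzbn": [{"entity": [{ {"entityName": "Example A"} }]}]}'

fixed  <- strip_doubled_braces(sample_txt)
parsed <- fromJSON(fixed, flatten = TRUE)
str(parsed)

# For the real file, something along these lines (untested against the full data):
# txt  <- paste(readLines("test.json", warn = FALSE), collapse = "")
# json <- fromJSON(strip_doubled_braces(txt), flatten = TRUE)
```

One caveat: the closing pattern would also rewrite a legitimate "}}]" (a nested object that ends the last element of an array), so a more specific pattern may be needed once the full structure of the file is known.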
