I have a newline-delimited JSON file (i.e., each JSON object is confined to one line in the file):

{"name": "json1"}
{"name": "json2"}
{"name": "json3"}

In Python I can easily read it as follows (I must pass encoding='cp850' to read my real data):

import json

objs = []
with open("testfile.json", encoding='cp850') as f:
    for line in f:
        objs.append(json.loads(line))

How can I do a similar trick in R?

At the end I want to get a data.frame:

library("jsonlite")
library("data.table")

d <- fromJSON("testfile.json", flatten=FALSE)
df <- as.data.frame(d)
Fluxy

3 Answers

We can use stream_in from jsonlite, which parses newline-delimited JSON from a connection and returns a data.frame:

library(jsonlite)
out <- stream_in(file('testfile.json'))
out
#    name
#1 json1
#2 json2
#3 json3

str(out)
#'data.frame':  3 obs. of  1 variable:
#$ name: chr  "json1" "json2" "json3"
akrun

You can read the lines, join them with commas, wrap the result in square brackets so it forms a single JSON array, and then parse that:

jsonlite::fromJSON(sprintf('[%s]', paste(readLines('testfile.json', warn = FALSE),
                                         collapse = ',')))

#    name
# 1 json1
# 2 json2
# 3 json3
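
To see why the bracket-wrapping trick works, here is the same idea sketched in Python, the language the question starts from. It is only an illustration: the function name parse_ndjson_as_array and the sample string are made up for this example. It also skips blank lines, which would otherwise leave a stray comma and make the array invalid.

```python
import json

def parse_ndjson_as_array(text):
    """Parse newline-delimited JSON by rewriting it as one JSON array."""
    # Drop blank lines; an empty entry would leave a stray comma in the array.
    lines = [line for line in text.split("\n") if line.strip()]
    # Join the objects with commas and wrap them in brackets: "[{...},{...}]".
    return json.loads("[" + ",".join(lines) + "]")

# Hypothetical sample mirroring the file in the question, plus one blank line.
sample = '{"name": "json1"}\n{"name": "json2"}\n\n{"name": "json3"}\n'
records = parse_ndjson_as_array(sample)
names = [record["name"] for record in records]
```

The whole file is held in memory as one string, so this approach suits small to medium files; a line-by-line reader is the safer choice for very large ones.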

(You can use any of several JSON packages for this, e.g.

  • jsonlite, a more R-like package that mainly works with data frames
  • RJSONIO, a more Pythonic package that mainly works with lists

or another one.)

niko

The fastest way to read newline-delimited JSON is probably stream_in from the ndjson package. It uses underlying C++ libraries (although I think it is still single-threaded). I find it much faster than the (still very good) jsonlite package. As a bonus, the JSON is flattened by default.

library(ndjson)
out <- ndjson::stream_in(path = './testfile.json')
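
For readers unsure what "flattened" means here, the following Python sketch shows the general idea: nested objects and arrays are collapsed into a single level so that each record becomes one table row. The flatten helper and the dotted key naming are assumptions made for illustration; the exact column names ndjson produces may differ.

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts/lists into one flat dict with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Scalar leaf: store it under the accumulated key path.
        return {prefix: obj}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat

# Hypothetical nested record; the question's sample data has no nesting.
row = flatten({"name": "json1", "meta": {"size": 3, "tags": ["a", "b"]}})
```

Each flattened record has only scalar values, so a list of such records maps directly onto the columns of a table.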
nate