
I have 500+ .json files from which I am trying to extract a specific element. I cannot figure out why I cannot read more than one at a time.

This works:

    library(jsonlite)
    files <- list.files('~/JSON')
    file1 <- fromJSON(readLines('~/JSON/file1.json'), flatten = TRUE)
    result <- as.data.frame(file1$element$subdata$data)

However, regardless of which JSON package I use (e.g., RJSONIO), I cannot apply this to the entire contents of `files`. Here is my attempt to run the same code over every file in the list:

    for (i in files) {
      fromJSON(readLines(i), flatten = TRUE)
      as.data.frame(i)$element$subdata$data
    }

My goal is to loop through all 500+ files and extract the data. Specifically, if a file has the element `subdata$data`, I want to extract that list and put all the results in one data frame.

Note: the files are being read as ASCII (Windows OS). This does not have a negative effect on single extractions, but in the loop I get 'invalid character bytes'.
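One way to sidestep the invalid-byte errors (a sketch, assuming the files were written in the native Windows code page; `read_json_utf8` is a hypothetical helper name) is to force the text to UTF-8 with base R's `iconv()` before parsing:

```r
library(jsonlite)

# Hypothetical helper: read a file, coerce its text to UTF-8, then parse.
# iconv(sub = "") silently drops bytes that cannot be converted -- those
# are the same bytes that make fromJSON() choke in the first place.
read_json_utf8 <- function(path) {
  txt <- readLines(path, warn = FALSE)
  txt <- iconv(txt, from = "", to = "UTF-8", sub = "")
  fromJSON(paste(txt, collapse = "\n"), flatten = TRUE)
}
```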

Update 1/25/2019

I ran the following, but it returned errors...

    files <- list.files('~/JSON')
    out <- lapply(files, function(fn) {
      o <- fromJSON(file(i), flatten = TRUE)
      as.data.frame(i)$element$subdata$data
    })

Error in file(i): object 'i' not found

I also updated the function; this time I get UTF-8 errors...

    files <- list.files('~/JSON')
    out <- lapply(files, function(i, fn) {
      o <- fromJSON(file(i), flatten = TRUE)
      as.data.frame(i)$element$subdata$data
    })

    Error in parse_con(txt, bigint_as_char):
      lexical error: invalid bytes in UTF8 string. (right here)------^

Latest Update: I think I found a solution to the crazy 'bytes' problem. When I run readLines on the .json file, I can then apply fromJSON,

e.g.

    json <- readLines('~/JSON/file1.json')
    jsonread <- fromJSON(json)
    jsondf <- as.data.frame(jsonread$element$subdata$data)
    # returns a data frame with the correct information

The problem is, I cannot apply readLines to all the files within the JSON folder. If I can get help with that, I think I can run...

    files <- list.files('~/JSON')
    for (i in files) {
      a <- readLines(i)
      o <- fromJSON(a, flatten = TRUE)
      as.data.frame(i)$element$subdata
    }

Needed Steps

  1. Apply readLines to all 500+ .json files in the JSON folder.
  2. Apply fromJSON to the text from step 1.
  3. Create a data.frame that returns entries if the list (fromJSON) contains `$element$subdata$data`.
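The three steps can be sketched in one pass (a sketch, assuming every `*.json` file in the folder parses and that `$element$subdata$data` is rectangular when present; note `full.names = TRUE`, since `list.files()` otherwise returns bare filenames that `readLines()` cannot find from another working directory; `extract_all` is a hypothetical helper name):

```r
library(jsonlite)

extract_all <- function(dir) {
  files <- list.files(dir, pattern = "\\.json$", full.names = TRUE)
  out <- lapply(files, function(fn) {
    txt <- readLines(fn, warn = FALSE)                              # step 1
    parsed <- fromJSON(paste(txt, collapse = "\n"), flatten = TRUE) # step 2
    parsed$element$subdata$data                 # step 3: NULL when absent
  })
  do.call(rbind, Filter(Negate(is.null), out))  # keep only files that had it
}

allout <- extract_all("~/JSON")
```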

Thoughts?

Solution (Workaround?)

Unfortunately, fromJSON still runs into trouble with the .json files. My guess is that my GET method (httr) is unable to wait/delay and load the 'pretty print', and is thus grabbing the raw .json, which in turn gives odd characters and, as a result, the ubiquitous '------^' error. Nevertheless, I was able to put together a solution; please see below. I want to post it for future folks who may have the same problem with .json files not working nicely with any R JSON package.

    # keeping the same 'files' variable as earlier
    raw_data <- lapply(files, readLines)
    dat <- do.call(rbind, raw_data)
    dat2 <- as.data.frame(dat, stringsAsFactors = FALSE)
    # check that the json contents were read in
    dat2[1, 1]

    library(tidyr)
    dat3 <- separate_rows(dat2, sep = '')
    x <- unlist(raw_data)
    x <- gsub('[[:punct:]]', ' ', x)

    # Identify elements wanted in original .json and apply regex
    y <- regmatches(x, regexec('.*SubElement2 *(.*?) *Text.*', x))
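For reference, `regmatches()` paired with `regexec()` returns the full match followed by each capture group; a toy example with the pattern above (`SubElement2`/`Text` are placeholder markers from the post, and the input string is invented for illustration):

```r
# Toy input standing in for the punctuation-stripped JSON text
x <- "some SubElement2 hello world Text more"

m <- regmatches(x, regexec(".*SubElement2 *(.*?) *Text.*", x))
m[[1]][1]  # full match
m[[1]][2]  # first capture group: the text between the two markers
```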
OctoCatKnows

1 Answer

  1. for loops never return anything, so you must save all valuable data yourself.
  2. You call `as.data.frame(i)`, which creates a frame with exactly one element (the filename), probably not what you want to keep.
  3. (Minor) Use fromJSON(file(i),...).
  4. Since you want to capture these into one frame, I suggest something along the lines of:

    out <- lapply(files, function(fn) {
      o <- fromJSON(file(fn), flatten = TRUE)
      as.data.frame(o)$element$subdata$data
    })
    allout <- do.call(rbind.data.frame, out)
    ### alternatives:
    allout <- dplyr::bind_rows(out)
    allout <- data.table::rbindlist(out)
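If some of the 500+ files fail to parse (the bad-bytes errors above) or lack the nested element, a defensive variant keeps one bad file from aborting the whole run (a sketch; `collect_subdata` is a hypothetical helper name, and `tryCatch` is base R):

```r
library(jsonlite)

collect_subdata <- function(files) {
  out <- lapply(files, function(fn) {
    o <- tryCatch(fromJSON(file(fn), flatten = TRUE),
                  error = function(e) NULL)           # unparseable -> skip
    if (is.null(o)) NULL else o$element$subdata$data  # NULL when absent
  })
  do.call(rbind.data.frame, Filter(Negate(is.null), out))
}

allout <- collect_subdata(list.files("~/JSON", full.names = TRUE))
```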
    
r2evans
  • Appreciate the response. I will give this a try! BTW, any ideas on why .json (fromJSON) is giving errors on a group action but not when called individually? – OctoCatKnows Jan 25 '19 at 00:36
  • I don't know what you mean by "group action". – r2evans Jan 25 '19 at 01:19
  • Sorry, as in when I attempt to run `fromJSON` and iterate through the files list. The error returns due to invalid character bytes, usually associated with not being UTF-8. – OctoCatKnows Jan 25 '19 at 02:36
  • Is that with `fromJSON(file(f))` or `fromJSON(readLines(f))`? I don't know if it makes a difference. An error like that is often unrelated to doing it individually or within some form of loop (and possibly a new question). – r2evans Jan 25 '19 at 04:14
  • hello! So I am trying it this morning but continue to get 'object (i)' not found when running 'o<-fromJSON(file(i),flatten=TRUE) – OctoCatKnows Jan 25 '19 at 12:25
  • Oops, see my edits (two of them!). Apparently I should have added one more shot of espresso before answering the first time. Sorry about that. – r2evans Jan 25 '19 at 17:26
  • Your updates did not work, but they helped me get to a solution :) ! My guess is that your solution (and my earlier tries) would work if the .json files were read in as pretty print rather than raw, which is the error both my earlier attempts and your suggestions were hung up on. I am going to add to my original post; please take a look! – OctoCatKnows Jan 30 '19 at 11:34
  • Glad you solved it ... not sure about why `httr` is dorking the content, but at least you have it working. – r2evans Jan 30 '19 at 16:18