Suppose I have the following JSON data:
{ "_id" : { "$oid" : "string" }, "titulo" : "string", "id_cv" : 1132, "textos" : [ { "fecha" : { "$date" : 1217376000000 }, "estado" : "string", "texto" : "string", "source_url" : "string" } ] }
{ "_id" : { "$oid" : "string" }, "titulo" : "string", "autores" : ",\"string\",\"string\",\"string\",\"string",5", "id_cv" : 1138, "textos" : [ { "fecha" : { "$date" : 1217548800000 }, "estado" : "string", "texto" : "string", "source_url" : "string" } ] }
I am trying to import the JSON data into R and ultimately transform it into an R data frame.
Suppose I have the following script in R:
library("rjson")
json_file <- "/Users/usr/file/json_data.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
data = unlist(json_data)
title=data[names(data)=="titulo"]
print(title)
text=data[names(data)=="textos.texto"]
print(text)
url=data[names(data)=="textos.source_url"]
print(url)
When I run this script, the output contains only the values from the first line of the JSON data file, although the file has approximately 200 lines. One issue that I am aware of is that JavaScript does not 'allow' multi-line strings. I have attempted to cope with this in various ways:
- Add '"' between each 'line' of data.
- Add '"' to the end of each 'line' of data.
- Add "\" between each 'line' of data.
- Add "\" to the end of each 'line' of data.
- Collapse all of the lines into one line (replace "\n" with "\n")
All of the above have been attempted using regular expressions.
My question is: how do I manipulate the JSON data so that all of its 'lines' are read into R, so that I can unlist them and construct a data frame with the columns 'title', 'text' and 'url', and with one row per 'line' of the JSON data?
I have attempted this using both the rjson and RJSONIO packages in R, but I have no strong preference between them at the moment, since I believe the issue ultimately lies with the formatting of the JSON data itself.
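For reference, here is a minimal sketch of the result I am after. It assumes that every line of the file is a complete, valid JSON object on its own, so each line is parsed separately instead of being pasted together first. The sample records, the temporary file, and the helper name as_chr_or_na are made up for illustration, and only the first entry of "textos" is used for each record.

```r
library("rjson")

# Hypothetical sample standing in for the real file:
# one complete JSON object per line.
sample_lines <- c(
  '{ "titulo" : "first title", "id_cv" : 1132, "textos" : [ { "estado" : "a", "texto" : "first text", "source_url" : "http://example.com/1" } ] }',
  '{ "titulo" : "second title", "id_cv" : 1138, "textos" : [ { "estado" : "b", "texto" : "second text", "source_url" : "http://example.com/2" } ] }'
)
json_file <- tempfile(fileext = ".json")
writeLines(sample_lines, json_file)

# Read the file line by line and drop blank lines.
lines <- readLines(json_file)
lines <- lines[nzchar(trimws(lines))]

# Parse each line as its own JSON document.
records <- lapply(lines, function(line) fromJSON(line))

# Hypothetical helper: return the value as character, or NA if the field is missing.
as_chr_or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Build one data frame row per record from the first entry of "textos".
rows <- lapply(records, function(rec) {
  first_text <- if (length(rec$textos) > 0) rec$textos[[1]] else list()
  data.frame(
    title = as_chr_or_na(rec$titulo),
    text  = as_chr_or_na(first_text$texto),
    url   = as_chr_or_na(first_text$source_url),
    stringsAsFactors = FALSE
  )
})
df <- do.call(rbind, rows)
print(df)
```

If a line is not valid JSON by itself, fromJSON would fail on that line, so this only works if the formatting problem in the data is resolved first.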