I am pulling data directly from a Postgres database into R, where one of the columns in the Postgres table contains rows of JSON objects. I am trying to unpack the JSON objects and have them flatten into columns in an R dataframe, but so far, I'm getting mangled results.
Here's my code:
library(RPostgreSQL)
library(jsonlite)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host="xxx", dbname="xxx", user="xxx", password="xxx")
query="select column1, column2, json from dummy_table limit 2"
resultSet <- dbSendQuery(con, query)
rawData<-fetch(resultSet,n=-1)
postgresqlCloseConnection(con)
rawData$json
[1]"{\"id\":{\"publisherName\":\"pub1\",\"visitorId\":\"visitor1\",\"timestamp\":1234},\"startAt\":4567,\"endAt\":8910}"
[2]"{\"id\":{\"publisherName\":\"pub2\",\"visitorId\":\"visitor2\",\"timestamp\":2345},\"startAt\":678,\"endAt\":91011}"
unpacked<-fromJSON(rawData$json, simplifyDataFrame=FALSE)
unpacked
$id
$id$publisherName
[1] "pub1"
$id$visitorId
[1] "visitor1"
$id$timestamp
[1] 1234
$startAt
[1] 4567
$endAt
[1] 8910
As you can see, it only unpacked the first JSON object, and it left things quasi-nested (which is fine, but optimally, i would want all the data to live in one level in a dataframe).
I would want the data to look like this:
unpacked
id.publisherName id.visitorId id.timestamp startAt endAt
pub1 visitor1 1234 4567 8910
pub2 visitor2 2345 678 91011
EDIT: Adding the rawData dataframe:
rawData<-structure(list(
column1 = c("abcd", "efgh"
),
column2 = structure(c(123, 456), class = c("POSIXct",
"POSIXt"), tzone = ""),
json = c("{\"id\":{\"publisherName\":\"pub1\",\"visitorId\":\"visitor1\",\"timestamp\":1234},\"startAt\":4567,\"endAt\":8910}",
"{\"id\":{\"publisherName\":\"pub2\",\"visitorId\":\"visitor2\",\"timestamp\":2345},\"startAt\":678,\"endAt\":91011}"
))
, .Names = c("column1", "column2", "json"),
row.names = 1:2, class = "data.frame")
Here's what happens with the paste
function.
rawJSON <- paste("[", paste(rawData$json, collapse=","), "]")
rawJSON <- fromJSON(rawJSON, simplifyDataFrame=FALSE)
rawJSON
[[1]]
[[1]]$id
[[1]]$id$publisherName
[1] "pub1"
[[1]]$id$visitorId
[1] "visitor1"
[[1]]$id$timestamp
[1] 1234
[[1]]$startAt
[1] 4567
[[1]]$endAt
[1] 8910
[[2]]
[[2]]$id
[[2]]$id$publisherName
[1] "pub2"
[[2]]$id$visitorId
[1] "visitor2"
[[2]]$id$timestamp
[1] 2345
[[2]]$startAt
[1] 678
[[2]]$endAt
[1] 91011