0

I am trying to work with a nested JSON format similar to that below. What I ultimately need is for the data frame to have only two rows of data, one for John and one for Sam, with the other data in a format like below. So this particular data frame would have 2 rows and 7 columns.

Name  RD1  RD2  Hours1  Hours2  Billable1  Billable2
John
Sam

How could this be accomplished? Thank you in advance!

Here's the code:

library(jsonlite)
options(stringsAsFactors = FALSE)
    rawData <- "document.txt"
    processedData <- fromJSON(rawData, flatten = TRUE)
    processedData <- processedData[, c("name", "records")]
    unnestedJSON <- unnest(processedData, records)

document.txt contains this information:

[
{
  "name": "John",
  "records": [
     {
        "reportDate": "2018-07-20",
        "hours": 204,
        "billable": 32844
     },
     {
        "reportDate": "2018-03-25",
        "hours": 234,
        "billable": 37715
     }
  ]
},
{
  "name": "Sam",
  "records": [
     {
        "reportDate": "2018-06-18",
        "hours": 187,
        "billable": 13883
     },
     {
        "reportDate": "2018-04-02",
        "hours": 176,
        "billable": 13467
     }
  ]
}
]
greg42627
  • 79
  • 6

2 Answers2

1

Do you mean something like this?

jsonlite::fromJSON(s, flatten=TRUE) %>%
  tidyr::unnest()
#   name reportDate hours billable
# 1 John 2018-07-20   204    32844
# 2 John 2018-03-25   234    37715
# 3  Sam 2018-06-18   187    13883
# 4  Sam 2018-06-19   188    13884
# 5  Sam 2018-06-20   189    13885
# 6  Sam 2018-04-02   176    13467

I've had difficult doing this as straight data.table in a generic manner (not knowing columns beforehand), see for reference https://stackoverflow.com/a/34693087/3358272 (does not work here) and https://github.com/Rdatatable/data.table/issues/2146.


Supporting data, with a few entries added:

s <- '[
{
  "name": "John",
  "records": [
     {
        "reportDate": "2018-07-20",
        "hours": 204,
        "billable": 32844
     },
     {
        "reportDate": "2018-03-25",
        "hours": 234,
        "billable": 37715
     }
  ]
},
{
  "name": "Sam",
  "records": [
     {
        "reportDate": "2018-06-18",
        "hours": 187,
        "billable": 13883
     },
     {
        "reportDate": "2018-06-19",
        "hours": 188,
        "billable": 13884
     },
     {
        "reportDate": "2018-06-20",
        "hours": 189,
        "billable": 13885
     },
     {
        "reportDate": "2018-04-02",
        "hours": 176,
        "billable": 13467
     }
  ]
}
]'
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

You just need to aggregate data to the name level (with assumption that each name is unique).

Here is an option with data.table:

library(data.table)
dt <- data.table(unnestedJSON)
dt[, .(RD1 = first(reportDate), 
       RD2 = last(reportDate),  
       Hours1 = first(hours)
       Hours2 = last(hours)
       Billable1 =first(billable) 
       Billable2 = last(billable)), by = name]

If your names are not unique, you will need to create an ID column first and aggregate by ID.

Note: this should work with your example with two values per name. If json can have more values per name, it might not meet your requirements.

To support over two values per name use dcast:

library(data.table)
dt <- data.table(unnestedJSON)
dt[, rown := 1:.N, by = name]
res <- dcast(
  dt, 
  formula = name ~ rown, 
  value.var = c("reportDate", "hours", "billable"), fun.aggregate = mean)
Bulat
  • 6,869
  • 1
  • 29
  • 52