library(jsonlite)
paths <- list.files(
  pattern="data.json",
  full.names=TRUE,
  recursive=TRUE
)
colNames = c("BillType",
             "Congress",
             "IntroducedAt",
             "OfficialTitle",
             "Number",
             "Status",
             "SubjectsTopTerm",
             "UpdatedAt")
trimData <- function(x) {
  a <- list(x$bill_type,
            x$congress,
            x$introduced_at,
            x$official_title,
            x$number,
            x$status,
            x$subjects_top_term,
            x$updated_at)
  result <- as.data.frame(a)
  return(result)
}
rawData <- do.call(
    "rbind",
    lapply(paths, function(x) fromJSON(txt = x, simplifyDataFrame = TRUE))
)
prunedData <- do.call(
    "rbind",
    lapply(rawData, function(x) trimData(x))
)
colnames(prunedData) <- colNames
write.csv(prunedData, "test3.csv")

My goal with this script is to take the JSON data, once converted to a data frame, and reduce it to a slimmer data frame for CSV output. The rawData variable ends up with roughly 100 columns. When I execute this script in RStudio, I get the following error:

> prunedData <- do.call("rbind", lapply(rawData, function(x) trimData(x)))
Error in data.frame(NULL, NULL, NULL, NULL, NULL, c(NA, "PASS_OVER:HOUSE",  : 
  arguments imply differing number of rows: 0, 4

I'm not much of an expert in declarative languages like R and SQL so, if you could dumb this down for me, it would go a long way!

Caleb Faruki

1 Answer

Consider this approach to migrating JSON into a data frame with nested do.call() and lapply() calls. The outer do.call() row binds data across files; the inner do.call() row binds the JSON data within each file. The paste() call collapses the lines returned by readLines() into a single string, removing the line breaks in case your JSON files are pretty-printed rather than compacted onto one line.

library(jsonlite)

paths <- list.files(pattern="data.json", full.names=TRUE, recursive=TRUE)
colNames = c("BillType", "Congress", "IntroducedAt", "OfficialTitle",
             "Number", "Status", "SubjectsTopTerm", "UpdatedAt")

rawData <- do.call(rbind,
                   lapply(paths, 
                          function(x)
                          do.call(rbind, 
                                  lapply(paste(readLines(x, warn=FALSE),
                                               collapse=""), 
                                         jsonlite::fromJSON)
                          )
                   )
           )

# TRIM TO NEEDED COLUMNS
# (R is case-sensitive: rawData, not rawdata; colNames must match names present in rawData)
prunedData <- rawData[colNames]
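
The error in the question looks consistent with lapply() walking over the individual fields of rawData rather than over whole bills: a nested field that carries its own status entry but none of the other keys would hand data.frame() several NULLs next to a length-4 vector. That is an inference from the error message, not something checked against the actual files. Below is a minimal sketch of a per-file alternative, assuming each data.json holds a single bill object with the snake_case keys used in trimData(). It picks the wanted keys by name, turns missing or null ones into NA, and row binds the resulting one-row data frames. The helper names trimBill and readBill and the two sample JSON strings are made up for illustration.

```r
library(jsonlite)

# JSON keys to keep (names) mapped to the CSV column names from the question (values)
fields <- c(bill_type = "BillType", congress = "Congress",
            introduced_at = "IntroducedAt", official_title = "OfficialTitle",
            number = "Number", status = "Status",
            subjects_top_term = "SubjectsTopTerm", updated_at = "UpdatedAt")

# Hypothetical helper: one parsed bill (a named list) -> one-row data frame
trimBill <- function(bill) {
  picked <- lapply(names(fields), function(key) {
    value <- bill[[key]]
    # A missing, null, or empty key becomes NA so every row has the same width;
    # the kept keys are assumed to hold scalars
    if (length(value) == 0) NA else value[[1]]
  })
  names(picked) <- unname(fields)
  as.data.frame(picked, stringsAsFactors = FALSE)
}

# Hypothetical helper: parse one JSON document (file path or string), then trim it
readBill <- function(txt) trimBill(fromJSON(txt))

# Invented sample data standing in for two data.json files
samples <- c(
  '{"actions":[{"status":"REFERRED","type":"referral"}],"bill_type":"hr",
    "congress":"113","introduced_at":"2013-01-03","official_title":"A sample title",
    "number":"1","status":"REFERRED","subjects_top_term":"Taxation",
    "updated_at":"2013-01-04"}',
  '{"bill_type":"s","congress":"113","introduced_at":"2013-02-01",
    "official_title":"Another title","number":"2","status":"INTRODUCED",
    "subjects_top_term":null,"updated_at":"2013-02-02"}'
)

# One row per bill, already carrying the CSV column names
prunedData <- do.call(rbind, lapply(samples, readBill))
print(prunedData)
```

With real files, pass the paths vector from list.files() instead of samples, since fromJSON() accepts a file path as well as a JSON string, and finish with write.csv(prunedData, "test3.csv", row.names = FALSE).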
Parfait