
I have a large data frame (250k rows) in which one column holds JSON text. Each JSON document is large, with many elements, but I only need to extract one or two of them. I used jsonlite::fromJSON(), but that seems inefficient because it parses the whole text just to get one element; a microbenchmark gives 50 ms per row for this method. Then I found the jqr package, which lets me access a single element. It is faster (2 ms per row), but I think it could be faster still (I might be wrong). Even at 2 ms per row, 250k rows take about 500 s (over 8 minutes). I wrote a wrapper so that jqr::jq can be applied to a column; the wrapper doesn't add much overhead, but the whole thing still seems slow. Am I wrong to assume this should be much faster?

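getJson = function(json, jsonTrajectory){
  # jsonTrajectory is a single jq filter, e.g. ".a.b"
  stopifnot(length(jsonTrajectory) == 1)
  # Extract one path from a single JSON string; missing or empty rows give NA
  .getJson1 = function(json, jsonTrajectory){
    if (is.na(json) || json == "") return(NA)
    return(jqr::jq(json, jsonTrajectory))
  }
  # Apply row by row; the result is a list with one element per row
  jsonParsedVector = purrr::map(json, ~ .getJson1(.x, jsonTrajectory))
  return(jsonParsedVector)
}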
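A minimal, hypothetical benchmark sketch for comparing the two approaches on a single row. The sample document (one nested field plus 5000 filler keys) and the path `.meta.id` are assumptions standing in for the real column, so absolute timings will differ from the real data.

```r
library(microbenchmark)

# Hypothetical sample row: a large JSON object where only `.meta.id` is wanted
# (the structure, size, and path are assumptions, not the real data)
filler <- setNames(as.list(seq_len(5000)), paste0("k", seq_len(5000)))
row_json <- as.character(
  jsonlite::toJSON(c(list(meta = list(id = 42L)), filler), auto_unbox = TRUE)
)

# Approach 1: parse the whole document, then subset in R
full_parse_id <- jsonlite::fromJSON(row_json)$meta$id

# Approach 2: let jq extract only the requested path (returns JSON text)
jq_id <- as.character(jqr::jq(row_json, ".meta.id"))

# Time both approaches on the same row
microbenchmark(
  full_parse = jsonlite::fromJSON(row_json)$meta$id,
  jq_path    = jqr::jq(row_json, ".meta.id"),
  times = 20
)
```

Multiplying the per-row median by the number of rows gives a rough estimate of the total cost for the whole column.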
Courvoisier
