I have a large data frame (250k rows) in which one column holds JSON text. Each JSON document is large, with many elements, and I only need to parse one or two of them. I started with jsonlite::fromJSON(),
but that seems inefficient because it parses the whole text just to get one element; a microbenchmark gives 50 ms per row for this method. Then I found the jqr
package, which lets me access a single element. It is fast (2 ms per row), but I think it could be faster (I might be wrong). I wrote a wrapper so that the jqr::jq
function can be applied to a whole column. It doesn't add much overhead, but it still feels slow. Am I wrong to assume this should be much faster?
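# Extract one jq path from each JSON string in a character vector.
getJson = function(json, jsonTrajectory){
  stopifnot(length(jsonTrajectory) == 1)
  # Handle a single JSON string; missing or empty input yields NA.
  .getJson1 = function(json, jsonTrajectory){
    if (is.na(json) || json == "") return(NA)
    return(jqr::jq(json, jsonTrajectory))
  }
  # Call purrr::map directly so the magrittr pipe does not need to be attached.
  jsonParsedVector = purrr::map(json, ~ .getJson1(.x, jsonTrajectory))
  return(jsonParsedVector)
}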
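
One variation that may be worth comparing is passing the whole column to jqr::jq in a single call instead of once per row. The sketch below is only an illustration under stated assumptions: that jqr::jq accepts a character vector holding several JSON strings, and that the query emits exactly one value per input string (otherwise the result would not line up with the rows). The sample data and the names `json`, `path`, `ok`, and `result` are made up for the example.

```r
# Minimal sketch; assumes the jqr package is installed.
# The sample JSON strings below are hypothetical.
json <- c('{"id": 1, "user": {"name": "a"}}',
          NA,
          "",
          '{"id": 2, "user": {"name": "b"}}')
path <- ".user.name"

# Only non-missing, non-empty strings are sent to jq;
# the remaining positions stay NA in the result.
ok <- !is.na(json) & json != ""
result <- rep(NA_character_, length(json))

# Assumption: one jq output per input string, so lengths match.
result[ok] <- as.character(jqr::jq(json[ok], path))
```

Whether this is actually faster than the per-row wrapper would need to be measured on the real data, for example with microbenchmark.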