I'm trying to efficiently read a (to me) oddly formatted jsonStream file into an R data frame. It's for a personal project to learn more R.

The json I'm talking about is: https://livetiming.formula1.com/static/2021/2021-03-28_Bahrain_Grand_Prix/2021-03-28_Race/TimingAppData.jsonStream

It's formatted like NDJSON, but with a timestamp outside of each JSON entry.

I can't manage to read this into a data frame efficiently. Currently I fetch the jsonStream as text, use a regex to remove the timestamps, and split the resulting string into a character vector on its line breaks ("\r\n"). Only then can I use ndjson::flatten to get it into a data frame.

The above is slow and I feel like I'm missing something obvious. Is there a better way to do this?

My code now is as follows:

library(httr)
library(ndjson)

url <- "https://livetiming.formula1.com/static/2021/2021-03-28_Bahrain_Grand_Prix/2021-03-28_Race/TimingAppData.jsonStream"
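# Download the whole stream as a single string
response <- content(GET(url), "text")
# Strip the HH:MM:SS.mmm timestamps (the dot is escaped so it only matches a literal ".")
gsubbed_resp <- gsub("\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", "", response)
# Split into one JSON string per line
resp_chr_vector <- unlist(strsplit(gsubbed_resp, "\r\n"))
# Flatten the NDJSON records into a data.table
result <- ndjson::flatten(resp_chr_vector)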
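
To make the stripping step reproducible without the download, here is a minimal sketch on a made-up two-line sample. It assumes each line starts with the timestamp immediately followed by the JSON object; the sample values and the variable names (sample_text, raw_lines, json_lines, timestamps) are hypothetical and not taken from the real feed. It splits first and then removes an anchored timestamp per line, which also lets the timestamps be kept rather than discarded.

```r
# Made-up sample mimicking the assumed layout: timestamp, then a JSON object, then "\r\n"
sample_text <- paste0(
  '00:00:01.234{"Lines":{"44":{"Line":1}}}', "\r\n",
  '00:00:02.345{"Lines":{"33":{"Line":2}}}', "\r\n"
)

# Split on the literal line break first (fixed = TRUE avoids regex matching here)
raw_lines <- unlist(strsplit(sample_text, "\r\n", fixed = TRUE))

# Keep the 12-character HH:MM:SS.mmm prefix instead of throwing it away
timestamps <- substr(raw_lines, 1, 12)

# Remove only a timestamp at the start of each line (anchored with ^, dot escaped)
json_lines <- sub("^\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", "", raw_lines)
```

The json_lines vector can then be passed to ndjson::flatten in the same way as resp_chr_vector in my code.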

The resulting dataframe is:

str(result)
Classes ‘data.table’ and 'data.frame':  1084 obs. of  536 variables: