0

I have an object which is available through a URL in JSON format, and part of it I want to put into a dataframe format to analyse the data in R. I currently do this as follows:

# Read data in using fromJSON function
data <- RJSONIO::fromJSON('https://api-prod.footballindex.co.uk/football.allTradable24hrchanges?page=1&per_page=5000&sort=asc')

# The above is an example dataset to use for this question
# In my actual dataset there are fields which only exist in some elements in list so adding this onto example
# I want to handle these by returning in the end dataframe with NA if it doesn't exist
data[['items']][[1]]$newField <- 1

# It is only the data in the items field I am interested in
# Unlist each element to get all nested emelents within the lists in flat format
dataList <- lapply(data[['items']], unlist)

# Combine all elemnts of list together
dataDF <- dplyr::bind_rows(dataList)

# Convert into data.frame
dataDF <- data.frame(dataDF)

This works, however the bind_rows part takes a long time

> system.time(dataDF <- dplyr::bind_rows(dataList))
   user  system elapsed 
 42.195   0.000  42.216 

It feels like there must be a quicker way of doing this.

I am told that data.table::rbindlist is a faster alternative but using this gives me error message

> dataDF <- data.table::rbindlist(dataList)
Error in data.table::rbindlist(dataList) : 
  Item 1 of input is not a data.frame, data.table or list

Had a suggestion of using do.call(rbind... in an answer which runs and is fast, but when there are fields which are only in some elements it does not handle this properly. For example

dataDF2 <- data.frame(do.call(rbind, dataList))
> head(dataDF$country)
[1] "Côte d'Ivoire" "Italy"         "England"       "Scotland"      "Germany"       "France"       
> head(dataDF2$country)
[1] "Côte d'Ivoire" "1.65"          "1.62"          "FALSE"         "2.59"          "France"    

Thanks in advance for any help

user1165199
  • 6,351
  • 13
  • 44
  • 60

1 Answers1

1
data <- RJSONIO::fromJSON('https://api-prod.footballindex.co.uk/football.allTradable24hrchanges?page=1&per_page=5000&sort=asc')

system.time(dataDF <- as.data.frame(do.call(rbind, data[['items']])))

   user  system elapsed 
  0.007   0.000   0.006
Snipeskies
  • 258
  • 1
  • 5
  • Thanks for your help, and that works in the answer above (once I have also lapplied the unlist first), however that was an example and in my actual code there are some fields which exist in some elements in list but not in others. I want to fill these with NA where they don't exist which `dplyr::bind_rows` does. `do.call(rbind` doesn't do this properly. I will edit my question to reflect this – user1165199 Apr 28 '21 at 18:52
  • Seems like you could just do that in the lapply function – Snipeskies Apr 28 '21 at 19:45