I'm pulling soccer data through an API. The JSON response is parsed into a nested list; a dput() example is below:
list(list(id = 10332894L, league_id = 8L, season_id = 12962L,
aggregate_id = NULL, venue_id = 201L, localteam_id = 51L,
visitorteam_id = 27L, weather_report = list(code = "drizzle",
temperature = list(temp = 53.92, unit = "fahrenheit"),
clouds = "90%", humidity = "87%", wind = list(speed = "12.75 m/s",
degree = 200L)), attendance = 25098L, leg = "1/1",
deleted = FALSE, referee = list(data = list(id = 15267L,
common_name = "L. Probert", fullname = "Lee Probert",
firstname = "Lee", lastname = "Probert"))), list(id = 10332895L,
league_id = 8L, season_id = 12962L, aggregate_id = NULL,
venue_id = 340L, localteam_id = 251L, visitorteam_id = 78L,
weather_report = list(code = "drizzle", temperature = list(
temp = 50.07, unit = "fahrenheit"), clouds = "90%", humidity = "93%",
wind = list(speed = "6.93 m/s", degree = 160L)), attendance = 22973L,
leg = "1/1", deleted = FALSE, referee = list(data = list(
id = 15273L, common_name = "M. Oliver", fullname = "Michael Oliver",
firstname = "Michael", lastname = "Oliver"))))
I'm currently extracting the values with a for loop. The reprex shows only 2 top-level list items, but the full data has hundreds. The main drawback of the loop is that missing values sometimes cause it to stop. I'd like to move this to purrr but am struggling to extract second-level nested items using at_depth or modify_depth. There are also lists nested inside nested lists, which adds to the complexity.
The end state should be a tidy data frame. For this sample the data frame will have only 2 rows, but it will have many columns, each representing one item regardless of where that item is nested in the list. Anything that is missing should be an NA value.
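To make the target shape concrete, here is a minimal sketch of one row per fixture with NA for anything missing. It is only an illustration under assumptions: the list is stored in a variable called fixtures (a hypothetical name), the sample is trimmed, and the second fixture's weather_report is set to NULL on purpose to mimic a missing value. It relies on purrr::pluck() and its .default argument, which returns the default when an element is NULL or absent at any depth.

```r
library(purrr)

# Trimmed sample; `fixtures` is a hypothetical name, and the second
# weather_report is deliberately NULL to simulate a missing nested item.
fixtures <- list(
  list(id = 10332894L, aggregate_id = NULL, venue_id = 201L,
       weather_report = list(code = "drizzle",
                             temperature = list(temp = 53.92, unit = "fahrenheit")),
       referee = list(data = list(id = 15267L, common_name = "L. Probert"))),
  list(id = 10332895L, aggregate_id = NULL, venue_id = 340L,
       weather_report = NULL,
       referee = list(data = list(id = 15273L, common_name = "M. Oliver")))
)

# Build one single-row data frame per fixture. pluck() walks the nesting and
# falls back to a typed NA instead of stopping on NULL or absent elements.
rows <- map(fixtures, function(x) {
  data.frame(
    id           = pluck(x, "id", .default = NA_integer_),
    aggregate_id = pluck(x, "aggregate_id", .default = NA_integer_),
    weather_code = pluck(x, "weather_report", "code", .default = NA_character_),
    temp         = pluck(x, "weather_report", "temperature", "temp", .default = NA_real_),
    referee_name = pluck(x, "referee", "data", "common_name", .default = NA_character_),
    stringsAsFactors = FALSE
  )
})

# Stack the per-fixture rows into one data frame.
fixtures_df <- do.call(rbind, rows)
```

Each column names its path explicitly, so this scales poorly to dozens of fields, but it shows that a missing nested item does not have to abort the extraction.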
The ideal solution, even if inelegant, would produce one data frame per level / nested item, which can then be bound together later.
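The "one data frame per nested item" idea could look roughly like the sketch below. It is an assumption-laden illustration, not a tested solution for the full data: fixtures and the helper level_df() are hypothetical names, the sample is trimmed, and the second fixture's wind entry is set to NULL to mimic a missing nested list. The helper only handles scalar fields at the given path.

```r
library(purrr)

# Trimmed sample; `fixtures` is a hypothetical name, and the second `wind`
# entry is deliberately NULL to simulate a missing nested list.
fixtures <- list(
  list(id = 10332894L,
       weather_report = list(code = "drizzle", clouds = "90%",
                             wind = list(speed = "12.75 m/s", degree = 200L)),
       referee = list(data = list(id = 15267L, common_name = "L. Probert"))),
  list(id = 10332895L,
       weather_report = list(code = "drizzle", clouds = "90%",
                             wind = NULL),
       referee = list(data = list(id = 15273L, common_name = "M. Oliver")))
)

# Hypothetical helper: one row per item for the scalar `fields` found at
# `path` (a list of names); NULL or absent values become NA.
level_df <- function(items, path, fields, prefix) {
  cols <- map(fields, function(f) {
    unlist(map(items, function(x) pluck(x, !!!c(path, f), .default = NA)))
  })
  names(cols) <- paste0(prefix, fields)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# One data frame per level / nested item.
fixture_df <- level_df(fixtures, list(), "id", "")
weather_df <- level_df(fixtures, list("weather_report"), c("code", "clouds"), "weather_")
wind_df    <- level_df(fixtures, list("weather_report", "wind"), c("speed", "degree"), "wind_")
referee_df <- level_df(fixtures, list("referee", "data"), c("id", "common_name"), "referee_")

# Every piece has one row per fixture in the same order, so they bind column-wise.
tidy_df <- cbind(fixture_df, weather_df, wind_df, referee_df)
```

Because every piece keeps one row per top-level item, the pieces stay aligned and the prefixes keep same-named fields such as id and referee_id apart.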
Thanks.