
I'm working on a project using data from Etsy's API, specifically Etsy's category taxonomy. Each row in the data frame represents one category node and holds a nested data frame containing all of its child nodes. I don't know how many levels of subcategories exist in each category, and the depth differs from category to category. Here's a screenshot of the data frame, because calling head() attempts to unnest all of the data and freezes RStudio. Each nested data frame has the same columns as the data frame it is nested in. When a node has no children, an empty list is stored in the children variable.

Does anyone have any suggestions on how I can unnest this data? I've tried nested for loops and tidyr's unnest(), but that added new columns to the data frame for each nested data frame. To be clear, the output I am looking for has exactly the same columns as the nested data frame, with all of the categories stored in the inner data frames appended to the end of it.

Is tidyr's unnest() the way to go for this problem? Does anyone have any suggestions of another method or package that I should look into?

1 Answer


So without the data at hand, it is kind of hard to give a concrete working example, but what about the following:

Map over the rows of the data frame, select the children cell of each row, unnest that cell, and append the result to the original data using rbind (or plyr::rbind.fill / dplyr::bind_rows to be more failsafe, in case the variables in one of the nested data frames don't match).

So something like:

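# Note: this flattens only the first level of children.
df %>%
  split(seq_len(nrow(.))) %>%   # one single-row data frame per category
  purrr::map(~ .x %>% dplyr::select(children) %>% tidyr::unnest(cols = children)) %>%
  unname() %>%                  # drop the names added by split() so the list is treated as a list of data frames
  dplyr::bind_rows(df, .)       # append the unnested children below the original rows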
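
Because the depth is unknown and varies by category, a recursive helper may be a better fit than a single unnest pass. The following is only a minimal sketch, not tested against the real Etsy data: `flatten_nodes()`, the toy `taxonomy` table, and the `id` and `name` columns are made up for illustration. It assumes `children` is a list column in which each element is either a data frame with the same columns or an empty list.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: recursively flatten a nested category table.
flatten_nodes <- function(nodes) {
  # Leaf nodes store an empty list instead of a data frame: nothing to add.
  if (!is.data.frame(nodes) || nrow(nodes) == 0) {
    return(NULL)
  }
  # This level's rows (without the nested column), followed by the flattened
  # descendants of every row. The depth never has to be known in advance.
  parts <- c(
    list(select(nodes, -children)),
    lapply(nodes$children, flatten_nodes)
  )
  # Drop the NULLs returned for leaves before binding.
  bind_rows(Filter(Negate(is.null), parts))
}

# Toy data standing in for the real taxonomy (assumed structure).
leaf <- function(id, name) {
  tibble(id = id, name = name, children = list(list()))
}
toys <- tibble(id = 2L, name = "Toys", children = list(leaf(3L, "Puzzles")))
taxonomy <- tibble(
  id = c(1L, 4L),
  name = c("Home", "Art"),
  children = list(toys, list())
)

flat <- flatten_nodes(taxonomy)
```

Here `flat` has one row per category at any depth, with the `children` column dropped because its content has already been appended as rows. If the nested column has to be kept, the `select()` call can be removed.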
Felix