I have a dataset with multiple levels:
- Categories (e.g., "Countries")
- Countries (e.g., "USA")
- Cities (e.g., "New York")
- Counties (e.g., "Manhattan")
- Places (e.g., "Times Square")
Each row (except for LVL 1 entries) is linked to a parent a level above.
For example: Times Square->Manhatten->New York->USA->Countries
My question: how to sort this dataset:
df2 <- structure(list(ID = c(3,6,9,11,12,19,411,50,77,83,105),
Parent = c(12,12,77,105,19,NA,3,41,19,77,19),
Level = c(3,3,3,3,2,1,4,5,2,3,2),
Name = c("New York","Boston","Oxford","Vancouver","USA","Countries",
"Manhattan","Times Square","UK","London","Canada")),
class = "data.frame",
row.names = c(NA, -11L))
into this:
df2 <- structure(list(ID = c(19,12,3,41,50,6,77,83,9,105,11),
Parent = c(NA,19,12,3,41,12,19,77,77,19,105),
Level = c(1,2,3,4,5,3,2,3,3,2,3),
Name = c("Countries","USA","New York","Manhattan","Times Square",
"Boston","UK","London","Oxford","Canada","Vancouver")),
class = "data.frame",
row.names = c(NA, -11L))
In df2
, the list is arranged according to the level first, but each linked sub-level is directly underneath.
I have tried several dyplr::arrange()
variants (e.g., arrange(Level, Parent)
) but all fail to account for the nested data. I think the solution might be a combination of group_by() and using arrange( ,.by_group = TRUE) as done here (R, dplyr - combination of group_by() and arrange() does not produce expected result?). Unfortunately, I couldn't solve it by myself.
Can anyone help? A tidyverse
/dplyr
solution would be preferred :-)