
I'm trying to use the riverplot package in R to make a Sankey diagram, but I'm getting an error message about the column names in the edges data frame.

I've installed the readr and riverplot packages and am then running this:

my_data <- read_csv("~/RProjects/my_data.csv")

edges = rep(my_data, col.names = c("N1","N2","Value"))

nodes = data.frame(ID = unique(c(edges$N1, edges$N2)))

river <- makeRiver(nodes, edges)

return(plot(river))

But the penultimate command, which sets up the riverplot object "river", gives this error:

Error in checkedges(edges, nodes$ID)
  edges must have the columns N1, N2 and Value

The original CSV already has these column headings. I'm not sure what I'm doing wrong. I'm a complete newbie to R, so please be patient if I'm missing the obvious!

dput on my CSV file looks like this:

structure(list(N1 = c("Cambridge", "Cambridge", "Cambridge", 
"Cambridge", "Cambridge", "South Cambs", "South Cambs", "South Cambs", 
"South Cambs", "South Cambs", "Rest of East", "Rest of East", 
"Rest of East", "Rest of East", "Rest of East", "Rest of UK", 
"Rest of UK", "Rest of UK", "Rest of UK", "Rest of UK", "Abroad", 
"Abroad", "Abroad", "Abroad", "Abroad"), N2 = c("Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad"), Value = c(106068L, 
1616L, 2779L, 13500L, 5670L, 2593L, 138263L, 2975L, 4742L, 1641L, 
2555L, 3433L, 0L, 0L, 0L, 6981L, 3802L, 0L, 0L, 0L, 5670L, 1641L, 
0L, 0L, 0L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-25L), .Names = c("N1", "N2", "Value"), spec = structure(list(
    cols = structure(list(N1 = structure(list(), class = c("collector_character", 
    "collector")), N2 = structure(list(), class = c("collector_character", 
    "collector")), Value = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("N1", "N2", "Value")), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))

str(edges) gives:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   25 obs. of  3 variables:
 $ N1   : chr  "Cambridge" "Cambridge" "Cambridge" "Cambridge" ...
 $ N2   : chr  "Cambridge" "South Cambs" "Rest of East" "Rest of UK" ...
 $ Value: int  106068 1616 2779 13500 5670 2593 138263 2975 4742 1641 ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 3
  .. ..$ N1   : list()
  .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
  .. ..$ N2   : list()
  .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
  .. ..$ Value: list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  ..$ default: list()
  .. ..- attr(*, "class")= chr  "collector_guess" "collector"
  ..- attr(*, "class")= chr "col_spec"

1 Answer


I believe the problem is that you left out the required ID column, which confused makeRiver().

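# As in the question (kept unchanged)
edges = rep(my_data, col.names = c("N1","N2","Value"))
# Convert to a plain data.frame
edges    <- data.frame(edges)
# Add an ID column, one ID per edge (the data has 25 rows)
edges$ID <- 1:25

# Node IDs are the unique place names from both columns
nodes = data.frame(ID = unique(c(edges$N1, edges$N2)))

river <- makeRiver(nodes, edges)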

The code above eliminates the error message. Note that it raises a separate warning about duplicated edge information:

Warning message:
In checkedges(edges, nodes$ID) :
  duplicated edge information, removing 10 edges
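
For what it's worth, the "removing 10 edges" figure is consistent with each pair of places being counted once regardless of direction. That is only my reading of the warning, not something I've confirmed from the riverplot documentation. A minimal base R sketch that rebuilds the N1 and N2 columns from the question's dput output (the names `places`, `pair_key` and `n_removed` are just illustrative):

```r
# Rebuild the N1/N2 columns from the question's dput output (base R only).
places <- c("Cambridge", "South Cambs", "Rest of East", "Rest of UK", "Abroad")
edges <- data.frame(
  N1 = rep(places, each = 5),
  N2 = rep(places, times = 5),
  stringsAsFactors = FALSE
)

# Assumption: A -> B and B -> A are treated as the same edge.
# Build an order-insensitive key by putting the name that sorts first in front.
pair_key <- ifelse(edges$N1 < edges$N2,
                   paste(edges$N1, edges$N2, sep = " | "),
                   paste(edges$N2, edges$N1, sep = " | "))

# Rows whose unordered pair has already appeared earlier in the table.
n_removed <- sum(duplicated(pair_key))
print(n_removed)
```

With 5 places there are 15 distinct unordered pairs (including each place paired with itself), so 10 of the 25 rows count as repeats under this assumption, which matches the number in the warning.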
  • Thank you. That adds a fourth column (ID) to edges. But then when I execute the makeRiver() line I get `Error in checkedges(edges, nodes$ID) : edges must not have the same IDs as nodes`. For some reason it was taking values from the new ID column in edges. I then tried setting the ID of the nodes frame by specifying them each in full, `nodes <- data.frame(ID = c("Cambridge","South Cambs","Rest of East","Rest of UK","Abroad"))`, after which I get the `duplicated edge information` warning you mentioned. – String Dec 05 '16 at 00:58
  • @String Let's start a new question on that error if you're still having the problem. If you do start a new question put the link here so that I don't miss it. I'll check back in the morning (about 9 hours from now). – Hack-R Dec 05 '16 at 03:57
  • Great, thanks so much, you've solved the problem in the question. I think the subsequent "duplicated edge" problem was because I'd also not defined x and y for the nodes. I'm having another go and making progress. Will post as a separate question if I get stuck on something again. – String Dec 05 '16 at 06:45