
I have a question about combining two CSV files: one is a nodelist and the other an edgelist. I want to combine them with `graph.data.frame` into a directed graph. I am using R version 3.2.3, RStudio version 0.99.879 and igraph version 1.0.1.

A sample of this data looks like this:

library(igraph)
nodes <- read.csv("data_n.csv", header = TRUE, row.names = 1, sep =";")
links <- read.csv("data_l.csv", header = TRUE, row.names = 1, sep =";")
head(nodes)
  Authors               Institution            status    gender
1 Jan Christoph Suntrup Käte Hamburger Kolleg  Post Doc    M
2 Renate Martinsen      Uni Duisburg-Essen     Prof        F
3 Bernd Ladwig          FU Berlin              Prof        M
4 Kathrin Morgenstern   Uni Regensburg       PhD student   F
5 Barbara Weber         Uni Regensburg         Prof        F

head(links)
From.Author1          To.Author2       relation   text.type 
1 Kathrin Morgenstern   Barbara Weber   undirect   Review  
2 Barbara Weber    Kathrin Morgenstern  undirect   Review
3 Andreas Busen        Paul Sörensen    undirect   other
4 Andreas Busen        Lisa Herzog      direct     other
5 Matthias Lemke    Gregor Wiedemann    undirect   other

As you can see, there are actors who have a mutual tie, some actors have a directed tie to another actor and there are also isolated nodes.

When I combine the edgelist and nodelist with `graph.data.frame`, I get the following error:

g1 <- graph.data.frame(d=links, vertices = nodes, directed = T)
Error in graph.data.frame(links, vertices = nodes, directed = T) : Some
vertex names in edge list are not listed in vertex data frame

I checked for missing authors and I am now pretty sure that every actor from the nodelist is listed at least once in the edgelist. I also created loops for the isolated authors in the edgelist in case igraph cannot treat 'NA' in 'To.Author2' (and I would later use simplify to remove.loops). But these ideas have not solved the error.
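The error message concerns names that occur in the edge list but not in the vertex data frame, so the check has to run in that direction as well. A minimal base-R sketch of both checks follows; `nodes_toy` and `links_toy` are hypothetical toy data frames that only reuse the column names from the question, not the real files.

```r
# Hypothetical toy data, much smaller than the real CSV files
nodes_toy <- data.frame(
  Authors = c("Kathrin Morgenstern", "Barbara Weber", "Andreas Busen"),
  status  = c("PhD student", "Prof", "PhD student"),
  stringsAsFactors = FALSE
)
links_toy <- data.frame(
  From.Author1 = c("Kathrin Morgenstern", "Andreas Busen"),
  To.Author2   = c("Barbara Weber", "Lisa Herzog"),
  stringsAsFactors = FALSE
)

# Every name that occurs anywhere in the edge list
edge_names <- unique(c(links_toy$From.Author1, links_toy$To.Author2))

# Names used in edges but absent from the node list:
# these are the ones the error message complains about
missing_from_nodes <- setdiff(edge_names, nodes_toy$Authors)

# Names in the node list that never appear in an edge:
# these are isolates and do not cause the error
isolates <- setdiff(nodes_toy$Authors, edge_names)

print(missing_from_nodes)
print(isolates)
```

If `missing_from_nodes` is non-empty on the real data, those names need to be added to the nodelist (or corrected, for example if they differ only by stray whitespace or encoding).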

I searched for solutions and found a suggestion here, whose answer refers to this (both on Stack Overflow).

By following the instruction, the produced graph object takes the edge attribute but doesn't include the node attributes.

So, how can I solve this error, or what am I doing wrong?

Looking forward to your suggestions and advice!
Any help is appreciated!

EDIT - providing sample of the nodelist and edgelist

dput(head(nodes, 15))
structure(list(no = 1:15, Authors = c("Jan Christoph Suntrup", 
"Renate Martinsen", "Bernd Ladwig", "Kathrin Morgenstern", "Barbara Weber", 
"Claudia Ritter", "Maik Herold", "Eva Marlene Hausteiner", "Andreas Busen", 
"Matthias Lemke", "Cord Schmelzle", "Daniel Jacob", "Oliver Flügel Martinsen", 
"Kari Palonen", "Thomas Schölderle"), Institution = c("Käte Hamburger Kolleg ", 
"Uni Duisburg-Essen", "FU Berlin", "Uni Regensburg", "Uni Regensburg", 
"Uni Kassel", "TU Dresden", "HU Berlin", "Uni Hamburg", "HSU Hamburg", 
"FU Berlin", "FU Berlin", "Uni Bielefeld", "Uni Jyväskylä", "Akademie Tutzing"
), status = c("Post Doc", "Prof", "Prof", "PhD student", "Prof", 
"Post Doc", "PhD student", "PhD student", "PhD student", "Post Doc", 
"Post Doc", "PhD student", "Post Doc", "Prof", "Post Doc"), gender = c("M", 
"F", "M", "F", "F", "F", "M", "F", "M", "M", "M", "M", "M", "M", 
"M")), .Names = c("no", "Authors", "Institution", "status", "gender"
), row.names = c(NA, 15L), class = "data.frame")

dput(head(links, 15))
structure(list(From.Author1 = c("Kathrin Morgenstern", "Barbara Weber", 
"Andreas Busen", "Andreas Busen", "Matthias Lemke", "Matthias Lemke", 
"Cord Schmelzle", "Cord Schmelzle", "Cord Schmelzle", "Cord Schmelzle", 
"Cord Schmelzle", "Cord Schmelzle", "Cord Schmelzle", "Cord Schmelzle", 
"Daniel Jacob"), To.Author2 = c("Barbara Weber", "Kathrin Morgenstern", 
"Paul Sörensen", "Lisa Herzog", "Gregor Wiedemann", "Andreas Niekler", 
"Eva Marlene Hausteiner", "Daniel Jacob", "Thorsten Thiel", "Ulrike Spohn", 
"Christian Volk", "Susanne Schmetkamp", "Maike Weißpflug", "Andreas Oldenbourg", 
"Eva Marlene Hausteiner"), relation = c("undirect", "undirect", 
"undirect", "undirect", "undirect", "undirect", "direct", "direct", 
"direct", "direct", "direct", "direct", "direct", "direct", "direct"
), text.type = c("Review", "Review", "other", "other", "other", 
"other", "Acknowledgement", "Acknowledgement", "Acknowledgement", 
"Acknowledgement", "Acknowledgement", "Acknowledgement", "Acknowledgement", 
"Acknowledgement", "Acknowledgement"), no = 1:15), .Names = c("From.Author1", 
"To.Author2", "relation", "text.type", "no"), row.names = c(NA, 
15L), class = "data.frame")
Stefan_W
  • *I am now pretty sure that every actor from the nodelist is listed at least once in the edgelist.* : the error seems to be the reverse, that is, there are nodes in the edgelist that are not in the vertex list – user20650 Jun 25 '17 at 22:24
  • What does this return: `ed = unique(as.character(unlist(links[, c("From.Author1", "To.Author2")]))) ; ve = unique(as.character(nodes$Authors)) ; table(ed %in% ve)` – user20650 Jun 25 '17 at 22:31
  • Okay, thanks for the advice. In the original dataset, the result is `FALSE 5 TRUE 179`. Does this mean I have missing nodes/authors in the edgelist? – Stefan_W Jun 25 '17 at 22:42
  • no, i think it means that authors in the edgelist are not in the node list (hence the error) – user20650 Jun 25 '17 at 22:42
  • Should be trivial to correct: just do a merge of the unique edgelist names with the nodelist names, and this will add these names to the nodelist (but without further attributes) – user20650 Jun 25 '17 at 22:44
  • Thanks for using dput. I am deleting my answer. When I run the code from the deleted answer, it looks to me like every node in the edgelist has been added to the nodelist. I do not see any changes in the names, yet I get the error message that you got. My answer was not solving the problem. – G5W Jun 26 '17 at 00:30
  • Try: `ed = data.frame(Authors=unique(as.character(unlist(links[, c("From.Author1", "To.Author2")])))) ; nodesNew <- merge(ed, nodes, by="Authors", all=TRUE) ; g1 <- graph_from_data_frame(d=links, vertices = nodesNew, directed = T)` – user20650 Jun 26 '17 at 01:02
  • Great. It worked. So, just to understand what you suggested: using `merge`, you joined both data.frames and by setting `all=TRUE`, additional rows were added, right?! (However, when I look at the dataset now, no extra row appeared that wasn't there before and the number of rows is still 183). Excuse the followup question, I just try to understand what has happened that I can either avoid it or solve the problem by myself the next time :) – Stefan_W Jun 26 '17 at 10:51
  • @Stefan_W ; using the dput data in your question, if I run [the code from the earlier comment](https://stackoverflow.com/questions/44751028/combine-edgelist-and-nodelist-error-with-vertices-igraph?noredirect=1#comment76483332_44751028) I get `FALSE 10 TRUE 7`, and both datasets have 15 rows. When merging, by setting all=TRUE, the additional authors will be added. From the comment table, there are 10 authors in the edgelist not in the nodelist, so these will be added, resulting in the new nodelist (post merge) having 25 rows. – user20650 Jun 26 '17 at 12:35
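
The merge approach from the comments can be sketched on toy data as follows. The data frames `nodes_toy` and `links_toy` are hypothetical and only mirror the column layout of the `dput` output in the question, including the leading `no` column in the nodelist. One detail that may matter here: `merge` places the `by` column first in its result, and `graph_from_data_frame` takes vertex names from the first column of `vertices`, so after the merge `Authors` (rather than `no`) is the column used for matching. The graph-building step is wrapped in `requireNamespace` so that the sketch also runs where igraph is not installed.

```r
# Hypothetical toy data mirroring the column layout in the question
nodes_toy <- data.frame(
  no      = 1:3,
  Authors = c("Kathrin Morgenstern", "Barbara Weber", "Andreas Busen"),
  status  = c("PhD student", "Prof", "PhD student"),
  stringsAsFactors = FALSE
)
links_toy <- data.frame(
  From.Author1 = c("Kathrin Morgenstern", "Andreas Busen"),
  To.Author2   = c("Barbara Weber", "Lisa Herzog"),
  text.type    = c("Review", "other"),
  stringsAsFactors = FALSE
)

# One row per name that occurs anywhere in the edge list
ed <- data.frame(
  Authors = unique(c(links_toy$From.Author1, links_toy$To.Author2)),
  stringsAsFactors = FALSE
)

# Full outer join: keeps every author from either side; attribute columns
# are NA for authors that only occur in the edge list
nodes_new <- merge(ed, nodes_toy, by = "Authors", all = TRUE)
print(nodes_new)

# Build the directed graph only if igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(d = links_toy, vertices = nodes_new,
                                     directed = TRUE)
  print(g)
}
```

Authors added this way carry `NA` for `status` and the other attributes until the nodelist is completed by hand.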
