I have a list of Twitter followers in a text file that I want to import into igraph.

Here's a sample of my list:

393795446 18215973
393795446 582203919
393795446 190709835
393795446 1093090866
393795446 157780872
393795446 1580109739
393795446 3301748909
393795446 1536791610
393795446 106170345
393795446 9409752

And this is how I import it:

from igraph import *
twitter_igraph = Graph.Read_Edgelist('twitter_edgelist.txt', directed=True)

But I get this error.

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-10-d808f2237fa8> in <module>()
----> 1 twitter_igraph = Graph.Read_Edgelist('twitter_edgelist.txt', directed=True)

InternalError: Error at type_indexededgelist.c:369: cannot add negative number of vertices, Invalid value

I'm not sure why it's complaining about a negative number. I checked the file and it doesn't contain any negative numbers or IDs.

toy

1 Answer

You need to use Graph.Read_Ncol for this type of file. Why your file doesn't count as a typical "edgelist" is beyond me; I've wondered about this many times myself. I should also mention that I grabbed the answer from here. Tamás seems to be the main igraph guy around here, and I'm sure he can give a more detailed reason why you need Read_Ncol rather than Read_Edgelist.

This works for me.

from igraph import Graph
# Read_Ncol keeps the IDs from the file as vertex names
twitter_igraph = Graph.Read_Ncol('twitter_edgelist.txt', directed=True)
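
To make the difference concrete, here is a minimal pure-Python sketch of the relabelling that Read_Ncol is described as doing in the comments below: each original ID is kept as a name and given a small consecutive index. The helper `relabel_edges` is hypothetical and only illustrates the idea; it is not igraph's implementation.

```python
def relabel_edges(lines):
    """Map arbitrary vertex labels to consecutive 0-based indices.

    Hypothetical helper for illustration only; not igraph's own code.
    """
    name_to_index = {}
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        source, target = parts[0], parts[1]
        for name in (source, target):
            # The first time a label appears, give it the next free index.
            name_to_index.setdefault(name, len(name_to_index))
        edges.append((name_to_index[source], name_to_index[target]))
    # Dicts preserve insertion order, so names[i] is the label of index i.
    names = list(name_to_index)
    return names, edges


sample = [
    "393795446 18215973",
    "393795446 582203919",
    "393795446 3301748909",
]
names, edges = relabel_edges(sample)
print(names)
print(edges)
```

With four distinct labels, only four vertices are needed, and the original Twitter IDs stay available as names instead of being used as vertex indices.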

Personal Plug

This is a great example of where igraph's documentation could be improved.

For example, the only text accompanying the Graph.Read_Edgelist() documentation says:

Reads an edge list from a file and creates a graph based on it. Please note that the vertex indices are zero-based.

This doesn't really tell me anything when obviously there are nuances with how the file needs to be formatted. Saying what format this function expects the file to be in would save a lot of people their sanity.
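
As a rough illustration of what "zero-based vertex indices" implies for a file like this one, the sketch below assumes that a plain edge list reader treats every number as a vertex index and allocates all indices from 0 up to the largest one seen (which is how the comments below describe it). The function `implied_vertex_count` is a hypothetical helper, not part of igraph.

```python
def implied_vertex_count(lines):
    """Return (vertex_count, isolated_count) implied by a zero-based edge list.

    Assumption: every index from 0 to the largest ID in the file becomes
    a vertex, whether or not it appears in any edge.
    """
    ids = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        ids.update((int(parts[0]), int(parts[1])))
    if not ids:
        return 0, 0
    total = max(ids) + 1          # indices 0 .. max(ids)
    isolated = total - len(ids)   # indices that never appear in an edge
    return total, isolated


first_two_rows = [
    "393795446 18215973",
    "393795446 582203919",
]
total, isolated = implied_vertex_count(first_two_rows)
print(total, isolated)
```

Under that assumption, two edges between three accounts would already require hundreds of millions of vertices, almost all of them isolated.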

Austin A
  • The error message that igraph gives here is misleading; the problem is probably that one of the numbers in the file is larger than the maximum value of the integer type that igraph uses to represent vertex IDs. This causes an overflow and as a consequence, igraph "sees" a negative number as the vertex ID, and bails out. – Tamás Sep 11 '15 at 11:15
  • As for the difference between `Read_Edgelist` and `Read_Ncol`: this is a distinction that the underlying C library makes. An "edge list" is a list of pairs of integers, where each integer corresponds to the ID of some vertex. In igraph, the vertex IDs have to be consecutive in the range [0, |V|-1]. Therefore, reading a file like the one the poster has will create lots of isolated vertices since the vertex IDs in the file are not consecutive. That's why we have `Read_Ncol` - it will save the original IDs from the file in a vertex attribute named `name` and let the vertex IDs be consecutive. – Tamás Sep 11 '15 at 11:16
  • Also, thanks for the comments on the documentation of `python-igraph`; I am aware of it but unfortunately I don't have that much time to dedicate to the development of igraph ever since I left academia. Back in the old days, the C core and the Python interface were tied so closely to each other that I could simply assume that one can look up the more detailed documentation of the C core, but this is not the case any more. Pull requests are welcome - I'll happily merge any pull request that improves the documentation. – Tamás Sep 11 '15 at 11:17
  • Hey @Tamás, thank you so much for all these replies. It's great to see a creator so involved with users. I thought the exact same thing about the integer overflow, but I noticed weird behavior when I reduced Toy's edgelist to the first 2 rows. It seems that igraph creates a graph of 582203919 nodes and the 2 edges that are expected. Your second comment explains why this happens. I was also confused about why there was an integer overflow when Python's maxint is much higher than any integer in the list. It seems like the C core is what actually handles creating the edgelist, which I'm sure ... – Austin A Sep 12 '15 at 13:35
  • ... uses fixed-width integers (presumably 32-bit, not 8-bit) and creates the negative integer. Unfortunately, I don't know much C, but I would love to contribute to the package. Thanks again for the explanation!! – Austin A Sep 12 '15 at 13:37
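
The overflow described in the comments can be reproduced in plain Python. This is only a sketch under the assumption that the vertex IDs end up in a signed 32-bit integer; the thread does not confirm the exact type the C core uses. `as_int32` is a hypothetical helper built on the standard `ctypes` module.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit value


def as_int32(value):
    """Reinterpret value as a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


# A few IDs taken from the sample in the question.
ids = [393795446, 582203919, 1580109739, 3301748909]

# IDs that do not fit into a signed 32-bit integer (assumed width).
too_big = [i for i in ids if i > INT32_MAX]
print(too_big)

# What such an ID turns into after wrapping around.
print(as_int32(3301748909))
```

In the sample, only 3301748909 exceeds that limit, and it wraps to a negative value, which would be consistent with the "negative number of vertices" message.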