I have extracted 6 months of email metadata and saved it as a csv file. The csv now only contains two columns (from and to email addresses). I want to build a graph where the vertices are those with whom I am communicating and whom communicated with me and the edges are created by a communications link labeling the edges by how many communications I had. What is the best approach for going about this?
-
I don't see a question here. – Scott Hunter Feb 22 '16 at 20:02
-
Please review this: https://stackoverflow.com/help/how-to-ask. Regarding your project, check out this library: https://networkx.github.io/examples.html – chishaku Feb 22 '16 at 20:04
3 Answers
One approach is to use Linked Data principles (although not advisable if you are short on time and don't have a background in Linked Data). Here's a possible approach:
- Depict each entity as a URI
- Use an existing ontology (such as foaf) to describe the data
- The data is transformed into Resource Description Framework (RDF)
- Use an RDF visualization tool.
Since RDF is inherently a graph, you will be able to visualize your data as well as extend it.
If you are unfamiliar with Linked Data, a way to view the garphs is using Pajek
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/). This approach is much simpler but lacks the benefits of semantic interoperability, provided you care about them in the first place.

- 1,024
- 10
- 29
Cytoscape might be able to import your data in that format and build a network from it.

- 53
- 9
Your question (while mentioning Python) does not say what part or how much you want to do with Python. I will assume Python is a tool you know but that the main goal is to get the data visualized. In that case:
1) use Gephi network analysis tool - there are tools that can use your CSV file as-is and Gephi is one of them. in your case edge weights need to be preserved (= number of emails exchanged b/w 2 email addresses) which can be done using the "mixed" variation of Gephi's CSV format.
2) another option is to pre-process your CSV file (e.g. using Python), calculate edge weights (the number of e-mail between every 2 email addresses) and save it in any format you like. The result can be visualized in network analysis tools (such as Gephi) or directly in Python (e.g. using https://graph-tool.skewed.de).
Here's an example of an email network analysis project (though their graph does not show weights).

- 1,771
- 1
- 16
- 17