
I have extracted 6 months of email metadata and saved it as a CSV file. The CSV now contains only two columns (the from and to email addresses). I want to build a graph in which the vertices are the people I communicated with and who communicated with me, and each edge represents a communication link, labeled with the number of communications between the two parties. What is the best approach for going about this?

TylerC

3 Answers


One approach is to use Linked Data principles (although not advisable if you are short on time and don't have a background in Linked Data). Here's a possible approach:

  1. Represent each entity (here, each email address) as a URI
  2. Use an existing ontology (such as FOAF) to describe the data
  3. Transform the data into Resource Description Framework (RDF)
  4. Use an RDF visualization tool

Since RDF is inherently a graph, you will be able to visualize your data as well as extend it.
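The steps above can be sketched in a few lines of plain Python that emit N-Triples. This is only a rough illustration under stated assumptions: the `http://example.org/contact/` base URI and the `to_ntriples` helper are hypothetical, every address is assumed to belong to a person, and `foaf:knows` is a loose fit for "has exchanged email with". Note that plain triples like these do not carry the message counts; storing a weight would need an extra property or a different model.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/contact/"  # hypothetical base URI, chosen for illustration


def to_ntriples(pairs):
    """Turn (sender, recipient) pairs into N-Triples text using FOAF terms."""
    ids = {}       # email address -> URI minted for it
    triples = []   # output lines, in order of first appearance
    seen = set()   # foaf:knows links already emitted

    def uri_for(address):
        # Step 1: mint one URI per address; step 2: describe it with FOAF.
        if address not in ids:
            ids[address] = f"{BASE}{len(ids) + 1}"
            triples.append(f"<{ids[address]}> <{RDF_TYPE}> <{FOAF}Person> .")
            triples.append(f"<{ids[address]}> <{FOAF}mbox> <mailto:{address}> .")
        return ids[address]

    for sender, recipient in pairs:
        s, r = uri_for(sender), uri_for(recipient)
        link = f"<{s}> <{FOAF}knows> <{r}> ."
        if link not in seen:  # repeated messages collapse into one triple
            seen.add(link)
            triples.append(link)
    return "\n".join(triples) + "\n"


# Made-up sample rows standing in for the from/to columns of the CSV.
sample_pairs = [
    ("me@example.com", "alice@example.com"),
    ("me@example.com", "alice@example.com"),
    ("bob@example.com", "me@example.com"),
]
ntriples = to_ntriples(sample_pairs)
print(ntriples)
```

The resulting text can be saved with an `.nt` extension and loaded into an RDF tool of your choice.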

If you are unfamiliar with Linked Data, a simpler way to view the graph is to use Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). This approach is much simpler but lacks the benefits of semantic interoperability, which matters only if you care about them in the first place.
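For the Pajek route, the from/to pairs first have to be written in Pajek's `.net` text format. Below is a minimal sketch, assuming the message counts are already available as a mapping from (sender, recipient) to a count; the `to_pajek` helper and the sample addresses are made up for illustration. It writes directed arcs (`*Arcs`); for an undirected graph you would use an `*Edges` section instead.

```python
from collections import Counter


def to_pajek(weights):
    """Render {(source, target): count} as Pajek .net text with weighted arcs."""
    # Pajek refers to vertices by 1-based integer ids.
    nodes = sorted({name for pair in weights for name in pair})
    index = {name: i for i, name in enumerate(nodes, start=1)}

    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Arcs")
    for (source, target), count in sorted(weights.items()):
        # Each arc line is: source id, target id, weight.
        lines.append(f"{index[source]} {index[target]} {count}")
    return "\n".join(lines) + "\n"


# Made-up counts: two emails from me to alice, one from bob to me.
sample_weights = Counter({
    ("me@example.com", "alice@example.com"): 2,
    ("bob@example.com", "me@example.com"): 1,
})
pajek_text = to_pajek(sample_weights)
print(pajek_text)
```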

kurious

Cytoscape might be able to import your data in that format and build a network from it.

http://www.cytoscape.org/

N1ght

Your question (while mentioning Python) does not say which parts, or how much, you want to do in Python. I will assume that Python is a tool you know but that the main goal is to get the data visualized. In that case:

1) Use the Gephi network analysis tool. Some tools can read your CSV file as-is, and Gephi is one of them. In your case the edge weights (the number of emails exchanged between two email addresses) need to be preserved, which can be done using the "mixed" variation of Gephi's CSV format.

2) Another option is to pre-process your CSV file (e.g. using Python), calculate the edge weights (the number of emails between every pair of email addresses), and save the result in any format you like. The result can then be visualized in a network analysis tool (such as Gephi) or directly in Python (e.g. using https://graph-tool.skewed.de).
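A minimal sketch of the pre-processing step in option 2, using only the standard library. It assumes the CSV has no header row, with the sender in the first column and the recipient in the second; the function names and sample addresses are made up for illustration. The `Source,Target,Weight` header is used because edge-list importers such as Gephi's commonly look for those column names, but check what your tool expects.

```python
import csv
import io
from collections import Counter


def count_edges(rows, directed=True):
    """Count messages per (sender, recipient) pair from two-column rows."""
    weights = Counter()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sender = row[0].strip().lower()
        recipient = row[1].strip().lower()
        if not sender or not recipient:
            continue
        # For an undirected graph, a->b and b->a share one edge.
        key = (sender, recipient) if directed else tuple(sorted((sender, recipient)))
        weights[key] += 1
    return weights


def write_edge_list(weights, out):
    """Write the counted edges as a weighted edge-list CSV."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])


# Made-up sample standing in for the real file; with a real file you would
# pass csv.reader(open(path, newline="")) instead.
sample = io.StringIO(
    "me@example.com,alice@example.com\n"
    "me@example.com,alice@example.com\n"
    "alice@example.com,me@example.com\n"
    "bob@example.com,me@example.com\n"
)
weights = count_edges(csv.reader(sample))

buffer = io.StringIO()
write_edge_list(weights, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Passing `directed=False` merges both directions into a single edge, which matches the question's "how many communications I had" with each contact regardless of who sent the message.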

Here's an example of an email network analysis project (though their graph does not show weights).

CaptSolo