Pandas Dataframe to Network graph

Question

I have some CSV files, each with more than 100 lines and `from`, `to`, and `weight` columns.

I am trying to create a graph with edges from A to B, C, and D, where each edge's length reflects its weight (distance).

I am currently using pyvis, but all the edges come out the same length.

import pandas as pd
from IPython.display import display, HTML
from pyvis.network import Network 

df = pd.read_csv('sample_data.csv')
got_net = Network(notebook=True, cdn_resources='in_line')

sources = df['from']
targets = df['to']
weights = df['weight']
edge_data = zip(sources, targets, weights)

for e in edge_data:
    src = e[0]
    dst = e[1]
    w = e[2]

    got_net.add_node(src, src, title=src)
    got_net.add_node(dst, dst, title=dst)
    got_net.add_edge(src, dst, label = "weight")


neighbor_map = got_net.get_adj_list()  

for node in got_net.nodes:
    node['title'] += ' Neighbors:' + ''.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']])

got_net.save_graph('graph.html')
display(HTML("graph.html"))

Please share your attempt and provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) — Tranbi, Aug 24 '23 at 09:24
@jonson FYI I have added a solution if you are still interested in one for this question — John Collins, Sep 02 '23 at 06:49

John Collins · Answer 1 · 2023-08-25T09:01:17.827

Don't forget to set the network's physics engine, and use the `value` parameter (not `label`) when setting edge weights: `label` only draws text on the edge, whereas `value` scales the edge's width.
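Applied to the question's `from`/`to`/`weight` columns, that advice might look like the following minimal sketch. It only prepares the per-edge keyword arguments, so it runs without pyvis. `edge_options` is a hypothetical helper, and the `length` entry is an assumption that goes slightly beyond the answer: vis.js accepts a per-edge `length` (the spring's rest length), and here it is mapped linearly from the weight so heavier edges are pulled shorter. The 50 to 300 range is arbitrary.

```python
import pandas as pd


def edge_options(df: pd.DataFrame, min_len: float = 50.0, max_len: float = 300.0) -> list:
    """Turn a from/to/weight DataFrame into (source, target, options) triples.

    "value" (which vis.js uses to scale edge width) gets the raw weight, and
    "length" (assumed: the vis.js spring rest length) is mapped linearly so
    the heaviest edge gets min_len and the lightest gets max_len. The linear
    mapping and the 50-300 range are illustrative choices, not pyvis defaults.
    """
    w_min, w_max = df["weight"].min(), df["weight"].max()
    span = (w_max - w_min) or 1  # avoid division by zero when all weights match
    edges = []
    for src, dst, w in df[["from", "to", "weight"]].itertuples(index=False, name=None):
        frac = (w - w_min) / span  # 0.0 for the lightest edge, 1.0 for the heaviest
        length = max_len - frac * (max_len - min_len)
        # plain int/float/str values serialize cleanly to JSON
        edges.append((src, dst, {"value": int(w), "title": str(w), "length": float(length)}))
    return edges


# toy data shaped like the question's "A to B, C, and D" (weights are made up)
toy = pd.DataFrame({"from": ["A", "A", "A"], "to": ["B", "C", "D"], "weight": [1, 5, 9]})
for src, dst, opts in edge_options(toy):
    print(src, dst, opts)
```

Each triple could then be passed as `got_net.add_edge(src, dst, **opts)`, since pyvis forwards extra keyword arguments to vis.js as edge options. If your weight is itself a distance (larger means farther apart), swap `min_len` and `max_len`. How strongly `length` shows up in the final layout depends on the physics solver settings, so it is worth experimenting with.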

For example, consider a network visualization of bigram frequencies (pairs of consecutive letters) in a famous piece of English literature, say Shakespeare's "To be or not to be" soliloquy from Hamlet:

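from collections import defaultdict
import re

import pandas as pd

# the full text of the "To be or not to be" soliloquy (not reproduced here)
hamlet_speech = "..."

# upper-case the text and strip punctuation, spaces and line breaks
shakespeare_letters = re.sub(r"[',.;:\-—?\n ]", "", hamlet_speech.upper())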

bigrams = [
    shakespeare_letters[i : i + 2]
    for i in range(len(shakespeare_letters) - 1)
]

freqs = defaultdict(int)

for xy in bigrams:
    freqs[xy] += 1

df = pd.DataFrame(
    [[*xy] + [w] for xy, w in freqs.items()], columns=["from", "to", "weight"]
)
df.sort_values(by="weight", inplace=True, ascending=False)
df = df[df.weight > 3]
df

gives:

    from to  weight
10     T  H      53
16     H  E      33
35     N  D      19
0      T  O      18
55     O  F      17
..   ... ..     ...
226    R  D       4
241    L  L       4
21     I  O       4
8      T  T       4
166    P  A       4

[99 rows x 3 columns]

Note: to keep this example simple, I've kept only the most frequent pairs of consecutive letters (those occurring more than 3 times).

Unsurprisingly, the top results are:

"TH" (as in "the") is the most common bigram,
followed by "HE" (also as in "the").
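The counting step can also be written with `collections.Counter` and `zip`, a swapped-in but equivalent way to count consecutive letter pairs. This minimal sketch keeps only the letters A to Z, which is a slightly stricter filter than the character class used above; `bigram_frame` is a hypothetical helper.

```python
import re
from collections import Counter

import pandas as pd


def bigram_frame(text: str, min_count: int = 1) -> pd.DataFrame:
    """Count consecutive letter pairs and return a from/to/weight DataFrame."""
    # keep only the letters A to Z after upper-casing
    letters = re.sub(r"[^A-Z]", "", text.upper())
    # pairing the string with itself shifted by one yields every bigram
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    df = pd.DataFrame(
        [(pair[0], pair[1], n) for pair, n in counts.items()],
        columns=["from", "to", "weight"],
    )
    # keep pairs that occur at least min_count times, most frequent first
    df = df[df["weight"] >= min_count]
    return df.sort_values("weight", ascending=False, ignore_index=True)


print(bigram_frame("To be, or not to be", min_count=2))
```

With the full speech, `bigram_frame(hamlet_speech, min_count=4)` would correspond to the `df.weight > 3` filter above.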

Let's see how pyvis visually represents this:

import pandas as pd
from IPython.display import display, HTML
from pyvis.network import Network

got_net = Network(
    notebook=True,
    cdn_resources="remote",
    height="500px",
    width="100%",
    bgcolor="white",
    font_color="red",
)

# set the physics layout of the network
got_net.repulsion()
got_data = df

sources = got_data["from"]
targets = got_data["to"]
weights = got_data["weight"]

edge_data = zip(sources, targets, weights)

for e in edge_data:
    src = e[0]
    dst = e[1]
    w = e[2]

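    got_net.add_node(src, src, title=src)
    got_net.add_node(dst, dst, title=dst)
    # "value" scales the edge width by its weight ("label" only draws text);
    # int() turns the NumPy integer into a plain int so it serializes cleanly to JSON
    got_net.add_edge(src, dst, value=int(w))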

neighbor_map = got_net.get_adj_list()

# add neighbor data to node hover data
for node in got_net.nodes:
    node["title"] += "\nNeighbors:\n"
    neighbor_distances = {}
    for neighbor in neighbor_map[node["id"]]:
        bigram = node["id"] + neighbor
        dist = freqs[bigram]
        neighbor_distances[neighbor] = dist
    for n, d in sorted(
        neighbor_distances.items(), key=lambda kv: kv[1], reverse=True
    ):
        node["title"] += f"{n}: {d}\n"
    # size each node by its number of neighbors
    node["value"] = len(neighbor_map[node["id"]])

got_net.show("network.html")
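The hover loop above looks up `freqs[node["id"] + neighbor]`, that is, only the bigram that starts at the current node. Because the network is undirected, pyvis' adjacency list can also contain neighbors that only point into the node, and for those the lookup returns the reverse bigram's count (possibly 0). A minimal sketch, assuming the same `from`/`to`/`weight` DataFrame, that builds hover text listing outgoing and incoming weights separately; `hover_titles` is a hypothetical helper and the toy weights are made up.

```python
import pandas as pd


def hover_titles(df: pd.DataFrame) -> dict:
    """Build a per-node hover string listing outgoing and incoming weights."""
    titles = {}
    # every node that appears as either a source or a target, in order of appearance
    nodes = pd.unique(pd.concat([df["from"], df["to"]]))
    for node in nodes:
        out_edges = df[df["from"] == node].sort_values("weight", ascending=False)
        in_edges = df[df["to"] == node].sort_values("weight", ascending=False)
        lines = [str(node), "Out:"]
        lines += [f"  -> {t}: {w}" for t, w in zip(out_edges["to"], out_edges["weight"])]
        lines.append("In:")
        lines += [f"  <- {f}: {w}" for f, w in zip(in_edges["from"], in_edges["weight"])]
        titles[node] = "\n".join(lines)
    return titles


toy = pd.DataFrame({"from": ["A", "A", "B"], "to": ["B", "C", "A"], "weight": [5, 1, 2]})
print(hover_titles(toy)["A"])
```

In the answer's loop, `node["title"] = titles[node["id"]]` could then replace the manual neighbor lookup.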

A key consideration with this kind of network visualization is that an individual node can be both a "from" node and a "to" node. So when visualizing the weight of the connection between two nodes, you have to account for the weight in both directions (e.g., "TH" and "HT" are different bigrams). This gets complex quickly, since the number of possible ordered node pairs grows roughly with the square of the number of nodes.

Nonetheless, pyvis handles this well: the network is interactive, and the strength (i.e., the weight) of the connections between nodes is shown both by the nodes' placement relative to each other and by the width of their edges.
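The point about weights in both directions can be made concrete. One way to combine them is to sort each pair so that A to B and B to A share a key, then sum their weights. This is a minimal sketch on made-up toy data; summing is only one possible choice (max or mean would also be reasonable), and `undirected_weights` is a hypothetical helper.

```python
import pandas as pd


def undirected_weights(df: pd.DataFrame) -> pd.DataFrame:
    """Merge A->B and B->A rows into one row per unordered node pair."""
    # sorting each pair makes ("B", "A") and ("A", "B") map to the same key
    pairs = [tuple(sorted(pair)) for pair in zip(df["from"], df["to"])]
    keyed = df.assign(a=[p[0] for p in pairs], b=[p[1] for p in pairs])
    # sum the weights of both directions for each unordered pair
    combined = keyed.groupby(["a", "b"], as_index=False)["weight"].sum()
    return combined.sort_values("weight", ascending=False, ignore_index=True)


# made-up weights: A->B and B->A should collapse into a single A-B row
toy = pd.DataFrame({"from": ["A", "B", "A", "C"], "to": ["B", "A", "C", "A"], "weight": [5, 2, 1, 4]})
print(undirected_weights(toy))
```

The combined frame could then feed the same `add_edge(..., value=...)` loop if you want a single edge per node pair.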
