I am using Python 2.7 on Windows 7

From this question, Python networkx : edge contraction, we know how to do edge contraction using networkx. But is this also possible using pandas?

Say I have a dataframe df that represents directed edges from fld1 to fld2 and r_val is the weight of that connection.

Here is a picture of what the network defined by df looks like: [image: network]

import pandas as pd
df = pd.DataFrame({'fld1': ['a',    'a',    'b',    'c',    'c',    'g',    'd',    'd',    'e',    'e',    'f']
                ,  'fld2': ['b',    'c',    'f',    'd',    'g',    'd',    'e',    'b',    'c',    'f',    'b']
                , 'r_val': [0.1,    0.9,    1,  0.5,    0.5,    1,  0.8,    0.2,    0.2,    0.8,    1]})

df
Out[4]: 
   fld1 fld2  r_val
0     a    b    0.1
1     a    c    0.9
2     b    f    1.0
3     c    d    0.5
4     c    g    0.5
5     g    d    1.0
6     d    e    0.8
7     d    b    0.2
8     e    c    0.2
9     e    f    0.8
10    f    b    1.0

I would like to contract the edges where r_val is equal to 1 so that df becomes df2. This means making fld1 equal to fld2 wherever r_val == 1. Where r_val == 1 in both directions (node b and node f, for example), it does not matter which node is removed.

df2 = pd.DataFrame({'fld1': ['a',    'a',  'd',    'd',    'e',    'e'  ]
                ,  'fld2': ['b',    'd',   'e',    'b',    'd',    'b'  ]
                , 'r_val': [0.1,    0.9,   0.8,    0.2,    0.2,    0.8]})

df2
Out[6]: 
  fld1 fld2  r_val
0    a    b    0.1
1    a    d    0.9
2    d    e    0.8
3    d    b    0.2
4    e    d    0.2
5    e    b    0.8

EDIT

This will need to be done iteratively until no r_val is equal to 1, because contracting some edges creates new edges whose r_val could also be equal to 1.

BeeGee

1 Answer

Not a pandas wizard, but here's one way that seems to work.

One iteration would be:

# Find rows where 'r_val' = 1 and replace its 'fld1' with 'fld2' in 
# the entire frame.
df = df.replace(list(df['fld1'][df['r_val']==1]), list(df['fld2'][df['r_val']==1]))

# Eliminate all edges that have collapsed
df = df[df['fld1'] != df['fld2']]

# Sum up 'r_val' for all edges with the same 'fld1' and 'fld2'
df = df.groupby(['fld1', 'fld2'])['r_val'].sum().reset_index()

Example run with your data:

Start:

   fld1 fld2  r_val
0     a    b    0.1
1     a    c    0.9
2     b    f    1.0
3     c    d    0.5
4     c    g    0.5
5     g    d    1.0
6     d    e    0.8
7     d    b    0.2
8     e    c    0.2
9     e    f    0.8
10    f    b    1.0

First iteration:

  fld1 fld2  r_val
0    a    b    0.1
1    a    c    0.9
2    c    d    1.0
3    d    b    0.2
4    d    e    0.8
5    e    b    0.8
6    e    c    0.2

Second iteration:

  fld1 fld2  r_val
0    a    b    0.1
1    a    d    0.9
2    d    b    0.2
3    d    e    0.8
4    e    b    0.8
5    e    d    0.2

With no more r_val = 1, we're done.
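
To repeat the iteration until nothing is left to contract, the three steps can be wrapped in a loop. The following is only a sketch under a few assumptions: the helper names contract_once and contract_all and the tolerance tol are made up for illustration, and the renaming is done one pair at a time with Series.where rather than with a single df.replace call, so that a two-way pair such as b and f ends up under one label regardless of how replace treats swapped values.

```python
import pandas as pd

def contract_once(edges, tol=1e-9):
    """Run one contraction pass; return (new_edges, changed)."""
    # Treat weights within tol of 1 as "equal to 1" (tol is an assumption).
    is_one = (edges['r_val'] - 1).abs() < tol
    if not is_one.any():
        return edges, False
    out = edges.copy()
    # Rename source -> target for each weight-1 edge, one pair at a time.
    pairs = zip(edges.loc[is_one, 'fld1'], edges.loc[is_one, 'fld2'])
    for src, dst in pairs:
        for col in ('fld1', 'fld2'):
            out[col] = out[col].where(out[col] != src, dst)
    # Eliminate edges that collapsed into self-loops.
    out = out[out['fld1'] != out['fld2']]
    # Sum r_val over parallel edges created by the merge.
    out = out.groupby(['fld1', 'fld2'], as_index=False)['r_val'].sum()
    return out, True

def contract_all(edges):
    """Repeat contraction passes until no weight-1 edge remains."""
    changed = True
    while changed:
        edges, changed = contract_once(edges)
    return edges.reset_index(drop=True)

df = pd.DataFrame({
    'fld1': ['a', 'a', 'b', 'c', 'c', 'g', 'd', 'd', 'e', 'e', 'f'],
    'fld2': ['b', 'c', 'f', 'd', 'g', 'd', 'e', 'b', 'c', 'f', 'b'],
    'r_val': [0.1, 0.9, 1, 0.5, 0.5, 1, 0.8, 0.2, 0.2, 0.8, 1],
})
result = contract_all(df)
print(result)
```

On the sample data this should stop after two passes with the same six edges as the second iteration above. Comparing with a tolerance instead of == 1 is a design choice to guard against floating-point sums that land just off 1.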

Joachim Isaksson