
I'm trying to calculate the probability of each source address given a certain destination IP, derived from PACKET_IN messages. To do this, I first collect the addresses in a DataFrame and then use nested loops to compute the probability of each occurrence. The code works in my IDE; however, it gives me different output on the controller. It seems like something is wrong with the loop statements in my code, could you give me a hand?

merlò
  • Are you using the DataFrame for anything else? – wwii Jul 31 '19 at 14:40
  • The issue looks like missing indentation on the print statement. Also, rather than writing loops like this in pandas, try using [groupby](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html). It should give you faster and cleaner code. – David Nehme Jul 31 '19 at 14:40

2 Answers


You can eliminate the loops by using the split-apply-combine features of pandas.

First, let's abstract away the "pox" portion of your problem by creating a dataframe with integer src/dst.

import pandas as pd
import numpy as np
src = np.trunc(np.random.uniform(0, 5, size=1000))
dst = np.trunc(np.random.uniform(0, 3, size=1000)) + src
df = pd.DataFrame({'dst': dst, 'src': src})

So in this example, the src and dst are correlated. The frequency counts are then available with a single line

df.groupby('dst').src.value_counts()

which yields something like the following.

dst  src
0.0  0.0    71
     2.0    68
     1.0    45
1.0  3.0    80
     2.0    76
     1.0    60
2.0  4.0    84
     3.0    61
     2.0    56
3.0  3.0    90
     4.0    58
     5.0    50
4.0  4.0    71
     6.0    67
     5.0    63

This gives you raw counts of each src/dst pair. You can convert this into the fraction of time that each src appeared given a single dst by using the groupby object twice: once to compute the frequency of each src/dst like above, and once to compute the frequency of each dst.

g = df.groupby('dst')
g.src.value_counts() / g.size()

which will yield something like

dst  src
0.0  0.0    0.385870
     1.0    0.244565
     2.0    0.369565
...
4.0  4.0    0.353234
     5.0    0.313433
     6.0    0.333333
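As an aside beyond the original answer: recent pandas versions can do the counting and the division in a single call, since `SeriesGroupBy.value_counts` accepts a `normalize` argument. A minimal sketch, assuming a pandas version that supports `normalize` here (the random data mirrors the setup above):

```python
import numpy as np
import pandas as pd

# Same synthetic src/dst setup as above, with a seeded generator.
rng = np.random.default_rng(0)
src = np.trunc(rng.uniform(0, 5, size=1000))
dst = np.trunc(rng.uniform(0, 3, size=1000)) + src
df = pd.DataFrame({'dst': dst, 'src': src})

# P(src | dst) in one call; equivalent to value_counts() / size().
probs = df.groupby('dst').src.value_counts(normalize=True)

# Sanity check: probabilities within each dst group sum to 1.
print(probs.groupby(level='dst').sum())
```

This avoids the manual division and the index-alignment it relies on.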
David Nehme
  • Hi David, I got this error message"TypeError: zip argument #51 must support iteration" on POX when executing 'g.src.value_counts() / g.size()'. Again... the code works on my IDE, I thought it might be the feature support issue of python 2.7. Can I use another method to do division? I tried to put it in a loop but to no avail. – merlò Jul 31 '19 at 17:54

If you are not using the DataFrame for anything else, you can use itertools.groupby by converting the string IP addresses to ipaddress objects so they can be sorted.

import ipaddress, itertools

ipList_Dst = ['10.0.0.2', '10.0.0.2', '10.0.0.2', '10.0.0.2',
              '10.0.0.2', '10.0.0.2', '10.0.0.2', '10.0.0.2',
              '10.0.0.2', '10.0.0.2', '10.0.0.2', '10.0.0.2']
ipList_Src = ['70.240.175.230', '243.41.191.23', '18.191.71.228',
              '62.95.69.19', '167.31.217.139', '30.63.153.99',
              '74.88.164.220', '135.131.110.167', '59.237.249.54',
              '34.24.183.147', '21.201.47.164', '167.31.217.139']

dst = map(ipaddress.ip_address, ipList_Dst)
src = map(ipaddress.ip_address, ipList_Src)
pairs = sorted(zip(dst, src))

for key, group in itertools.groupby(pairs):
    print([str(addr) for addr in key])

Each key of the groupby object will be a unique (dst,src) combination.
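To get from unique combinations to the probabilities the question asks about, you can also count the size of each group and divide by the per-destination total. A sketch building on the same lists (Python 3, where ipaddress is in the standard library; the counting step is my addition, not part of the original answer):

```python
import ipaddress, itertools
from collections import Counter

# Same sample data as above ('167.31.217.139' appears twice).
ipList_Dst = ['10.0.0.2'] * 12
ipList_Src = ['70.240.175.230', '243.41.191.23', '18.191.71.228',
              '62.95.69.19', '167.31.217.139', '30.63.153.99',
              '74.88.164.220', '135.131.110.167', '59.237.249.54',
              '34.24.183.147', '21.201.47.164', '167.31.217.139']

pairs = sorted(zip(map(ipaddress.ip_address, ipList_Dst),
                   map(ipaddress.ip_address, ipList_Src)))

# Size of each unique (dst, src) group, and total packets per dst.
pair_counts = {key: sum(1 for _ in group)
               for key, group in itertools.groupby(pairs)}
dst_totals = Counter(dst for dst, _ in pairs)

# P(src | dst) = count(dst, src) / count(dst)
for (dst, src), n in pair_counts.items():
    print(str(dst), str(src), n / dst_totals[dst])
```

Note that itertools.groupby only groups consecutive equal items, which is why the pairs must be sorted first.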

wwii