How to compute Shannon entropy of Information from a Pandas Dataframe?

Question

I have a dataframe df that contains transactions from one individual, Name_Give, to another, Name_Receive, like the following:

df
    Name_Give    Name_Receive   Amount
0    John           Tom          300
1    Eva            Tom          700
2    Sarah          Tom          100
3    John           Tom          200
4    Tom            Eva          700
5    John           Eva          300
6    Carl           Eva          250

For each Name_Receive j, I would like to compute the Shannon entropy S_j = -\sum_i p_i \log p_i, where p_i is the amount of transaction i divided by the total amount received by user j. For example:

S_Tom = - (300/1300 * np.log(300/1300) + 700/1300 * np.log(700/1300) + 100/1300 * np.log(100/1300) + 200/1300 * np.log(200/1300))

S_Eva = - (700/1250 * np.log(700/1250) + 300/1250 * np.log(300/1250) + 250/1250 * np.log(250/1250))

S_Tom = 1.157
S_Eva = 0.99

I would like to end up with a dataframe df1 like the following:

df1
     Name     Entropy
0    Tom      1.157
1    Eva      0.99

Space Impact · Accepted Answer · 2018-12-27T12:41:30.870

9

Use groupby and transform to get the total Amount of each group, divide each Amount by its group total to get the shares p_i, and then sum the -p_i*log(p_i) terms per group:

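import numpy as np

# Total Amount per Name_Receive, broadcast back onto the original rows
g_sum = df.groupby('Name_Receive')['Amount'].transform('sum')
# p_i: each transaction's share of its receiver's total
values = df['Amount']/g_sum
df['Entropy'] = -(values*np.log(values))

# Sum the per-row terms for each receiver, keeping the original group order
df1 = df.groupby('Name_Receive',as_index=False,sort=False)['Entropy'].sum()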

print(df1)
  Name_Receive   Entropy
0          Tom  1.156988
1          Eva  0.989094
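
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The sample data is copied from the question; the helper name `receiver_entropy` is hypothetical, and unlike the snippet above it does not add an `Entropy` column to `df`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Name_Give': ['John', 'Eva', 'Sarah', 'John', 'Tom', 'John', 'Carl'],
    'Name_Receive': ['Tom', 'Tom', 'Tom', 'Tom', 'Eva', 'Eva', 'Eva'],
    'Amount': [300, 700, 100, 200, 700, 300, 250],
})

def receiver_entropy(frame):
    """Hypothetical helper: natural-log Shannon entropy of Amount per Name_Receive."""
    # Share of each transaction in its receiver's total
    share = frame['Amount'] / frame.groupby('Name_Receive')['Amount'].transform('sum')
    # Per-row term -p_i * log(p_i); assumes every Amount is strictly positive
    terms = -(share * np.log(share))
    out = terms.groupby(frame['Name_Receive'], sort=False).sum()
    # Name the columns as in the df1 the question asks for
    return out.rename('Entropy').rename_axis('Name').reset_index()

df1 = receiver_entropy(df)
print(df1)
```

The result has the `Name` and `Entropy` columns requested in the question, and the input frame is left unchanged.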

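If the Amount column contains zeros, fill any nan values with 0 at the end, after the groupby:

df1['Entropy'] = df1['Entropy'].fillna(0)

0*np.log(0) evaluates to nan, whereas a zero share should contribute 0 to the entropy, so fillna(0) is used to make it 0.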
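
As an alternative sketch for the zero case, `scipy.special.xlogy(x, y)` computes `x * log(y)` and returns 0 where `x` is 0, so the zero terms never become `nan` in the first place. The data below is hypothetical (one transaction with Amount 0), and the variable names are only for illustration.

```python
import pandas as pd
from scipy.special import xlogy

# Hypothetical data: one of Tom's transactions has Amount 0
df_zero = pd.DataFrame({
    'Name_Receive': ['Tom', 'Tom', 'Tom'],
    'Amount': [300, 0, 700],
})

# Share of each transaction in its receiver's total
share = df_zero['Amount'] / df_zero.groupby('Name_Receive')['Amount'].transform('sum')

# xlogy(p, p) is p * log(p), defined as 0 when p == 0
p = share.to_numpy()
terms = pd.Series(-xlogy(p, p), index=df_zero.index)

entropy_zero = terms.groupby(df_zero['Name_Receive']).sum()
print(entropy_zero)
```

Here the zero row contributes nothing, so Tom's entropy is that of the shares 0.3 and 0.7 alone.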

edited Dec 27 '18 at 12:41

answered Nov 06 '18 at 18:04

Space Impact


I think it's a good solution, but only assuming you have no 0 values in the 'Amount' column. `>>> np.log([1, np.e, np.e**2, 0])` will result in : `array([ 0., 1., 2., -Inf])` While calculating Entropy zero values should be skipped (or summed as zeros) – Oleg Dec 27 '18 at 12:06
@Oleg When you multiply `0*np.log(0)` you get `nan`, not `Inf`. This can be filled using `fillna(0)` easily. While writing the answer I considered only the sample data and not any other limitations like this. – Space Impact Dec 27 '18 at 12:26
For the sample data it will work, but personally I came to it looking for a solution for my case. Perhaps you should consider adding an option for 0 values to cover all the cases. Your suggestion should work, I think another way to do it will be using `import scipy.stats as st` `df['Entropy'] = st.entropy(values)` – Oleg Dec 27 '18 at 12:35
@Oleg I tested `st.entropy` even this gives `nan` and Updated the answer for `0` case. – Space Impact Dec 27 '18 at 12:43

score 3 · Answer 2 · answered Oct 31 '20 at 20:07

You could also apply the entropy function from scipy:

from scipy.stats import entropy
E = df.groupby('Name_Receive')['Amount'].apply(lambda x : entropy(x.value_counts(), base=2)).reset_index()
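
Note that `x.value_counts()` counts how often each distinct Amount occurs, and `base=2` measures in bits, so the line above does not weight by the amounts themselves. A hedged variant, assuming the goal is the amount-weighted, natural-log entropy from the question: `scipy.stats.entropy` normalizes its input to sum to 1 and uses the natural log by default, so the raw amounts can be passed directly. The sample data is copied from the question.

```python
import pandas as pd
from scipy.stats import entropy

# Sample data copied from the question
df = pd.DataFrame({
    'Name_Give': ['John', 'Eva', 'Sarah', 'John', 'Tom', 'John', 'Carl'],
    'Name_Receive': ['Tom', 'Tom', 'Tom', 'Tom', 'Eva', 'Eva', 'Eva'],
    'Amount': [300, 700, 100, 200, 700, 300, 250],
})

# entropy() rescales each group's amounts into shares before computing -sum(p * log(p))
E = (df.groupby('Name_Receive', sort=False)['Amount']
       .apply(lambda x: entropy(x.to_numpy()))
       .reset_index(name='Entropy'))
print(E)
```

For the sample data this should agree with the values in the accepted answer.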

Gerardo Zinno
