Extracting a matrix from a dataframe

Question

I have something like this:

import holoviews as hv
import pandas as pd
from holoviews import opts, dim
hv.extension('bokeh')
renderer = hv.renderer('bokeh')

csv_path = r'C:\Users\jose\Downloads\enron-v1.csv'
df_csv = pd.read_csv(csv_path, index_col=0)

df_filter = df_csv[["fromJobtitle", "toJobtitle"]]
df_final = df_filter.groupby(df_filter.columns.tolist(), as_index=False).size()

Which will produce something like this:

  fromJobtitle       toJobtitle  size
0          CEO              CEO    65
1          CEO         Director    23
2          CEO         Employee    56
3          CEO  In House Lawyer     7
4          CEO          Manager   104

I want to extract the following matrix from it so that I can plot the data as a Sankey diagram in HoloViews:

[['CEO', 'CEO', 65],
['CEO', 'Director', 23],
['CEO', 'Employee', 56]]
... etc.

I don't think you need to do this; HoloViews is also happy with dataframes. However, I think Sankey is not OK with having `CEO` in both columns, and that should throw an error. - mcsoini, Jun 02 '21 at 10:21
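
To illustrate the comment above, here is a minimal sketch of one way to keep the same label from appearing as both source and target. It uses a small stand-in for `df_final` built from the rows shown in the question; the `" (to)"` suffix and the name `df_sankey` are assumptions chosen for illustration, not anything HoloViews requires. The `hv.Sankey` call is left as a comment and assumes HoloViews is loaded as in the question.

```python
import pandas as pd

# Stand-in for df_final, using rows shown in the question.
df_final = pd.DataFrame({
    "fromJobtitle": ["CEO", "CEO", "CEO"],
    "toJobtitle": ["CEO", "Director", "Employee"],
    "size": [65, 23, 56],
})

# Make target labels distinct from source labels so no node links to itself.
# The " (to)" suffix is an arbitrary choice for illustration.
df_sankey = df_final.assign(toJobtitle=df_final["toJobtitle"] + " (to)")

print(df_sankey)

# Assumption: with HoloViews loaded as in the question, the dataframe
# could then be passed directly, without converting it to a matrix:
# sankey = hv.Sankey(df_sankey, kdims=["fromJobtitle", "toJobtitle"], vdims=["size"])
```

Whether relabelling is acceptable depends on how the diagram should read; dropping the rows where source and target are equal would be another option.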

score 0 · Accepted Answer · answered Jun 02 '21 at 10:14

A pd.DataFrame already holds its rows in this form, so you only need to access the underlying array:

df_final.values

Out[149]: 
array([['CEO', 'CEO', 65],
       ['CEO', 'Director', 23],
       ['CEO', 'Employee', 56],
       ['CEO', 'Lawyer', 7],
       ['CEO', 'Manager', 104]], dtype=object)
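
If a plain list of lists is wanted, as written in the question, rather than a NumPy array, the array can be converted with `tolist()`. The sketch below uses a small stand-in for `df_final` built from the first rows of the question; `rows_array` and `rows_list` are names introduced here for illustration. `DataFrame.to_numpy()` returns the same data as `.values` and is the accessor the pandas documentation recommends.

```python
import pandas as pd

# Stand-in for df_final, using rows shown in the question.
df_final = pd.DataFrame({
    "fromJobtitle": ["CEO", "CEO", "CEO"],
    "toJobtitle": ["CEO", "Director", "Employee"],
    "size": [65, 23, 56],
})

# NumPy array of rows; dtype is object because strings and integers are mixed.
rows_array = df_final.to_numpy()

# Plain nested Python lists, one inner list per row.
rows_list = df_final.values.tolist()

print(rows_array.shape)
print(rows_list)
```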
