I am trying to build a matrix of results from a function that takes a crosstab of two dataframe columns. The function is applied to each pair of columns in turn, so the end result is a matrix with one value per pair. The indices of the columns I want to pass to pd.crosstab are in a list, cols_index. Here's my code:
cols_index  # list of dataframe column indices. All fine.
res_matrix = np.zeros([len(cols_index), len(cols_index)])  # square matrix of zeros, each dimension is the length of the number of columns
for i in cols_index:
    for j in cols_index:
        confusion_matrix = pd.crosstab(df.columns.get_values()[i], df.columns.get_values()[j])  # df.columns.get_values()[location]
        result = my_function(confusion_matrix)  # a scalar
        res_matrix[i, j] = result
return res_matrix
However, I get the following error: ValueError: If using all scalar values, you must pass an index
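One likely cause, sketched here on made-up data: df.columns.get_values()[i] returns the column's label (a plain string), not the column's values, so pd.crosstab is handed two scalars. The frame toy_df and its column names are assumptions for illustration only. Selecting by position with iloc passes a Series instead.

```python
import pandas as pd

# Hypothetical data, not taken from the original post.
toy_df = pd.DataFrame({
    "colA": ["x", "y", "x", "y"],
    "colB": ["p", "p", "q", "q"],
})

label_i = toy_df.columns[0]    # the label "colA", a plain string
series_i = toy_df.iloc[:, 0]   # the column's data, a Series
series_j = toy_df.iloc[:, 1]

# Passing Series gives a table of counts, as in the working two-column call.
table = pd.crosstab(series_i, series_j)

# Passing the labels themselves hands crosstab two scalars,
# which is the situation the ValueError complains about.
try:
    pd.crosstab(toy_df.columns[0], toy_df.columns[1])
    label_call_failed = False
except ValueError:
    label_call_failed = True
```

Note also that Index.get_values() was removed in pandas 1.0, so on current versions df.columns[i] or df.iloc[:, i] would be needed in any case.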
There's no problem with my_function, because if I run it directly on two columns of the dataframe, there's no issue:
confusion_matrix = pd.crosstab(df['colA'], df['colB'])
result = my_function(confusion_matrix) # returns 0.29999 which is fine
I've tried various ways of fixing this, including looking at the post "How to fill a matrix in Python using iteration over rows and columns", but in this case I can't see how to use broadcasting over the Pandas columns.
Any ideas appreciated, thanks.
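A minimal sketch of the pairwise loop, under two assumptions: cols_index holds positional column indices, and my_function accepts any crosstab table. Since my_function is not shown in the post, placeholder_score below is a hypothetical stand-in, and toy_df is made-up data. The sketch uses enumerate so the matrix position is independent of the column position, which avoids an out-of-bounds write when cols_index is not simply 0..n-1.

```python
import numpy as np
import pandas as pd


def placeholder_score(table):
    """Hypothetical stand-in for my_function: share of counts in the largest cell."""
    counts = table.to_numpy()
    return counts.max() / counts.sum()


def pairwise_crosstab_matrix(df, cols_index, score=placeholder_score):
    """Apply score to the crosstab of every pair of columns listed in cols_index."""
    n = len(cols_index)
    res_matrix = np.zeros((n, n))
    # (a, b) are positions in the result matrix; (i, j) are column positions in df.
    for a, i in enumerate(cols_index):
        for b, j in enumerate(cols_index):
            # to_numpy() passes the column values and sidesteps duplicate
            # Series names when a column is crossed with itself (i == j).
            table = pd.crosstab(df.iloc[:, i].to_numpy(), df.iloc[:, j].to_numpy())
            res_matrix[a, b] = score(table)
    return res_matrix


# Hypothetical data: column 0 is skipped to show that cols_index need not start at 0.
toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "colA": ["x", "y", "x", "y"],
    "colB": ["p", "p", "q", "q"],
})
res = pairwise_crosstab_matrix(toy_df, cols_index=[1, 2])
```

If the score is symmetric in its two columns, the inner loop could compute only the upper triangle and mirror it, but that depends on what my_function actually does.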