I know similar questions on this topic have been asked before, but I'm still struggling to make any headway with my problem.
Basically, I have three dataframes (of sizes 402 x 402, 402 x 3142, and 1 x 402) and I'm combining elements from them into a calculation. I then write each result into another dataframe - see the code below using dummy data. Each calculation takes between 0.3 and 0.8 ms, but there are (402 x 3142)^2 calculations in total, which obviously takes a very long time!
Since none of the calculations depends on any other, this seems ripe for parallelization, but I'm having a hard time figuring out how to do it. Apologies if the code is ugly - I'm very new to Python and to parallel computing.
One additional thing to note is that the two non-vector matrices are sparse (densities of 0.4 and 0.3, respectively), so they could be converted to coordinate or compressed row/column format, so that not all possible combinations of calculations need to be made. That alone might roughly halve the runtime.
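To illustrate what I mean by coordinate format, here is a small sketch using scipy.sparse (I haven't worked this into the main loop yet, so treat it as an idea rather than a working solution):

    import numpy as np
    from scipy.sparse import coo_matrix

    # Dummy 0/1 matrix with the same shape and density as A (~0.4 nonzero)
    A_dense = np.random.choice([0, 1], size=(402, 402), p=[0.6, 0.4])
    A_coo = coo_matrix(A_dense)

    # Iterating only the stored (i, j) pairs skips every zero product up front
    nonzero_pairs = list(zip(A_coo.row, A_coo.col))

Since A[i, j] == 0 makes the whole product zero, only the pairs in `nonzero_pairs` would ever need to be visited.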
    import numpy as np
    import pandas as pd

    # Dummy data matching the real sizes and densities
    A = pd.DataFrame(np.random.choice([0, 1], size=(402, 402), p=[0.6, 0.4]))
    B = pd.DataFrame(np.random.choice([0, 1], size=(402, 3142), p=[0.7, 0.3]))
    x = A.sum(axis=1)

    col_names = ["R", "I", "S", "J", "value"]
    results = pd.DataFrame(columns=col_names)

    row = 0
    for r in B.columns:
        for s in B.columns:
            for i in A.index:
                for j in A.columns:
                    results.loc[row, "R"] = r
                    results.loc[row, "I"] = i
                    results.loc[row, "S"] = s
                    results.loc[row, "J"] = j
                    results.loc[row, "value"] = A.loc[i, j] * B.loc[j, s] * B.loc[i, r] / x[i]
                    row = row + 1
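For what it's worth, one idea I've been toying with is replacing the two inner loops with a single NumPy broadcast, so that all (i, j) values for a fixed (r, s) pair come out in one shot. This is just a sketch on tiny dummy data (the names and sizes are made up, and I haven't tried it at full scale):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.choice([0, 1], size=(6, 6), p=[0.6, 0.4])   # small stand-in for the real A
    B = rng.choice([0, 1], size=(6, 10), p=[0.7, 0.3])  # small stand-in for the real B
    x = A.sum(axis=1).astype(float)
    x[x == 0] = np.nan  # avoid dividing by zero for all-zero rows

    def values_for_pair(r, s):
        # value[i, j] = A[i, j] * B[j, s] * B[i, r] / x[i], for all i, j at once:
        # (B[:, r] / x) broadcasts down the rows, B[:, s] across the columns
        return (B[:, r] / x)[:, None] * A * B[:, s]

If something like this works, the outer (r, s) pairs could then be split across processes, which is the part I'm stuck on.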