I have a large DataFrame of shape (189090, 8), and I need to calculate the pairwise Euclidean distances between its rows and their similarity.
My approach:
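from scipy.spatial import KDTree
from scipy.spatial.distance import pdist, squareform
from sklearn.preprocessing import MinMaxScaler
# ds is the (189090, 8) DataFrame
scaler = MinMaxScaler()
scaled = scaler.fit_transform(ds)
# condensed pairwise Euclidean distances (this is the line that fails)
Y = pdist(scaled)
# expand to the full square distance matrix
Y_squared = squareform(Y)
X_tree = KDTree(Y_squared)
# 4 nearest neighbours of every row
dist, ind = X_tree.query(Y_squared, k=4)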
When I run the code, my Jupyter notebook kernel shuts down, and in PyCharm the process gets killed. If I reduce the DataFrame to a smaller shape (e.g. (5000, 8)), the process runs normally.
I tried reducing the memory used by the DataFrame, but it still did not work. I know that the line that fails is Y = pdist(scaled).
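For context, here is a rough estimate of how much memory pdist and squareform would need at this size. It assumes 8-byte float64 values and ignores any temporary copies, so it is a lower bound rather than an exact figure; the variable names are only illustrative.

```python
# Back-of-the-envelope memory estimate for the pairwise distance step.
# Assumption: distances are stored as float64 (8 bytes each).
n_rows = 189_090
bytes_per_float64 = 8

# pdist returns the condensed form: one entry per unordered pair of rows.
n_pairs = n_rows * (n_rows - 1) // 2
condensed_gb = n_pairs * bytes_per_float64 / 1e9

# squareform expands that to the full n x n matrix.
square_gb = n_rows * n_rows * bytes_per_float64 / 1e9

print(f"pairs: {n_pairs:,}")
print(f"condensed (pdist): ~{condensed_gb:.0f} GB")
print(f"square (squareform): ~{square_gb:.0f} GB")
```

At (5000, 8) the same arithmetic gives about 0.1 GB for the condensed array and 0.2 GB for the square matrix, which would explain why the small case runs.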
How can I make this work?
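One direction I am considering, in case only the nearest neighbours are needed and not the full distance matrix: build the KDTree on the scaled points themselves and query it directly. The sketch below is an assumption about what might work, not a verified fix. It uses random data as a stand-in for ds, a smaller row count so it runs quickly, and a plain NumPy min-max scaling in place of MinMaxScaler.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical stand-in for the real (189090, 8) data.
rng = np.random.default_rng(0)
data = rng.random((1000, 8))

# Min-max scale each column to [0, 1] (same idea as MinMaxScaler defaults).
col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
scaled = (data - col_min) / col_range

# Build the tree on the 8-dimensional points, not on a distance matrix.
tree = KDTree(scaled)

# Euclidean distances and indices of the 4 nearest neighbours of each row.
# Each row is its own nearest neighbour, so column 0 has distance 0.
dist, ind = tree.query(scaled, k=4)

print(dist.shape, ind.shape)
```

The result arrays here have shape (n, k), so memory grows with the number of rows times k, whereas the pdist and squareform route grows with the square of the number of rows.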