I have a pandas DataFrame of pairwise differences between samples, which I pivot into a symmetric matrix.
Code to reproduce:
import pandas as pd
df = pd.DataFrame([['sample_1', 'sample_2', 0.2],
                   ['sample_1', 'sample_3', 0.5],
                   ['sample_2', 'sample_4', 0.8]],
                  columns=['SampleA', 'SampleB', 'Num_Differences'])
# make unique, sorted, common index
idx = sorted(set(df['SampleA']).union(df['SampleB']))
# reshape
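(df.pivot(index='SampleA', columns='SampleB', values='Num_Differences')
   .reindex(index=idx, columns=idx)
   .fillna(0)  # missing pairs become 0 (the downcast argument is deprecated in recent pandas)
   .pipe(lambda x: x + x.values.T)  # mirror across the diagonal to make it symmetric
)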
I would like to convert it to a 2D NumPy array (an array of arrays) like the one below. This array would be the variable called dis_matrix
in the multidimensional scaling (MDS) code further down.
[[0 0.2 0.5 0]
[0.2 0 0 0.8]
[0.5 0 0 0]
[0 0.8 0 0]]
How can I get this 2D array from the pivoted DataFrame above?
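For reference, here is a minimal sketch of the conversion I am after, assuming the pivoted result is first assigned to a variable (the name `pivoted` is hypothetical and does not appear in my code above) and that `DataFrame.to_numpy()` is an appropriate way to pull the values out. I have not confirmed that this is the recommended approach for feeding MDS.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['sample_1', 'sample_2', 0.2],
                   ['sample_1', 'sample_3', 0.5],
                   ['sample_2', 'sample_4', 0.8]],
                  columns=['SampleA', 'SampleB', 'Num_Differences'])

# unique, sorted labels shared by rows and columns
idx = sorted(set(df['SampleA']).union(df['SampleB']))

# `pivoted` is a hypothetical name for the reshaped, symmetric frame
pivoted = (df.pivot(index='SampleA', columns='SampleB', values='Num_Differences')
             .reindex(index=idx, columns=idx)
             .fillna(0)
             .pipe(lambda x: x + x.values.T))

# drop the row/column labels and keep only the values as a 2D ndarray
dis_matrix = pivoted.to_numpy()
print(dis_matrix)
```

If a plain nested Python list is needed rather than an ndarray, `dis_matrix.tolist()` should give one. The row and column order follows `idx`, so any plot labels would need to be listed in that same order.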
My end goal is to apply the MDS code below:
import matplotlib.pyplot as plt
from sklearn import manifold

mds_model = manifold.MDS(n_components=2, random_state=123,
                         dissimilarity='precomputed')
mds_fit = mds_model.fit(dis_matrix)
mds_coords = mds_model.fit_transform(dis_matrix)
food_names = ['sample 1', 'sample 2', 'sample 3', 'sample 4']
plt.figure()
plt.scatter(mds_coords[:, 0], mds_coords[:, 1],
            facecolors='none', edgecolors='none')  # invisible points; only the labels are shown
labels = food_names
for label, x, y in zip(labels, mds_coords[:, 0], mds_coords[:, 1]):
    plt.annotate(label, (x, y), xycoords='data')
plt.xlabel('First Dimension')
plt.ylabel('Second Dimension')
plt.title('Dissimilarity among food items')
plt.show()