I'm quite new to machine learning. I'm trying to match people from SetA with people from SetB based on their interest ratings (1 = low, 10 = high). My real dataset has 40 features. Later I also want to give more weight to certain features, and to interests that are less common (I believe this will help?).
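On the weighting idea: NearestNeighbors uses Euclidean distance by default (metric='minkowski' with p=2). A minimal sketch, assuming you stay with that metric: multiplying every feature by the square root of its weight turns the plain distance into the weighted distance sqrt(sum over k of w_k * (x_k - y_k)^2). The weights and the two example people below are hypothetical. The same scaling has to be applied to both SetA and SetB before fit and kneighbors, and giving rarer interests more influence would simply mean choosing a larger w_k for them.

```python
import numpy as np

# Hypothetical weights: interest1 counts twice as much as interest2 and interest3
weights = np.array([2.0, 1.0, 1.0])

# Two hypothetical people rated on the same three interests
a = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 2.0, 1.0])

# Weighted Euclidean distance computed directly from the formula
direct = np.sqrt(np.sum(weights * (a - b) ** 2))

# The same value as a plain Euclidean distance on features scaled by sqrt(weight)
scale = np.sqrt(weights)
rescaled = np.linalg.norm(a * scale - b * scale)

print(direct, rescaled)
```

With pandas this scaling is just dfA * np.sqrt(weights) and dfB * np.sqrt(weights), where weights holds one value per column.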
Example dataset:
import numpy as np
import pandas as pd

dfA = pd.DataFrame(np.array([[1, 1, 1], [4, 4, 4], [8, 8, 8]]),
                   columns=['interest1', 'interest2', 'interest3'],
                   index=['personA1', 'personA2', 'personA3'])
dfB = pd.DataFrame(np.array([[4, 4, 3], [2, 2, 1], [1, 2, 2]]),
                   columns=['interest1', 'interest2', 'interest3'],
                   index=['personB1', 'personB2', 'personB3'])
print(dfA, "\n", dfB)
          interest1  interest2  interest3
personA1          1          1          1
personA2          4          4          4
personA3          8          8          8
          interest1  interest2  interest3
personB1          4          4          3
personB2          2          2          1
personB3          1          2          2
I'm using sklearn's nearest neighbors algorithm for this:
from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(n_neighbors=2).fit(dfA)
distances, indices = knn.kneighbors(dfB)
print(distances, "\n \n", indices)
[[1.         4.69041576]
 [1.41421356 4.12310563]
 [1.41421356 4.12310563]]

[[1 0]
 [0 1]
 [0 1]]
I don't understand the output. I've read the explanation in a similar question, but I don't know how to apply it here, since there are two different datasets.
Ultimately, I want a final DataFrame of matches like this:
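A minimal sketch of how to read that output, assuming the default Euclidean metric: the model was fit on dfA and queried with dfB, so row i of both arrays belongs to row i of dfB. The numbers in indices are row positions in dfA, nearest first, and distances holds the matching distances. The NumPy code below recomputes the same two arrays by brute force (it does not call scikit-learn) and then maps the positions to dfA's labels. The column names match1 and match2 are made up for illustration.

```python
import numpy as np
import pandas as pd

cols = ['interest1', 'interest2', 'interest3']
dfA = pd.DataFrame([[1, 1, 1], [4, 4, 4], [8, 8, 8]], columns=cols,
                   index=['personA1', 'personA2', 'personA3'])
dfB = pd.DataFrame([[4, 4, 3], [2, 2, 1], [1, 2, 2]], columns=cols,
                   index=['personB1', 'personB2', 'personB3'])

A = dfA.to_numpy(dtype=float)
B = dfB.to_numpy(dtype=float)

# D[i, j] = Euclidean distance from dfB row i (the query) to dfA row j (the fitted data)
D = np.sqrt(((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2))

k = 2
# For each dfB row, the positions of the k closest dfA rows, nearest first
indices = np.argsort(D, axis=1, kind="stable")[:, :k]
distances = np.take_along_axis(D, indices, axis=1)
print(distances)
print(indices)

# Replace dfA row positions with dfA's index labels
neighbours = pd.DataFrame(dfA.index.to_numpy()[indices],
                          index=dfB.index,
                          columns=[f"match{j + 1}" for j in range(k)])
print(neighbours)
```

Read this way, personB1's closest person in SetA is personA2 (distance 1), while personB2 and personB3 are both closest to personA1. To get one row per person in SetA instead, fit on dfB and call kneighbors(dfA).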
SetA      SetB
personA1  personB2
personA2  personB1
personA3  personB3
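Nearest neighbours on their own won't produce this table, because several people can share the same nearest neighbour (here both personA2 and personA3 are closest to personB1). One way to get a one-to-one pairing that reproduces the table above is a greedy matching: sort all (SetA, SetB) pairs by distance and accept a pair only if neither person has been matched yet. This is a sketch under stated assumptions: Euclidean distance, ties broken by row order (personA1 is equally close to personB2 and personB3), and greedy_match is a hypothetical helper, not a library function. If the sets differ in size, the extra people in the larger set are left unmatched.

```python
import numpy as np
import pandas as pd

cols = ['interest1', 'interest2', 'interest3']
dfA = pd.DataFrame([[1, 1, 1], [4, 4, 4], [8, 8, 8]], columns=cols,
                   index=['personA1', 'personA2', 'personA3'])
dfB = pd.DataFrame([[4, 4, 3], [2, 2, 1], [1, 2, 2]], columns=cols,
                   index=['personB1', 'personB2', 'personB3'])


def greedy_match(dfA, dfB):
    """Pair rows of dfA with distinct rows of dfB, closest pairs first (hypothetical helper)."""
    A = dfA.to_numpy(dtype=float)
    B = dfB.to_numpy(dtype=float)
    # D[i, j] = Euclidean distance between dfA row i and dfB row j
    D = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    # Visit every (i, j) pair from smallest to largest distance; the stable
    # sort over the flattened matrix breaks ties by row, then by column
    order = np.argsort(D, axis=None, kind="stable")
    a_idx, b_idx = np.unravel_index(order, D.shape)
    used_a, used_b, pairs = set(), set(), []
    for i, j in zip(a_idx, b_idx):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((dfA.index[i], dfB.index[j], D[i, j]))
    matches = pd.DataFrame(pairs, columns=["SetA", "SetB", "distance"])
    return matches.sort_values("SetA").reset_index(drop=True)


matches = greedy_match(dfA, dfB)
print(matches[["SetA", "SetB"]])
```

Greedy matching does not minimise the total distance. scipy.optimize.linear_sum_assignment solves that optimal assignment problem (the one the Hungarian algorithm solves), and on this toy data the optimum pairs personA3 with personB1 instead (total distance about 13.09, versus about 13.41 for the greedy table). Which one fits better depends on whether the best individual pairs or the best overall fit matters more.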