Python 3 Cosine Nearest Neighbor Format

Question

I am working through some data mining self-study using a free online resource I found. Basically, I have a CSV file with a bunch of names, movie titles, and the rating each person gave each movie. I'm trying to get the K-Nearest Neighbors from it using a cosine metric, but I can't get the output to look readable. Here's the code I have so far:

from pandas import DataFrame
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors as nn

df =  pd.read_csv("https://docs.google.com/spreadsheets/d/1MSBm3M6YmaLf0aiJCvkvrPsIJB2pPuBwse5ylnzEHRI/pub?gid=639849687&single=true&output=csv",index_col='Unnamed: 0')

df = df.fillna(0)

nn([df], metric = 'cosine')

Pretty simple to do! Except my output looks like this:

NearestNeighbors(algorithm='auto', leaf_size=30, metric='cosine',
     metric_params=None, n_jobs=1,
     n_neighbors=[                      Patrick C  Heather  Bryan  
Patrick T  Thomas  aaron  \
Alien                       NaN      NaN    2.0        NaN     5.0    
4.0
Avatar                      4.0      5.0    5.0        4.0     2.0    NaN
Blade Runner                5.0      NaN    NaN        N...
You Got Mail           NaN  2.0      2.0   1.0      2.0      NaN   2.0

[25 rows x 25 columns]],
     p=2, radius=1.0)

It's messy and doesn't even show all the data. I tried casting it into an array, but I got the error message "'ABCMeta' object does not support indexing".

I'm fairly new to Python; I can do a few basic things, but I am no expert. I was hoping someone could nudge me in the right direction to clean this up.

Thank you.

score 0 · Answer 1 · answered Sep 21 '18 at 18:16

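I'm unclear what your desired output looks like. However, what you are seeing is just the printed representation of the estimator: the first positional argument of NearestNeighbors is n_neighbors, so nn([df], metric = 'cosine') stores your DataFrame as the neighbor count instead of fitting on it. You should first instantiate the class and then call its fit() method on the data.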

import pandas as pd
from sklearn.neighbors import NearestNeighbors as nn

# Load the ratings table; the first (unnamed) CSV column becomes the row index.
df = pd.read_csv("https://docs.google.com/spreadsheets/d/1MSBm3M6YmaLf0aiJCvkvrPsIJB2pPuBwse5ylnzEHRI/pub?gid=639849687&single=true&output=csv", index_col='Unnamed: 0')
# Treat missing ratings as 0 so the distance can be computed.
df = df.fillna(0)
# Configure the estimator first, then fit it on the numeric values.
model = nn(metric='cosine')
model.fit(df.values)

Now model is a fitted object that you can use to find the K nearest neighbors of any new point, which is probably your goal. See the kneighbors() method in the scikit-learn documentation for NearestNeighbors.
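If the goal is a readable table of who is closest to whom, something along these lines might work. This is only a sketch under a few assumptions: the small ratings table and the names in it (Ann, Bob, Cid) are made up to stand in for the CSV, and I am guessing that you want neighbors between people rather than between movies, which is why the table is transposed before fitting. Calling kneighbors() with no argument returns the neighbors of each fitted row and does not count a row as its own neighbor.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical ratings standing in for the CSV: rows are movies, columns are people.
ratings = pd.DataFrame(
    {
        "Ann": [5.0, 4.0, np.nan, 1.0],
        "Bob": [4.0, 5.0, 1.0, np.nan],
        "Cid": [1.0, np.nan, 5.0, 5.0],
    },
    index=["Alien", "Avatar", "Gladiator", "Up"],
)

# Transpose so each row is one person, then treat missing ratings as 0.
people = ratings.T.fillna(0)

# Ask for a single neighbor per person, measured by cosine distance.
model = NearestNeighbors(n_neighbors=1, metric="cosine")
model.fit(people.values)

# With no argument, each fitted row is queried and excluded from its own results.
distances, indices = model.kneighbors()

# Map the integer positions back to names so the result is readable.
names = people.index.to_numpy()
neighbors = pd.DataFrame(
    {
        "nearest": names[indices[:, 0]],
        "cosine_distance": distances[:, 0].round(3),
    },
    index=people.index,
)
print(neighbors)
```

To compare movies instead of people, skip the transpose and fit on ratings.fillna(0) directly.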
