I'm trying to calculate the k = 3 nearest neighbours by hand using the Manhattan distance.
I have a data frame called data and a query observation called query. I need to be able to do something like sum(abs(query-data)) for every observation in data.
So far I have written a for loop like this:
numeric_columns = data.columns[data.dtypes == np.number]
for rows in data:
    print(query[numeric_columns] - data[numeric_columns])
This prints every column name with NaN for all values, at the original length of data (16 rows), and it does so 16 times over.
I'm quite new to writing for loops and I don't really understand what I've done wrong here. I also want to be able to return the distance and the index, but I think I should get this for loop working first.
Can anyone help me?
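
A possible direction, offered as a sketch rather than a definitive answer: iterating with "for rows in data" walks over the column labels of a data frame, not its rows, which would explain the output being printed once per column. The NaN values are probably the result of pandas aligning the two operands on their index labels before subtracting; that is a guess that assumes query is a one-row data frame whose index label does not appear in data. The example below uses made-up toy data (the columns a, b, label and the index values are hypothetical) and converts the query to a plain NumPy row so that no alignment takes place. It uses select_dtypes to pick the numeric columns, shows an explicit loop with iterrows, and then the equivalent vectorised form that returns both the distance and the index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the `data` and `query` of the question.
data = pd.DataFrame(
    {"a": [1.0, 4.0, 0.0, 2.0], "b": [2.0, 0.0, 1.0, 2.0], "label": list("xyxy")},
    index=[10, 11, 12, 13],
)
# Assumption: the query is a one-row frame with an unrelated index label.
query = pd.DataFrame({"a": [1.0], "b": [1.0], "label": ["?"]}, index=[99])

# Select the numeric columns by dtype.
numeric_columns = data.select_dtypes(include="number").columns

# Flatten the query to a 1-D NumPy array so pandas cannot align on the index.
query_values = query[numeric_columns].to_numpy().ravel()

# Explicit loop: iterrows() yields (index label, row as a Series) pairs.
loop_distances = {}
for idx, row in data[numeric_columns].iterrows():
    loop_distances[idx] = np.abs(row.to_numpy() - query_values).sum()

# Vectorised equivalent: the array is broadcast across every row,
# then absolute differences are summed along each row.
distances = (data[numeric_columns] - query_values).abs().sum(axis=1)

# The three smallest distances, keeping the original index labels.
nearest = distances.nsmallest(3)
print(nearest)
```

Here nearest is a Series whose index holds the row labels of the closest observations and whose values hold their Manhattan distances, so data.loc[nearest.index] would retrieve the neighbours themselves.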