
I'm trying to calculate the 3 nearest neighbours (k = 3) by hand using the Manhattan distance.

I have a data frame called data and a query observation called query. I need to compute something like sum(abs(query - data)) for every observation in data.

So far I have written a for loop like this:

numeric_columns = data.columns[data.dtypes == np.number]

for rows in data:
    print(query[numeric_columns] - data[numeric_columns])

This returns all column names with NaN values for the original length of data (16), 16 times over. I'm quite new to writing for loops and I don't really understand what I've done wrong here. I also want to be able to return the distance and the index, but I think I should get this for loop correct first.

Can anyone help me?

Jaimee-lee Lincoln
  • [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Peter Apr 05 '20 at 09:16

1 Answer


pandas has a method, sub, for subtracting data frames. You can learn more in "NaNs when subtracting dataframes pandas" and in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sub.html. As for the loop: you want only the numeric columns to be subtracted, so you need an if that checks for that. The loop should then look like this:

import pandas as pd

# Iterating over a DataFrame yields its column labels, not its rows.
for col in data:
    if pd.api.types.is_numeric_dtype(data[col]):
        t = query[col].sub(data[col], fill_value=0)
        print(t)

With a loop like that, you don't need this part: numeric_columns = data.columns[data.dtypes == np.number].
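For the distance itself, a loop may not be needed at all. The following is a minimal sketch of a vectorised Manhattan distance and a 3-nearest-neighbour lookup. The toy `data` and `query` are hypothetical, since the question does not show the real ones, and the sketch assumes `query` is a pandas Series. If `query` is a one-row DataFrame, the subtraction also aligns on the row index, which is a likely source of the NaN values; `query.iloc[0]` would turn it into a Series.

```python
import pandas as pd

# Hypothetical toy data; the question's real `data` and `query` are not shown.
data = pd.DataFrame({
    "height": [1.0, 4.0, 2.0, 8.0],
    "weight": [2.0, 6.0, 2.0, 1.0],
    "label": ["a", "b", "a", "b"],
})
query = pd.Series({"height": 2.0, "weight": 3.0, "label": "?"})

# Pick the numeric columns by dtype family.
numeric_columns = data.select_dtypes(include="number").columns

# DataFrame minus Series aligns the Series index with the DataFrame's columns,
# so each row of `data` is compared with the query.
diffs = data[numeric_columns] - query[numeric_columns].astype(float)

# Manhattan distance: sum of absolute differences across columns, per row.
distances = diffs.abs().sum(axis=1)

# The three smallest distances, keeping the original row index.
nearest = distances.nsmallest(3)
print(nearest)
```

`nearest` is a Series whose index holds the row labels of the closest observations and whose values hold their distances, which covers the "distance and index" part of the question.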

DeNice