I have a Pandas Dataframe
D
storing a large database. I also have a smaller DataFrame
C
with 10 rows, containing exactly the same columns as the main one, including column 'price
'. For each row r
in the main dataframe D
I want to find a row t
in C
which is closest, i.e., the difference between r.price
and t.price
is minimal.
As an output I want to have:
I have a function computing the difference
def difference(self, row1, row2):
d = np.abs(row1["price"] - row2["price"])
return d
And I want to use apply function to apply the difference function to each row in C, for each row v in D:
for _, v in D.iterrows():
C.apply(self.difference, axis=1, args=(v,))
But I don't know how I should find the row of C
for which the difference was minimal. I was thinkking about the min
build-in function from Python, but I don't know how to apply it correctly for dataframes.
An example: Let say I have a data D
id | name | price
1. bdb | AAA | 2.34
2. 441 | BBB | 3.56
3. ae9 | CCC | 1.27
4. fb9 | DDD | 9.78
5. e6b | EEE | 5.13
6. bx4 | FFF | 6.23
7. a9a | GGG | 9.56
8. 847 | HHH | 9.23
9. e4c | III | 0.45
...
200. eb3 | XYZ | 10.34
And C (for simplicity with just 5 rows) as below
id | name | price
1. xyh | AA1 | 0.34
2. y5h | BB1 | 9.77
3. af6 | CC1 | 3.24
4. op9 | DD1 | 6.34
5. 23h | EE1 | 0.20
So, the output of my function should be as follows:
Row bdb in D should be matched with row af6
Row 441 in D should be matched with row af6
Row ae9 in D should be matched with row xyh
Row fb9 in D should be matched with row y5h
Row e6b in D should be matched with row op9
etc.