I have a Dataframe that looks like:
------------------------------
|Date | Deal | Country |
------------------------------
|2019-01-02 | ABC | US |
------------------------------
|2019-02-01 | ABC | US |
------------------------------
|2019-03-03 | ABC | US |
------------------------------
Another Dataframe that looks similar with Country populated based on certain logic
------------------------------
|Date | Deal | Country |
------------------------------
|2019-01-01 | ABC | NaN |
------------------------------
|2019-02-01 | ABC | US |
------------------------------
|2019-03-02 | ABC | NaN |
------------------------------
Notice that the dates are different. In the second Dataframe, I wish to populate the Null Country tags by referencing the latest Country tag from the first Dataframe, So I can do something like, searching the Deal in 1st Dataframe and sorting in descending order by date and getting the latest Country column to populate the 2nd Dataframe. However, this gets very slow for 2million+ rows....
Is there any better way to do this?
I tried groupby(['Date', 'Deal']).agg(lambda x: sorted(x)[0])
but can't merge with the 1st Dataframe because the Dates don't match. I just need to fill it with the nearest Country tag based on Date.....
Any pointers on this would be greatly appreciated! thanks in advance..