I've 2 DataFrames. One for Invoice Data
and the other for Promotion Dates
.
Invoice Data
DataFrame
+------------+-------------+---------------+
| LocationNo | InvoiceDate | InvoiceAmount |
+------------+-------------+---------------+
| A | 01-Jul-20 | 79 |
+------------+-------------+---------------+
| B | 01-Jul-20 | 72 |
+------------+-------------+---------------+
| C | 01-Jul-20 | 24 |
+------------+-------------+---------------+
| A | 02-Jul-20 | 68 |
+------------+-------------+---------------+
| B | 02-Jul-20 | 6 |
+------------+-------------+---------------+
| C | 02-Jul-20 | 27 |
+------------+-------------+---------------+
| A | 03-Jul-20 | 25 |
+------------+-------------+---------------+
| B | 03-Jul-20 | 62 |
+------------+-------------+---------------+
| C | 03-Jul-20 | 58 |
+------------+-------------+---------------+
| D | 03-Jul-20 | 36 |
+------------+-------------+---------------+
| E | 03-Jul-20 | 65 |
+------------+-------------+---------------+
| F | 03-Jul-20 | 81 |
+------------+-------------+---------------+
df_1 = pd.DataFrame({
'LocationNo':['A','B','C','A','B','C','A','B','C','D','E','F'],
'InvoiceDate':['01-Jul-20','01-Jul-20','01-Jul-20','02-Jul-20','02-Jul-20','02-Jul-20',
'03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20'],
'InvoiceAmount':[79,72,24,68,6,27,25,62,58,36,65,81]
})
Promotion Dates
DataFrame
+------------+----------------+--------------+
| LocationNo | PromotionStart | PromotionEnd |
+------------+----------------+--------------+
| A | 01-Jul-20 | 02-Jul-20 |
+------------+----------------+--------------+
| B | 02-Jul-20 | 03-Jul-20 |
+------------+----------------+--------------+
| C | 03-Jul-20 | 05-Jul-20 |
+------------+----------------+--------------+
| D | 01-Jul-20 | 05-Jul-20 |
+------------+----------------+--------------+
| E | 02-Jul-20 | 02-Jul-20 |
+------------+----------------+--------------+
| F | 05-Jul-20 | 06-Jul-20 |
+------------+----------------+--------------+
df_2 = pd.DataFrame({
'LocationNo' : ['A','B','C','D','E','F'],
'PromotionStart':['01-Jul-20','02-Jul-20','03-Jul-20','01-Jul-20','02-Jul-20','05-Jul-20'],
'PromotionEnd':['02-Jul-20','03-Jul-20','05-Jul-20','05-Jul-20','02-Jul-20','06-Jul-20',]
})
My Task is to Merge both the DataFrames based on 2 conditions & add value Yes
to a new column Promotion
if these conditions are met.
conditions are as follows
LocationNo
should match- The
InvoiceDate
should be between the respectiveLocationNo
'sPromotionStart
andPromotionEnd
.
Basically, to merge only If the InvoiceDate
falls between PromotionStart
and PromotionEnd
dates.
My desired output is as below
+------------+-------------+---------------+-----------+
| LocationNo | InvoiceDate | InvoiceAmount | Promotion |
+------------+-------------+---------------+-----------+
| A | 01-Jul-20 | 79 | Yes |
+------------+-------------+---------------+-----------+
| B | 01-Jul-20 | 72 | |
+------------+-------------+---------------+-----------+
| C | 01-Jul-20 | 24 | |
+------------+-------------+---------------+-----------+
| A | 02-Jul-20 | 68 | Yes |
+------------+-------------+---------------+-----------+
| B | 02-Jul-20 | 6 | Yes |
+------------+-------------+---------------+-----------+
| C | 02-Jul-20 | 27 | |
+------------+-------------+---------------+-----------+
| A | 03-Jul-20 | 25 | |
+------------+-------------+---------------+-----------+
| B | 03-Jul-20 | 62 | Yes |
+------------+-------------+---------------+-----------+
| C | 03-Jul-20 | 58 | Yes |
+------------+-------------+---------------+-----------+
| D | 03-Jul-20 | 36 | Yes |
+------------+-------------+---------------+-----------+
| E | 03-Jul-20 | 65 | |
+------------+-------------+---------------+-----------+
| F | 03-Jul-20 | 81 | |
+------------+-------------+---------------+-----------+
I know how to merge columns based on multiple conditions. But, in this case, I need to merge within Date Range
.
kindly, let me know how should i approach to solve this.