-4

Background : The business objective is to build a ML model which predicts lift in profit based on historical performance of the product during both promotional period as well as non-promotional periods.

I have built a ML model which predicts profit on daily level for each asin based on historical data using following inputs : Date, Product ID, Promotion flag , Promotion type, discount percent, current price.

Now I want to generate inference data by building a dataset which has all combinations of productID and promotion types along with several levels of discount percent for each day in the future. Is there a way to do this in pandas ?

Conditions : Given a date range (for ex. 3/17/23 - 3/27/23), promotion types - 3 types of promotions (bd,pd,ld), discount percent [0,5,10]

generate data for each day in the date range for each productID with different combinations of promotion type and discount percent.

Expected dataframe output :

This is just an example for two dates (3/17 and 3/18), 1 productID and all combinations for promotion type and discount percent. I want to create this for entire range of dates for each productID.

Expected output screenshot

  • 2
    Welcome to [Stack Overflow.](https://stackoverflow.com/ "Stack Overflow"). This site is not a code-writing or tutoring service, it is best used when you have a specific problem with written code. Your question, reads like "Can someone code this for me" or "Show me a tutorial or information on how to code this" which are off-topic for Stackoverflow. Please take the [tour](https://stackoverflow.com/tour), read [what's on-topic here](https://stackoverflow.com/help/on-topic). – itprorh66 Mar 17 '23 at 23:03

1 Answers1

0

Looking at the example of the expected output you provided, here is a simple way to do it: list the different values you want to see in your dataframe, generate all combinations using itertools.product, and finally create the dataframe with the columns you want.

from itertools import product
import pandas as pd

# Configuration
products_ids = [1, 2, 3]
date_range = ["2023-03-17", "2023-03-27"]
discount_types = ["bd", "pd", "ld"]
discount_prct = [0, 5, 10]

# Get all dates in range
all_dates = pandas.date_range(date_range[0], date_range[1], freq='d')

# Get all combinations
combinations = product(products_ids, all_dates, discount_types, discount_prct)

# Create the dataframe
df = pd.DataFrame(combinations, columns=["ProductID", "Date", "DiscountType", "DiscountPercent"])

Here are the 15th first rows:

ProductID Date DiscountType DiscountPercent
0 1 2023-03-17 00:00:00 bd 0
1 1 2023-03-17 00:00:00 bd 5
2 1 2023-03-17 00:00:00 bd 10
3 1 2023-03-17 00:00:00 pd 0
4 1 2023-03-17 00:00:00 pd 5
5 1 2023-03-17 00:00:00 pd 10
6 1 2023-03-17 00:00:00 ld 0
7 1 2023-03-17 00:00:00 ld 5
8 1 2023-03-17 00:00:00 ld 10
9 1 2023-03-18 00:00:00 bd 0
10 1 2023-03-18 00:00:00 bd 5
11 1 2023-03-18 00:00:00 bd 10
12 1 2023-03-18 00:00:00 pd 0
13 1 2023-03-18 00:00:00 pd 5
14 1 2023-03-18 00:00:00 pd 10
thmslmr
  • 1,251
  • 1
  • 5
  • 11