I have a data set from a tea export company. It contains the exported quantity and price, broken down by tea type and weight category.

It looks like this:

Date        Type    Weight    Quantity      Price
2016-01-01  black   bags      1734136.51    1131.30
2016-01-01  black   bulk      10722389.66   510.86
2016-01-01  black   4g_1kg    6817078.01    588.72
2016-01-01  black   1kg_3kg   86444.50      565.91
2016-01-01  black   3kg_5kg   1003986.73    552.39

I have grouped the data by month, tea type, and weight category with this:

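import pandas as pd

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# Encode year and month as one integer, e.g. 201601 for January 2016
df['YearMonth'] = 100 * df['Date'].dt.year + df['Date'].dt.month

# Sum Quantity per month, tea type, and weight category
df = df.groupby(['YearMonth', 'Type', 'Weight']).agg({'Quantity': 'sum'})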

And the dataframe now looks like this

YearMonth   Type    Weight     Quantity
201601      black   1kg_3kg    86444.50
                    3kg_5kg    1003986.73
                    4g_1kg     6817078.01
                    5kg_10kg   2816810.33
                    bags       1734136.51
                    bulk       10722389.66
            green   3kg_5kg    12.00
                    4g_1kg     53014.95
                    5kg_10kg   1132.00
                    bags       41658.19
                    bulk       112400.00
            instant 4g_1kg     28.80
                    lt3kg      89486.40
201602      black   1kg_3kg    215539.60

I tried simple approaches with XGBoost and linear regression to forecast the quantities, but they didn't work. What I want is a forecast of the overall total for the next few years, plus separate forecasts for each tea type and weight class. Can someone tell me how to achieve this?
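To make the target concrete, here is a minimal sketch of the layout I think the forecasts would need: one monthly series per (Type, Weight) pair, plus the overall total and a subtotal per tea type. The sample rows are copied from the tables above, except that the day for the 2016-02 row is assumed. The names `wide`, `total`, and `by_type` are only illustrative, and filling missing combinations with 0 assumes that no record means no export.

```python
import pandas as pd

# Illustrative rows taken from the tables above; the day of the
# 2016-02 row is an assumption (only the month is known).
data = [
    {"Date": "2016-01-01", "Type": "black", "Weight": "bags", "Quantity": 1734136.51},
    {"Date": "2016-01-01", "Type": "black", "Weight": "bulk", "Quantity": 10722389.66},
    {"Date": "2016-01-01", "Type": "green", "Weight": "bags", "Quantity": 41658.19},
    {"Date": "2016-02-01", "Type": "black", "Weight": "1kg_3kg", "Quantity": 215539.60},
]
df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"])

# Use a monthly Period so the index is a proper time axis
# rather than an integer such as 201601.
df["Month"] = df["Date"].dt.to_period("M")

# One row per month, one column per (Type, Weight) series.
# Assumption: a missing combination means nothing was exported, so fill with 0.
wide = (
    df.groupby(["Month", "Type", "Weight"])["Quantity"].sum()
      .unstack(["Type", "Weight"])
      .fillna(0.0)
)

# Overall total per month (the top-level series to forecast).
total = wide.sum(axis=1)

# Subtotal per tea type and month.
by_type = (
    df.groupby(["Month", "Type"])["Quantity"].sum()
      .unstack("Type")
      .fillna(0.0)
)

print(wide)
print(total)
print(by_type)
```

Each column of `wide`, each column of `by_type`, and `total` would then be a separate monthly series that a forecasting model could be fitted to.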

John Snape
