3

Problem

How can you insert rows for missing YEARS, with imputed annual SALES.

Progress

The following code computes the sales differences. However, it is for one year, using the explicit iloc pointer technique.

import pandas as pd

data = {"YEAR": [1990, 1995, 2000, 1990, 1995, 2000], 
        "COUNTRY": ["USA", "USA", "USA", "USA", "USA", "USA"],
        "STATE": ["AZ", "AZ", "AZ", "AZ", "AZ", "AZ"],
        "BRANCH":["Bed", "Bed", "Bed", "Kitchen", "Kitchen", "Kitchen"], 
        "SALES": [50, 80, 100, 10, 20, 50]}

df = pd.DataFrame(data)

value_first = df.iloc[0][4]
value_second = df.iloc[1][4]

delta_step = (value_second - value_first) / 5 # because 5 years between

for x in range(0, 6): 
    print((x * delta_step) + value_first)

Original Data

Original dataset

Goal

The goal is to insert the yellow highlighted rows, where sales are imputed using straight-line steps between 1990 and 2000 sales figures. Destination dataset

MinneapolisCoder9
  • 601
  • 1
  • 11
  • 29

1 Answers1

0

First fine the missing year, of each group and make that into a merge df

idx=df.groupby(['COUNTRY','STATE','BRANCH'])['YEAR'].\
       apply(lambda x : pd.Series(range(min(x),max(x)+1))).\
         reset_index(level=[0,1,2])

Then do merge

yourdf=idx.merge(df,how='left')

Then using interpolate impute the missing value

yourdf['SALES']=yourdf.groupby(['COUNTRY','STATE','BRANCH'])['SALES'].apply(pd.Series.interpolate)
BENY
  • 317,841
  • 20
  • 164
  • 234