0

Here is the problem I pull a csv into a pandas data frame that looks like so:

Identity    Date       value1    value2    Random
Apple       1/1/2005   10        10        Orange
Apple       12/1/2005  1         1         Orange

I need to then call the Identity Apple, find its min and max dates and insert rows ie months in order to interpolate between the two points so the end result becomes

Identity    Date       value1    value2    Random
Apple       1/1/2005   10        10        Orange
Apple       2/1/2005   0         0         Orange
Apple       3/1/2005   0         0         Orange
 .              .      .         .         .
 .              .      .         .         .
 .              .      .         .         .
Apple       12/1/2005  1         1         Orange

The problem is that although I can loop through a list of identities and get all rows associated I cant seem to find a way to then insert extra rows, especially without a nasty for loop. essentially I need to bridge the date gap and fill the associated Identity values with zeros.

list = ['Apple','Banana','Orange']    
for i in range(0,len(list))
    data.loc[data['Identity'].isin(list[i])]

EDIT:

Working Code Below:

import pandas as pd

df = pd.DataFrame([['Apple',pd.to_datetime('1/1/2005'),10,10,'Orange'],    ['Orange', pd.to_datetime('8/1/2005'),1, 1 ,'Apple'],['Apple',         pd.to_datetime('12/1/2005'),1, 1 ,'Orange']])
df.columns = ['Identity','Date', 'value1' , 'value2','Random']

full_df = pd.DataFrame()
dummydata = []

indentity = ['Apple','Orange']
random = ['Orange','Apple']
years = ['2005','2005']

for i in range(0,2):
    full_df = pd.DataFrame()
    full_df['Date'] = [pd.to_datetime(str(x)+'/1/'+str(years[i])) for x in   range(1,13)]
    full_df['Identity'] = indentity[i]
    full_df['Random'] = random[i]
    dummydata.append(full_df)

full_df = pd.concat(dummydata)                     
result =  full_df.merge(df,how='left').fillna(0)
#print(dummydata)
#print(full_df)
print(result )
Tyler Cowan
  • 820
  • 4
  • 13
  • 35
  • Welcome to StackOverflow! Can you try to provide a minimal example with the csv data and then the code for the minimal example? This way we can copy the csv data and try the code, and help you. For example, you don't need gas, water, etc. maybe just 2 columns. It's slightly confusing when you use 'Apple' in your example, and your code reflects different variables. – ilyas patanam Dec 10 '15 at 22:23
  • I suppose adding this extra code doesnt help as I want this to work using pandas dataframe for speed. You would think interpolation between non existing dates by inserting rows would be like a one liner – Tyler Cowan Dec 10 '15 at 22:29
  • I'd suggest reducing your original code to something minimal, so it's easier for you and others find your error. I was quite confused because it doesn't correspond with the example. – ilyas patanam Dec 10 '15 at 22:37
  • Thanks imp I have condensed the problem – Tyler Cowan Dec 11 '15 at 21:11

1 Answers1

1

My suggestion is create the full theoretical DF, merge with data and fillna:

import pandas as pd

df = pd.DataFrame([['Apple',pd.to_datetime('1/1/2005'),10,10,'Orange'],['Apple', pd.to_datetime('12/1/2005'),1, 1 ,'Orange']])
df.columns = ['Identity','Date', 'value1' , 'value2','Random']

full_df = pd.DataFrame()
full_df['Date'] = [pd.to_datetime(str(x)+'/1/2005') for x in range(1,13)]
full_df['Identity'] = 'Apple'

result =  full_df.merge(df,how='left').fillna(0)
result

This is good for one Identity and year, loop over years and Identities, append all created DF's into a list and pd.concat(list)

Ezer K
  • 3,637
  • 3
  • 18
  • 34