I am new to programming and Python and would like to write the following piece of code as a function using the 'def'/'return' construction:

df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True
df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True

My attempt returns invalid syntax:

def ConsecutiveID(df, column ='DATE_INT'):
    return  df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True
            df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True

My goal is to ultimately use my ConsecutiveID function as follows:

    df.groupby(['COUNTY_GEOID_YEAR','TEMPBIN']).apply(ConsecutiveID)

I am applying the split-apply-combine pattern: groupby splits my data, and apply runs the function I would like to construct on each group.

My main question is how to write what I've called ConsecutiveID as a function. Thank you for any help.

Justin
  • Don't use `return`; `df` will still be updated inside the function. Note that most pandas methods return a new object and only update the df itself with `inplace=True`. – Julien Aug 04 '16 at 19:28
  • After the two statements just return the `df`. – shivsn Aug 04 '16 at 19:29
  • @JulienBernu - Great, thank you, that solved the syntax error. I'm having other problems applying the function. Thank you for the help though! – Justin Aug 04 '16 at 19:35
  • Try just calling `ConsecutiveID` on `df`. By the way, the convention is to start function names with a lower-case letter; capitals are for classes. – Julien Aug 04 '16 at 19:39
  • @JulienBernu - Thank you, it's worked and it's done exactly what I wanted it to do! Appreciate it a lot! – Justin Aug 04 '16 at 19:45

1 Answer

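def ConsecutiveID(df):
    # Work on a copy so the frame passed in is not modified in place
    df = df.copy()
    # True where the next row is exactly one day later
    cond1 = df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1
    # True where the previous row is exactly one day earlier
    cond2 = df['DATE_INT'].shift(1) - df['DATE_INT'] == -1

    # Flag rows that have a consecutive neighbour on either side
    df.loc[cond1 | cond2, 'CONSECUTIVE_DAY'] = True

    return df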
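
To see how this behaves with the groupby from the question, here is a minimal sketch. The sample values are made up for illustration; only the column names come from the question. Passing `group_keys=False` is my own choice here, so that the result keeps the original row index. The function is repeated so the snippet runs on its own.

```python
import pandas as pd


def ConsecutiveID(df):
    # Same logic as the function above, repeated so this sketch is self-contained.
    df = df.copy()
    cond1 = df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1
    cond2 = df['DATE_INT'].shift(1) - df['DATE_INT'] == -1
    df.loc[cond1 | cond2, 'CONSECUTIVE_DAY'] = True
    return df


# Hypothetical sample data; the values are invented for this example.
df = pd.DataFrame({
    'COUNTY_GEOID_YEAR': ['A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'TEMPBIN':           [1, 1, 1, 2, 2, 1, 1],
    'DATE_INT':          [1, 2, 4, 5, 6, 7, 9],
})

# group_keys=False keeps the original row index instead of prepending the group keys.
out = (
    df.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'], group_keys=False)
      .apply(ConsecutiveID)
)

# Rows that were not flagged hold NaN, so compare against True to get plain booleans.
flags = out.sort_index()['CONSECUTIVE_DAY'].eq(True).tolist()
print(flags)
```

Note that the row with `DATE_INT` 4 is not flagged even though the next row in the raw frame is 5: the two rows fall into different `TEMPBIN` groups, so the shift never compares them. Depending on the pandas version, `apply` may warn about, or leave out, the grouping columns in the frame it passes to the function; the flag column itself does not depend on them.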
piRSquared