
Let's say you have a dataframe like this:

>>> import random
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'epoch_minute': list(reversed(range(25090627, 25635267))),
...     'count': [random.randint(11, 35) for _ in range(25090627, 25635267)]})
>>> df.head()
   epoch_minute  count
0      25635266     12
1      25635265     20
2      25635264     33
3      25635263     11
4      25635262     35

and some relative epoch minute deltas like this:

day = 1440
week = 10080
month = 302400

How do I accomplish the equivalent of this code block:

for i, r in df.iterrows():
    if r['epoch_minute'] - day in df['epoch_minute'].values and \
            r['epoch_minute'] - week in df['epoch_minute'].values and \
            r['epoch_minute'] - month in df['epoch_minute'].values:
        pass  # do stuff

using this syntax:

valid_rows = df.loc[(df['epoch_minute'] == df['epoch_minute'] - day) &
                    (df['epoch_minute'] == df['epoch_minute'] - week) &
                    (df['epoch_minute'] == df['epoch_minute'] - month]

I understand why the loc select doesn't work, but I'm just asking if there exists a more elegant way to select the valid rows without iterating through the rows of the dataframe.

Nick ODell
aweeeezy

1 Answer


Wrap each condition in parentheses, combine them with & (element-wise AND), and use isin to check membership:

valid_rows = df[(df['epoch_minute'].isin(df['epoch_minute'] - day)) &
                (df['epoch_minute'].isin(df['epoch_minute'] - week)) &
                (df['epoch_minute'].isin(df['epoch_minute'] - month))]

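To reproduce the loop exactly (is `epoch_minute - delta` present in the column?), subtract first and then call isin on the result:

valid_rows = df[((df['epoch_minute'] - day).isin(df['epoch_minute'])) &
                ((df['epoch_minute'] - week).isin(df['epoch_minute'])) &
                ((df['epoch_minute'] - month).isin(df['epoch_minute']))]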
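A quick way to check which selection matches the loop from the question is to run both on a small frame. The sketch below is only an illustration: the ten-row frame and the tiny `deltas` values are made-up stand-ins for day, week and month, and the names `mask` and `mask_rev` are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical): ten consecutive minutes, with tiny deltas
# standing in for day / week / month.
df = pd.DataFrame({'epoch_minute': list(reversed(range(100, 110))),
                   'count': range(10)})
deltas = [1, 2, 4]
minutes = df['epoch_minute']

# Reference: the row-by-row membership test from the question's loop.
present = set(minutes)
loop_rows = [i for i, m in minutes.items()
             if all(m - d in present for d in deltas)]

# Vectorised: subtract first, then test membership in the original column.
mask = pd.Series(True, index=df.index)
for d in deltas:
    mask &= (minutes - d).isin(minutes)
valid_rows = df[mask]

# Reversed form: true where epoch_minute + delta exists instead.
mask_rev = pd.Series(True, index=df.index)
for d in deltas:
    mask_rev &= minutes.isin(minutes - d)

print(loop_rows)
print(valid_rows['epoch_minute'].tolist())
print(df[mask_rev]['epoch_minute'].tolist())
```

On this contiguous toy data both forms select the same number of rows, but only the subtract-then-isin form picks the rows the loop picks; the reversed form keeps the rows whose later neighbours exist.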
jezrael
  • Ah yes, you're right about the syntax -- but `valid_rows` is still empty after executing either of these `loc` selections, whereas the for loop correctly identifies 242240 valid rows. – aweeeezy Sep 28 '18 at 06:57
    Shouldn't the arguments of `isin` and the value it's called on be switched like this: (df['epoch_minute'] - day).isin(df['epoch_minute']) & ... ? – aweeeezy Sep 28 '18 at 07:11
  • @aweeeezy - I tested it and got the same output, but I added it to the answer. – jezrael Sep 28 '18 at 07:13