
I have 65K records such as the snippet below in my dataframe:

Scrip   Timestamp1          NSETS               NSEPr Buyq1 Buyq2   Buyq3   Buyq4   Buyq5   Sellq1  Sellq2  Sellq3  Sellq4  Sellq5  Sellp1  Sellp2  Sellp3  Sellp4  Sellp5  buyp1   buyp2   buyp3   buyp4   buyp5   ActPr   TotalBuyQty TotalSellQty    
ALANKIT 2018-01-12 13:02:06 2018-01-12 13:00:50 78.10   759.00  100.00  996.00  1287.00 200 15.00   300.00  100.00  1787.00 5614.00 78.25   78.35   78.40   78.45   78.50   78.10   78.05   78.00   77.80   77.75   78.25   63928   194206  
ALANKIT 2018-01-12 13:32:29 2018-01-12 13:22:21 79.50   28.00   100.00  200.00  1288.00 248 50.00   178.00  898.00  100.00  487.00  79.50   79.55   79.60   79.65   79.75   79.30   79.15   79.10   79.05   78.80   79.20   61927   175983  
ALANKIT 2018-01-12 13:36:26 2018-01-12 13:34:51 79.20   39.00   3649.00 1287.00 7.00    11  1500.00 1024.00 1000.00 220.00  65.00   79.20   79.25   79.50   79.55   79.60   79.15   79.00   78.85   78.65   78.55   79.00   65503   176990  
ALANKIT 2018-01-12 14:32:29 2018-01-12 14:31:23 78.80   810.00  1000.00 1287.00 1342.00 555 58.00   20.00   100.00  10.00   1250.00 78.80   78.85   78.90   78.95   79.00   78.70   78.60   78.55   78.50   78.30   78.70   84405   184759  
ALANKIT 2018-01-12 14:12:58 2018-01-12 14:11:22 78.50   1.00    5.00    100.00  25.00   510 2542.00 25.00   95.00   50.00   500.00  78.50   78.55   78.60   78.85   78.90   78.30   78.25   78.20   78.15   78.10   78.85   74505   189866  
APEX    2018-03-05 14:14:30 2018-03-05 14:13:23 72.00   51.00   71.00   20.00   150 1.00    1.00    14.00   20.00   1108.00 690.00  690.15  690.80  690.95  691.00  689.60  689.55  689.45  689.15  689.00  0   35535   61963   690.00
APEX    2018-01-31 11:52:11 2018-01-31 11:50:48 100.00  10.00   10.00   15.00   50  50.00   50.00   10.00   16.00   67.00   621.15  621.20  621.40  621.80  621.95  619.50  619.00  618.00  617.00  616.50  0   8083    25609   619.50
APEX    2018-01-31 11:56:14 2018-01-31 11:54:48 38.00   29.00   67.00   174.00  124 53.00   50.00   50.00   16.00   25.00   625.00  625.40  625.45  626.00  626.90  623.95  623.90  623.50  623.45  623.00  0   12587   23399   624.00
APEX    2018-01-18 09:36:03 2018-01-18 09:35:14 38.00   46.00   67.00   226.00  6   5.00    50.00   36.00   20.00   30.00   781.00  781.80  781.85  781.95  782.00  780.20  780.15  780.05  780.00  779.95  782.70  17023   21946   780.75
APEX    2018-01-18 09:44:16 2018-01-18 09:42:15 47.00   50.00   25.00   67.00   2887    25.00   8.00    58.00   5.00    50.00   791.60  791.65  791.95  792.30  792.65  790.20  790.15  790.00  789.05  789.00  791.45  22314   26007   790.05
STRTECH 2018-01-19 14:57:51 2018-01-19 14:56:24 68.50   1.00    5.00    2.00    3   3.00    20.00   3.00    5.00    10.00   2484.95 2485.00 2489.00 2489.90 2490.00 2477.55 2477.50 2477.20 2477.05 2476.70 2480.60 32408   8565    2485.00
STRTECH 2018-01-25 10:50:10 2018-01-25 10:47:46 32.65   1.00    511.00  1.00    12  9.00    5.00    100.00  23.00   20.00   2484.60 2484.70 2484.80 2485.00 2486.00 2480.15 2480.10 2480.00 2475.00 2471.15 2534.60 28306   18002   2484.70

Within the same Scrip and the same date (from the field Timestamp1), I would like to query all the records and return those which satisfy 2 complex conditions.

These conditions are:
a) The NSEPr value should be at least 3.5% higher than the first NSEPr value for that day (the day can be extracted from Timestamp1 here).
b) The sum of the values Sellq1 + Sellq2 … (till Sellq5) should be 3 times (or higher than) the sum of the values Buyq1 + Buyq2 … (till Buyq5).

I managed to extract the date from Timestamp1 using df['mydt'] = df.Timestamp1.dt.date.
I tried achieving the above task using a for loop with df.iterrows(), i.e. iterating across the df. This failed due to an endless loop.

I remember the above is achievable using df.groupby(['Scrip', 'mydt']).apply, or perhaps by using df.groupby(['Scrip', 'mydt']).apply(lambda x: ...).

However I am not able to find the solution to this. I will really appreciate some help on the above.

TIA.

Suraj_j
  • What do you want the output to look like? Is this really a query which is executed based on user input? Are both given, the date and the Scrip? Or do you want to list all records which satisfy this condition (all dates and Scrip values)? – jottbe Aug 18 '19 at 06:41
  • The output should be: list all records which fulfill the 2 conditions. – Suraj_j Aug 18 '19 at 06:59
  • Is it possible that there are no such rows in the data above? – jottbe Aug 18 '19 at 07:11
  • Is it maybe vice versa, that you want to have the first value higher than the current value? – jottbe Aug 18 '19 at 07:22

2 Answers


It would look like:

# get the first values per scrip and day
df_a_first_vals = df.groupby([df['Timestamp1'].dt.date, df['Scrip']]).agg({'NSEPr': 'first'})

# create a boolean indexer for condition b and extract the
# corresponding data with the date stored in a separate column
sell_cols = ['Sellq1', 'Sellq2', 'Sellq3', 'Sellq4', 'Sellq5']
buy_cols = ['Buyq1', 'Buyq2', 'Buyq3', 'Buyq4', 'Buyq5']
df_b_indexer = df[sell_cols].sum(axis='columns') >= df[buy_cols].sum(axis='columns') * 3
df_b_data = df[df_b_indexer].copy(deep=True)
df_b_data['Timestamp1_date'] = df_b_data['Timestamp1'].dt.date

# merge a and b to apply condition a
df_ab_merged = df_b_data.merge(df_a_first_vals, left_on=['Timestamp1_date', 'Scrip'],
                               right_index=True, suffixes=['', '_first'])

# output the result
df_ab_merged[df_ab_merged['NSEPr'] >= df_ab_merged['NSEPr_first'] * 1.035]

It seems your data does not contain such a record, so I just changed the NSEPr value for (APEX, 2018-01-31T11:52:11) from 100.00 to 20.00. The logic above then outputs the second row of that day:

Out[148]: 
  Scrip          Timestamp1               NSETS  NSEPr  ...  TotalBuyQty  TotalSellQty  Timestamp1_date  NSEPr_first
7  APEX 2018-01-31 11:56:14 2018-01-31 11:54:48   38.0  ...        23399         624.0       2018-01-31         20.0

[1 rows x 29 columns]

Btw, if your data is really large and you want to avoid the deep copy above, you could just store the date part of Timestamp1 as a separate column.
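For reference, a merge-free variant (my own sketch, not part of the answer above) broadcasts the day's first NSEPr onto every row with groupby().transform('first') and combines both conditions as boolean masks. Only two of the five quantity columns are used here to keep the toy data short:

```python
import io
import pandas as pd

# toy data: two ALANKIT rows, the second up ~5.6% with heavy sell quantities
raw = """Scrip Timestamp1 NSEPr Buyq1 Buyq2 Sellq1 Sellq2
ALANKIT 2018-01-12T13:02:06 78.10 759 100 15 300
ALANKIT 2018-01-12T13:32:29 82.50 28 100 500 178"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', parse_dates=['Timestamp1'])

# first NSEPr per (Scrip, day), broadcast back onto every row
day = df['Timestamp1'].dt.date
first_nsepr = df.groupby(['Scrip', day])['NSEPr'].transform('first')

cond_a = df['NSEPr'] >= first_nsepr * 1.035          # at least 3.5% above the day's first
cond_b = (df[['Sellq1', 'Sellq2']].sum(axis=1)
          >= 3 * df[['Buyq1', 'Buyq2']].sum(axis=1)) # sells at least 3x buys
result = df[cond_a & cond_b]
```

This also sidesteps the deep copy, since no intermediate filtered frame is needed.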

Test data (I just manually changed the second-to-last record, so it conforms to the condition):

raw="""Scrip   Timestamp1          NSETS               NSEPr Buyq1 Buyq2   Buyq3   Buyq4   Buyq5   Sellq1  Sellq2  Sellq3  Sellq4  Sellq5  Sellp1  Sellp2  Sellp3  Sellp4  Sellp5  buyp1   buyp2   buyp3   buyp4   buyp5   ActPr   TotalBuyQty TotalSellQty    
ALANKIT 2018-01-12T13:02:06 2018-01-12T13:00:50 78.10   759.00  100.00  996.00  1287.00 200 15.00   300.00  100.00  1787.00 5614.00 78.25   78.35   78.40   78.45   78.50   78.10   78.05   78.00   77.80   77.75   78.25   63928   194206  
ALANKIT 2018-01-12T13:32:29 2018-01-12T13:22:21 79.50   28.00   100.00  200.00  1288.00 248 50.00   178.00  898.00  100.00  487.00  79.50   79.55   79.60   79.65   79.75   79.30   79.15   79.10   79.05   78.80   79.20   61927   175983  
ALANKIT 2018-01-12T13:36:26 2018-01-12T13:34:51 79.20   39.00   3649.00 1287.00 7.00    11  1500.00 1024.00 1000.00 220.00  65.00   79.20   79.25   79.50   79.55   79.60   79.15   79.00   78.85   78.65   78.55   79.00   65503   176990  
ALANKIT 2018-01-12T14:32:29 2018-01-12T14:31:23 78.80   810.00  1000.00 1287.00 1342.00 555 58.00   20.00   100.00  10.00   1250.00 78.80   78.85   78.90   78.95   79.00   78.70   78.60   78.55   78.50   78.30   78.70   84405   184759  
ALANKIT 2018-01-12T14:12:58 2018-01-12T14:11:22 78.50   1.00    5.00    100.00  25.00   510 2542.00 25.00   95.00   50.00   500.00  78.50   78.55   78.60   78.85   78.90   78.30   78.25   78.20   78.15   78.10   78.85   74505   189866  
APEX    2018-03-05T14:14:30 2018-03-05T14:13:23 72.00   51.00   71.00   20.00   150 1.00    1.00    14.00   20.00   1108.00 690.00  690.15  690.80  690.95  691.00  689.60  689.55  689.45  689.15  689.00  0   35535   61963   690.00
APEX    2018-01-31T11:52:11 2018-01-31T11:50:48 20.00   10.00   10.00   15.00   50  50.00   50.00   10.00   16.00   67.00   621.15  621.20  621.40  621.80  621.95  619.50  619.00  618.00  617.00  616.50  0   8083    25609   619.50
APEX    2018-01-31T11:56:14 2018-01-31T11:54:48 38.00   29.00   67.00   174.00  124 53.00   50.00   50.00   16.00   25.00   625.00  625.40  625.45  626.00  626.90  623.95  623.90  623.50  623.45  623.00  0   12587   23399   624.00
APEX    2018-01-18T09:36:03 2018-01-18T09:35:14 38.00   46.00   67.00   226.00  6   5.00    50.00   36.00   20.00   30.00   781.00  781.80  781.85  781.95  782.00  780.20  780.15  780.05  780.00  779.95  782.70  17023   21946   780.75
APEX    2018-01-18T09:44:16 2018-01-18T09:42:15 47.00   50.00   25.00   67.00   2887    25.00   8.00    58.00   5.00    50.00   791.60  791.65  791.95  792.30  792.65  790.20  790.15  790.00  789.05  789.00  791.45  22314   26007   790.05
STRTECH 2018-01-19T14:57:51 2018-01-19T14:56:24 20.50   1.00    5.00    2.00    3   3.00    20.00   3.00    5.00    10.00   2484.95 2485.00 2489.00 2489.90 2490.00 2477.55 2477.50 2477.20 2477.05 2476.70 2480.60 32408   8565    2485.00
STRTECH 2018-01-19T15:50:10 2018-01-25T10:47:46 32.65   1.00    511.00  1.00    12  9.00    5.00    100.00  23.00   20.00   2484.60 2484.70 2484.80 2485.00 2486.00 2480.15 2480.10 2480.00 2475.00 2471.15 2534.60 28306   18002   2484.70"""

import io
import pandas as pd

df = pd.read_csv(io.StringIO(raw), sep=r'\s+', parse_dates=['Timestamp1', 'NSETS'], index_col=None)

Result:

Out[212]: 
      Scrip          Timestamp1               NSETS  NSEPr  ...  TotalBuyQty  TotalSellQty  Timestamp1_date  NSEPr_first
11  STRTECH 2018-01-19 15:50:10 2018-01-25 10:47:46  32.65  ...        18002        2484.7       2018-01-19         20.5

[1 rows x 29 columns]
jottbe
  • The resulting row seems to be incorrect. Sellq1 + Sellq2 … (till Sellq5) for this row is 447 & Buyq1 + Buyq2 … (till Buyq5) is 776, so the sell quantity is not 3 times (or higher than) the buy quantity. Or am I missing something? – moys Aug 18 '19 at 07:51
  • Agree with Mohanys. I tried this with my data: only the NSEPr increase condition appears to be fulfilled; the other one, with the comparison of the sums of Buyq and Sellq, is not met. – Suraj_j Aug 18 '19 at 08:01
  • Thank you for the check! Probably you already found the error yourself: I multiplied by three on the wrong side. – jottbe Aug 18 '19 at 22:26

Check if this works for you. First we group by Scrip & the date part of Timestamp1 (grouping by the full Timestamp1 would put each row in its own group, since the timestamps are unique):

grouped = df.groupby(['Scrip', df['Timestamp1'].dt.date])

Now we take the grouped dataframe & check which rows meet our conditions. Rows meeting the price condition can be obtained as follows:

price_condition = []
for g_idx, group in grouped:
    # group.NSEPr.values[0] is the day's first NSEPr (assumes rows are time-sorted)
    for row_idx, row in group.iterrows():
        if row.NSEPr > group.NSEPr.values[0] * 1.035:
            price_condition.append(row_idx)
df.iloc[price_condition]

Rows meeting the quantity condition can be obtained as follows (only 2 quantities are used in this code):

quantity_condition = []
for g_idx, group in grouped:
    for row_idx, row in group.iterrows():
        if (row.Sellq1 + row.Sellq2) > 3 * (row.Buyq1 + row.Buyq2):
            quantity_condition.append(row_idx)
df.iloc[quantity_condition]
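The same quantity filter can also be expressed without loops as a single boolean mask. A small sketch (my own, using the first and last ALANKIT rows from the question, with all five quantity columns this time):

```python
import pandas as pd

# quantity columns from two ALANKIT rows in the question
df = pd.DataFrame({
    'Buyq1': [759.0, 1.0], 'Buyq2': [100.0, 5.0], 'Buyq3': [996.0, 100.0],
    'Buyq4': [1287.0, 25.0], 'Buyq5': [200.0, 510.0],
    'Sellq1': [15.0, 2542.0], 'Sellq2': [300.0, 25.0], 'Sellq3': [100.0, 95.0],
    'Sellq4': [1787.0, 50.0], 'Sellq5': [500.0, 500.0],
})
sell_cols = [f'Sellq{i}' for i in range(1, 6)]
buy_cols = [f'Buyq{i}' for i in range(1, 6)]

# sells strictly greater than 3x buys, row-wise
quantity_mask = df[sell_cols].sum(axis=1) > 3 * df[buy_cols].sum(axis=1)
filtered = df[quantity_mask]
```

The mask is computed for the whole frame at once, so no grouping is needed for this condition.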

Now the rows meeting both conditions can be obtained as follows:

pnq_condition = []
for g_idx, group in grouped:
    for row_idx, row in group.iterrows():
        if ((row.Sellq1 + row.Sellq2 + row.Sellq3 + row.Sellq4 + row.Sellq5) >
                3 * (row.Buyq1 + row.Buyq2 + row.Buyq3 + row.Buyq4 + row.Buyq5)
                and row.NSEPr > group.NSEPr.values[0] * 1.035):
            pnq_condition.append(row_idx)
df.iloc[pnq_condition]

I was able to check the values that meet the price condition & the quantity condition separately. However, in the data you provided, there are no rows that meet both conditions. So, check on your complete data & let us know if this code works for you.

moys
  • From what I have read online, this is achievable using df.groupby(['Scrip','mydt']).apply, or perhaps by using df.groupby(['Scrip','mydt']).apply(lambda x: ...). mydt is the date part of the Timestamp field. – Suraj_j Aug 18 '19 at 18:16
  • If we use apply/lambda there – Suraj_j Aug 19 '19 at 07:34
  • I don't think this can be done with apply. You can have the last snippet as a function & apply that on the dataframe, but it is essentially the same. Why is apply so much desired? – moys Aug 19 '19 at 09:12
  • .apply is preferred as that is a cleaner and more efficient way to achieve the same results. It can be achieved with apply with a custom function after creating the variable df_a_first_vals. – Suraj_j Aug 21 '19 at 12:27
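For completeness, the apply-with-a-custom-function shape discussed in these comments could be sketched roughly like this (my own sketch of the idea, with two quantity columns standing in for five):

```python
import io
import pandas as pd

raw = """Scrip Timestamp1 NSEPr Buyq1 Buyq2 Sellq1 Sellq2
ALANKIT 2018-01-12T13:02:06 78.10 759 100 15 300
ALANKIT 2018-01-12T13:32:29 82.50 28 100 500 178"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', parse_dates=['Timestamp1'])
df['mydt'] = df.Timestamp1.dt.date

def both_conditions(g):
    # g holds the rows of one (Scrip, day); g['NSEPr'].iloc[0] is the day's first price
    cond_a = g['NSEPr'] >= g['NSEPr'].iloc[0] * 1.035
    cond_b = g[['Sellq1', 'Sellq2']].sum(axis=1) >= 3 * g[['Buyq1', 'Buyq2']].sum(axis=1)
    return g[cond_a & cond_b]

# group_keys=False keeps the original row index on the filtered result
result = df.groupby(['Scrip', 'mydt'], group_keys=False).apply(both_conditions)
```

Under the hood this does the same per-group work as the loops above; whether it is faster depends on the number of groups, since apply still calls the function once per group.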