0

Full question-

Search each column in a DataFrame to determine when the first instance of a value greater than a value stored in the last row of each column in the DataFrame and output the index

Ex. of df.head():

   Well               A1          A2          A3          A4           
Temperature                                                               
    25.0         371.335253  360.026443  253.228769  593.436104     
    25.2         331.957145  332.224668  233.607595  561.057715    
    25.4         305.472591  303.777874  213.500582  535.310186   
    25.6         285.713623  274.069361  202.024427  515.261876    
    25.8         252.716374  254.610848  181.719415  488.988468    

Ex. of df.tail():

Well                       A1          A2           A3           A4
Temperature
 94.79                -441.775980 -664.549239  1060.674188  1158.481056   
 94.99                -492.189733 -709.521424  1029.628209  1087.625128   
 mean                  280.759521  283.417750   201.471571   519.939366   
 std                    72.404373   69.023406    45.447202    58.150127   
 4*std                 570.377014  559.511373   383.260378   752.539875   

I want to use the value of 4*std in A1 (570.37) and search starting from the top of the column for the first value greater than (570.37) in A1 and output the Temperature. I need to repeat this for all columns.

I would like the output as a new dataframe like this example below...I'm lost on how to construct this?

Well   Temp
A1     26.0
A2     27.6
A3     26.8
...    ...
H12    27.2

I would appreciate any help!

schnick
  • 25
  • 6
  • You example doesn't quite make sense to me. Can you please explain your logic? – Allen Qin Mar 05 '18 at 06:06
  • Each column contains a dataset, with data points (intensities) measured at different temperatures (Temperature is set as the index). At the end of each column I have calculated different things like an average of a specific temperature range. I want to use a value at the bottom of each column to search from the top of the column for when the intensity values become larger than the value from the bottom and output the corresponding Temperature. I know it's confusing, does this help? – schnick Mar 05 '18 at 06:19
  • Your example output doesn't seem to match your logic. It'll be helpful if you give a working example. – Allen Qin Mar 05 '18 at 06:25
  • I added a little more specific examples, so let me know if it's still confusing. Thanks! – schnick Mar 05 '18 at 06:46
  • @Schnick - Is possible some value in some column is not greater as last row? – jezrael Mar 05 '18 at 06:55
  • @jezrael No every column should have a value that's greater than the last row value...I'm testing it out now. Actually that might be useful to add though... – schnick Mar 05 '18 at 07:08

1 Answers1

0

I believe you need if greater value for each column always exist:

print (df)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  632.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
mean               280.759521  283.417750   201.471571   519.939366
std                 72.404373   69.023406    45.447202    58.150127
4*std              570.377014  559.511373   383.260378   752.539875


df1 = df.iloc[:-3].gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df1)
  Well   Temp
0   A1   25.4
1   A2   25.2
2   A3  94.79
3   A4  94.79

Details:

print (df.iloc[:-3].gt(df.iloc[-1]))
                     A1     A2     A3     A4
Well Temperature                            
25.0              False  False  False  False
25.2              False   True  False  False
25.4               True  False  False  False
25.6              False  False  False  False
25.8              False  False  False  False
94.79             False  False   True   True
94.99             False  False   True   True

print (df.iloc[:-3].gt(df.iloc[-1]).idxmax())
A1     25.4
A2     25.2
A3    94.79
A4    94.79
dtype: object

If is possible some value is not greater one possible solution is add new row to the end with NaN index:

print (df)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  332.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
mean               280.759521  283.417750   201.471571   519.939366
std                 72.404373   69.023406    45.447202    58.150127
4*std              570.377014  559.511373   383.260378   752.539875
df1 = df.iloc[:-3].append((df.iloc[-1] + 1).rename(np.nan))
print (df1)
                           A1          A2           A3           A4
Well Temperature                                                   
25.0               371.335253  360.026443   253.228769   593.436104
25.2               331.957145  332.224668   233.607595   561.057715
25.4              3005.472591  303.777874   213.500582   535.310186
25.6               285.713623  274.069361   202.024427   515.261876
25.8               252.716374  254.610848   181.719415   488.988468
94.79             -441.775980 -664.549239  1060.674188  1158.481056
94.99             -492.189733 -709.521424  1029.628209  1087.625128
NaN                571.377014  560.511373   384.260378   753.539875

df2 = df1.gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df2)
  Well   Temp
0   A1   25.4
1   A2    NaN
2   A3  94.79
3   A4  94.79

print (df1.gt(df.iloc[-1]))
                     A1     A2     A3     A4
Well Temperature                            
25.0              False  False  False  False
25.2              False  False  False  False
25.4               True  False  False  False
25.6              False  False  False  False
25.8              False  False  False  False
94.79             False  False   True   True
94.99             False  False   True   True
NaN                True   True   True   True
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This works great...will let you if there's any problems when I test the second part you suggested with NaN. Thanks! – schnick Mar 05 '18 at 07:30