An earlier post, "Modification of a function to return a dataframe with specified values", covered the starting point, and I would like to modify the output further. The current function and its vectorized version subtract every combination of columns from each other and return the relevant data accordingly.
Example and test data:
import pandas as pd
import numpy as np
from itertools import combinations
df2 = pd.DataFrame(
{'AAA' : [80,5,6],
'BBB' : [85,20,30],
'CCC' : [100,50,25],
'DDD' : [98,50,25],
'EEE' : [103,50,25],
'FFF' : [105,50,25],
'GGG' : [109,50,25]});
df2
AAA BBB CCC DDD EEE FFF GGG
0 80 85 100 98 103 105 109
1 5 20 50 50 50 50 50
2 6 30 25 25 25 25 25
v = df2.values
df3 = df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)
df3
AAA BBB CCC DDD EEE FFF GGG
0 80.0 85.0 100 98 103 105 109
1 NaN NaN 50 50 50 50 50
2 NaN 30.0 25 25 25 25 25
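On a per-row basis, every value that is within thresh (5 here) of at least one other value in the same row is returned, using np.abs(...) <= 5. Values with no such neighbour are masked to NaN.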
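To make the one-liner above easier to follow, here is the same mask broken into named steps. The intermediate names (diffs, close, n_close, isolated) are only illustrative and are not part of the original function.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'AAA': [80, 5, 6],
    'BBB': [85, 20, 30],
    'CCC': [100, 50, 25],
    'DDD': [98, 50, 25],
    'EEE': [103, 50, 25],
    'FFF': [105, 50, 25],
    'GGG': [109, 50, 25],
})
thresh = 5  # same threshold as in the one-liner

v = df2.values                               # shape (rows, cols)
# Pairwise absolute differences within each row: shape (rows, cols, cols)
diffs = np.abs(v[:, :, None] - v[:, None])
close = diffs <= thresh                      # True where two values in a row are within thresh
n_close = close.sum(-1)                      # per cell: how many row values are within thresh (itself included)
isolated = n_close <= 1                      # a cell that only matches itself
df3 = df2.mask(isolated)                     # isolated cells become NaN
```

Because every cell is within thresh of itself, n_close is at least 1, so `n_close <= 1` picks out exactly the values with no neighbour in their row.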
What needs to change?
On the first row of df3 there are two clusters of values within thresh: (80, 85) and (100, 98, 103, 105, 109). They are all valid, but they form two separate groups because the groups are not within thresh of each other. I would like to be able to separate these values based on another thresh value.
I have attempted to demonstrate what I am looking to do with the following (flawed) code. I am only including it to show that I'm attempting to progress this myself.
thresh = 5  # same threshold as used for the mask above
df3.mask(df3.apply(lambda x : x >= df3.T.max() - (thresh * 3))).dropna(thresh=2).dropna(axis=1)
AAA BBB
0 80.0 85.0
df3.mask(~df3.apply(lambda x : x >= df3.T.max() - (thresh * 3))).dropna(axis=1)
CCC DDD EEE FFF GGG
0 100 98 103 105 109
1 50 50 50 50 50
2 25 25 25 25 25
So the output is nice (and close to what I want), but the way I got it is not so nice...
Desired output:
I have used multiple rows to demonstrate, but when I use this code only one row will need to be output and split. So the desired output is to return the separate groups of columns, as in this example for row 0:
CCC DDD EEE FFF GGG
0 100 98 103 105 109
and
AAA BBB
0 80.0 85.0
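For reference, a minimal sketch of one way the split could be expressed, assuming the second threshold is read as the largest allowed gap between neighbouring values once the row is sorted. The helper name split_row_by_gap and its gap parameter are hypothetical, and row0 stands in for df3.loc[0].

```python
import pandas as pd

def split_row_by_gap(row, gap):
    """Split one row into groups wherever consecutive sorted values differ by more than gap."""
    s = row.dropna().sort_values()
    # A new group starts wherever the step to the previous sorted value exceeds gap.
    group_id = (s.diff() > gap).cumsum()
    frames = []
    for _, grp in s.groupby(group_id):
        # Keep the original column order rather than the sorted order.
        cols = [c for c in row.index if c in grp.index]
        frames.append(row[cols].to_frame().T)
    return frames

# Stand-in for df3.loc[0] from the example above.
row0 = pd.Series(
    {'AAA': 80.0, 'BBB': 85.0, 'CCC': 100.0, 'DDD': 98.0,
     'EEE': 103.0, 'FFF': 105.0, 'GGG': 109.0},
    name=0,
)

groups = split_row_by_gap(row0, gap=5)  # gap=5 is an assumed value for the second threshold
low, high = groups                      # two one-row DataFrames, lowest values first
```

Here low holds the AAA and BBB columns and high holds CCC through GGG. Note that this chains neighbours together, so a long run of values each within gap of the next ends up in one group even if its ends are far apart; that may or may not be the behaviour wanted.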