I find myself trying to analyse a data set to see how some variables correlate.
I need to add a loop that adds a logical test to the if statement.
Edit: take this data frame as an example:
In [11]: df
Out[11]:
   INPUT1  INPUT2  INPUT3  ...  OUTPUT
0       8       5       6  ...       1
1       3       2       5  ...       0
2       3       1       5  ...       1
3       1       2       5  ...       0
4       4       3       5  ...       0
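For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the rows printed above. It keeps only the columns that are visible; the elided ("...") columns are left out, and the way `names` is derived here is an assumption about how the list of input columns is built.

```python
import pandas as pd

# Only the columns visible in the printout; the elided ("...") columns are omitted.
df = pd.DataFrame({
    'INPUT1': [8, 3, 3, 1, 4],
    'INPUT2': [5, 2, 1, 2, 3],
    'INPUT3': [6, 5, 5, 5, 5],
    'OUTPUT': [1, 0, 1, 0, 0],
})

# Assumption: the inputs to compare are every column except OUTPUT.
names = [c for c in df.columns if c != 'OUTPUT']
print(names)
```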
I'm testing pairwise combinations of inputs to check how well they match the output:
import pandas as pd

def greater_than(a, b):
    return a > b

def greater_equal_than(a, b):
    return a >= b

def lower_equal_than(a, b):
    return a <= b

def lower_than(a, b):
    return a < b

def equal(a, b):
    return a == b

operation = {'>': greater_than, '>=': greater_equal_than, '<=': lower_equal_than, '<': lower_than}

# names is the list of input column names (INPUT1, INPUT2, ...)
escenario = pd.DataFrame(columns=['esc', 'pf'])
for i in range(len(names)):
    for j in names[i+1:]:
        for op in operation:
            # 1 where the comparison holds for the row's values, else 0
            escenario['esc'] = df.apply(lambda x: 1 if operation[op](x[names[i]], x[j]) else 0, axis=1)
            escenario['pf'] = df['OUTPUT']
            # 1 where the output is 1 and the comparison also holds
            match = escenario.apply(lambda x: 1 if x['pf'] == 1 and x['pf'] == x['esc'] else 0, axis=1)
            percent_match = (100 * match.sum()) / escenario['pf'].sum()
            percent_no_match = (100 * (escenario['esc'].sum() - match.sum())) / escenario['esc'].sum()
            print(f"{names[i]} {op} {j} -> {percent_match} / {percent_no_match}")
I need to find the combination of input comparisons that keeps percent_match as close as possible to 100% and percent_no_match as close as possible to 0%.
For example:
First iteration:
INPUT2 < INPUT3
Second iteration:
INPUT2 < INPUT3 and INPUT1 > INPUT2
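To make the two iterations concrete: each condition can be treated as a boolean row mask, and the second iteration is just the first mask AND-ed with a new one. The sketch below scores both on the five sample rows only (the elided columns are omitted); `score` is a hypothetical helper name, and it assumes at least one row has OUTPUT equal to 1 and at least one row satisfies the condition.

```python
import pandas as pd

# Sample rows from the printout; the elided ("...") columns are omitted.
df = pd.DataFrame({
    'INPUT1': [8, 3, 3, 1, 4],
    'INPUT2': [5, 2, 1, 2, 3],
    'INPUT3': [6, 5, 5, 5, 5],
    'OUTPUT': [1, 0, 1, 0, 0],
})

def score(mask, output):
    """Return (percent_match, percent_no_match) for a boolean row mask.

    Hypothetical helper; assumes output.sum() > 0 and mask.sum() > 0.
    """
    hits = int(mask.sum())                        # rows where the condition holds
    match = int((mask & (output == 1)).sum())     # of those, rows with OUTPUT == 1
    percent_match = 100 * match / int(output.sum())
    percent_no_match = 100 * (hits - match) / hits
    return percent_match, percent_no_match

first = df['INPUT2'] < df['INPUT3']               # first iteration
second = first & (df['INPUT1'] > df['INPUT2'])    # second iteration: AND in a new test

print(score(first, df['OUTPUT']))    # (100.0, 60.0)
print(score(second, df['OUTPUT']))   # (100.0, 50.0)
```

On these five rows the extra test keeps percent_match at 100 while lowering percent_no_match, which is the kind of improvement each iteration is looking for.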
Right now I run the code, sort the printed output, pick the pair whose match is closest to 100, and then modify the code to add that condition. Example:
The first run's best result is INPUT2 < INPUT3.
Then I modify this line:
escenario['esc'] = df.apply(lambda x : 1 if operation[op]( x[names[i]], x[j] ) else 0, axis=1)
to add the first result, like:
escenario['esc'] = df.apply(lambda x : 1 if x['INPUT2'] < x['INPUT3'] and operation[op]( x[names[i]], x[j] ) else 0, axis=1)
and check again... This last part is the one I want to automate with a loop. Thanks
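To pin down what I mean by automating it, here is a rough sketch of the manual procedure as a greedy loop: at each step it tries every pair and operator on top of the conditions already chosen, keeps the best one, and stops when nothing improves. Several things here are assumptions rather than part of my real code: the names `score` and `greedy_search`, the `max_steps` limit, the ranking rule (highest percent_match first, then lowest percent_no_match), and the use of the standard `operator` functions in place of my hand-written comparisons. It runs on the five sample rows only.

```python
import operator
from itertools import combinations

import pandas as pd

# Sample rows from the printout; the elided ("...") columns are omitted.
df = pd.DataFrame({
    'INPUT1': [8, 3, 3, 1, 4],
    'INPUT2': [5, 2, 1, 2, 3],
    'INPUT3': [6, 5, 5, 5, 5],
    'OUTPUT': [1, 0, 1, 0, 0],
})
names = ['INPUT1', 'INPUT2', 'INPUT3']
operation = {'>': operator.gt, '>=': operator.ge, '<=': operator.le, '<': operator.lt}

def score(mask, output):
    """Return (percent_match, percent_no_match); assumes output.sum() > 0."""
    hits = int(mask.sum())
    match = int((mask & (output == 1)).sum())
    percent_match = 100 * match / int(output.sum())
    # A condition that selects no rows is ranked worst (assumption).
    percent_no_match = 100 * (hits - match) / hits if hits else float('inf')
    return percent_match, percent_no_match

def greedy_search(df, names, max_steps=3):
    output = df['OUTPUT']
    current = pd.Series(True, index=df.index)   # no condition chosen yet
    chosen, best_key = [], None
    for _ in range(max_steps):
        step_best = None
        for a, b in combinations(names, 2):
            for sym, func in operation.items():
                label = f"{a} {sym} {b}"
                if label in chosen:
                    continue
                mask = current & func(df[a], df[b])   # AND with earlier picks
                pm, pnm = score(mask, output)
                key = (pm, -pnm)   # assumed ranking: match first, then fewer false hits
                if step_best is None or key > step_best[0]:
                    step_best = (key, label, mask)
        if step_best is None or (best_key is not None and step_best[0] <= best_key):
            break   # nothing improves on the previous step
        best_key, label, current = step_best
        chosen.append(label)
    return chosen, best_key

chosen, best = greedy_search(df, names)
print(" and ".join(chosen), best)
```

Because a greedy search only keeps the single best condition per step, it can miss combinations that only look good together, so an exhaustive search over pairs of conditions may be the better fit if the number of inputs is small.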