I am dealing with some finanial data containing reversals. Reversals are basically a correction in a table that offsets another values in the table by adding in the table an equal amount of the opposite sign. My job is to clear those values. Take for example this dataframe:
df = pd.DataFrame({"a":["a","b","c","a","a"],
"b":[-2,5,2,2,7],
"xtra_col":["X","X","X","X","X"]})
a b xtra_col
0 a -2 X
1 b 5 X
2 c 2 X
3 a 2 X
4 a 7 X
In this case row 3 is a reversal to row 0 and they have to be dropped. In the same time row 2 is not a reversal to row 0, though opposite values, because they do not match on column a. The result has to look as so.
a b xtra_col
0 b 5 X
1 c 2 X
2 a 7 X
The question is, how can I delete such reversals from my table? I have looked at drop_duplicates()
with subset as a and b, but that would not work, because it will only match same values, but not opposite.
I get the feeling that I might be able to achieve something with groupby
, but I am not sure how to organize it.
Additional note, it should also work in cases where there is an odd number of negative values. Considerig the case below, the output should be as follows:
df = pd.DataFrame({"a":["a","b","c","a","a"],
"b":[-2,5,2,2.0,-2],
"xtra_col":["X","X","X","X","X"]})
a b xtra_col
0 a -2.0 X
1 b 5.0 X
2 c 2.0 X
3 a 2.0 X
4 a -2.0 X
output should be:
a b xtra_col
1 b 5.0 X
2 c 2.0 X
3 1 -2.0 X