I have data as follows:
+----+--------+------------------+
| Id | Weight | is_weight_faulty |
+----+--------+------------------+
| A | 100 | 1 |
| A | 50 | 0 |
| A | 10 | 0 |
| B | 500 | 0 |
| B | 200 | 0 |
| B | 40 | 0 |
| C | 100 | 0 |
+----+--------+------------------+
I need to create a column correct weight which depends on two factors Id
and value is_weight_faulty
. First I will apply partition by Id
and order by is_weight_faulty
. If for any subset we find value 1
in column is_weight_faulty
our correct weight would be Weight
= Correct_Weight
where is_weight_faulty = 1
and for remaining entries of that subset Correct_Weight
would be 0.
If we look at first three rows for Id
A Correct_Weight
would equal 100 in 1st row while for next 2 rows it would be 0. For remaining cases Correct_Weight
would equal Weight
as there is no record of is_weight_faulty = 1
against each subset of data that we would get after applying partition by.
Desired Output
+----+--------+------------------+----------------+
| Id | Weight | is_weight_faulty | Correct_Weight |
+----+--------+------------------+----------------+
| A | 100 | 1 | 100 |
| A | 50 | 0 | 0 |
| A | 10 | 0 | 0 |
| B | 500 | 0 | 500 |
| B | 200 | 0 | 200 |
| B | 40 | 0 | 40 |
| C | 100 | 0 | 100 |
+----+--------+------------------+----------------+
I am unable to create cases that would make Correct_Weight
0 in case if there is an entry of is_weight_faulty = 1