Calculate the difference of list elements if more than one element in the list is not zero

Question

have a data set which looks like this:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit   
10/11       A,D          5            0         0          12
11/11       A,B,C       10            10        5          0
12/11       A           20             0        0           0

i want the output column so that whenever there are more than one element in the list, it will calculate the difference of the unit, and when single element is present it will display zero. so output will be:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit          output
    10/11       A,D          5            0         0          12           5-12=-7 
    11/11       A,B,C       10            10        5          0            10-10-5=-5
    12/11       A           20             0        0           0            0--since only one element is there

can anyone please tell me how to get the output column.

There always are 0 for values if values not in `Item` per rows? — jezrael, Nov 20 '19 at 11:33

jezrael · Answer 1 · 2019-11-20T11:55:40.087

0

Solution working with no check Item column - it use first non 0 value per Unit columns and subtract by sum of values, also if only 1 value it set 0:

#all columns without first and second
df1 = df.iloc[:, 2:].mask(lambda x: x==0)
#alternative
#all columns with Unit in column names
#df1 = df.filter(like='Unit').mask(lambda x: x==0)
first = df1.bfill(axis=1).iloc[:, 0]
df['output'] = np.where(df1.count(axis=1) == 1, 0, first - df1.sum(axis=1) + first)
print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12    -7.0
1  11/11  A,B,C      10      10       5       0    -5.0
2  12/11      A      20       0       0       0     0.0

Solution with match by Item column - explode Item to rows, multiple by -1 and 0 if only one value and last aggregate sum, first and join:

df = df.assign(Item = df['Item'].str.split(',')).explode('Item').reset_index(drop=True)
df['new'] = df.lookup(df.index, df['Item'] + '.Unit')

df.loc[df.duplicated(subset=['Date']), 'new'] *=  -1
df.loc[~df.duplicated(subset=['Date'], keep=False), 'new'] =  0


d1 = dict.fromkeys(df.columns.difference(['Date','Item','new']), 'first')
fin = {**{'Item':','.join}, **d1, **{'new':'sum'}}
df = df.groupby('Date', as_index=False).agg(fin)

print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  new
0  10/11    A,D       5       0       0      12   -7
1  11/11  A,B,C      10      10       5       0   -5
2  12/11      A      20       0       0       0    0

edited Nov 20 '19 at 11:55

answered Nov 20 '19 at 11:24

jezrael

822,522
95
1,334
1,252

Which package do we need to import for explode? – SS25 Nov 20 '19 at 11:34
@SinghSonu - only upgrade pandas to 0.25+ – jezrael Nov 20 '19 at 11:34
i am using jupyter notebook – SS25 Nov 20 '19 at 11:35
@SinghSonu - added new simplier solution, can you check it? – jezrael Nov 20 '19 at 11:48
for Explode which module do i need to import i am running the code in jupyter notebook -jezrael – SS25 Nov 20 '19 at 11:51
@SinghSonu - for explode is necessary upgrade pandas, try in conda cmd `conda update pandas` for last version – jezrael Nov 20 '19 at 11:53
@SinghSonu - check [this](https://stackoverflow.com/questions/22840449/how-to-update-pandas-from-anaconda-and-is-it-possible-to-use-eclipse-with-this-l) – jezrael Nov 20 '19 at 11:53

score 0 · Answer 2 · answered Nov 20 '19 at 11:25

0

Here is one solution. The first step is to create a function that does exactly what you want on one specific row:

from functool import reduce
def sum_function(x):
  if len(x[x != 0]) == 1:
    return 0
  else:
    return reduce(lambda a,b: a-b, x)

If there is only one element in the row that is not 0, then return 0. If there are more elements then subtract them all. And here is how you can apply that function to every row:

columns = ['A.unit', 'B.unit', 'C.unit', 'D.unit']
df.apply(lambda x: sum_function(x[columns]), axis=1)

The result is:

0   -7
1   -5
2    0

And you could add that as a new column:

df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1)

answered Nov 20 '19 at 11:25

L3n95

1,505
3
25
49

What if i have more than 4 columns say 15 columns? what will be the change columns = ['A.unit', 'B.unit', 'C.unit', 'D.unit'] df.apply(lambda x: sum_function(x[columns]), axis=1) in this – SS25 Nov 20 '19 at 11:30
Yes just change `columns` to the columns you need. For example if all the columns you need have 'Unit' in it, you could use: `columns = df.columns[df.columns.str.contains('Unit')]` instead of listing them all by hand. – L3n95 Nov 20 '19 at 11:35
unsupported operand type(s) for -: 'str' and 'int'", 'occurred at index 0').getting this error while executing @L3n95 – SS25 Nov 20 '19 at 11:45
@SinghSonu At executing what? Are your "numbers" strings actually? – L3n95 Nov 20 '19 at 11:46
No they are int-L3n95 – SS25 Nov 20 '19 at 11:47
@SinghSonu in which line does it fail? Try `reduce(lambda a,b: int(a)-int(b), x)` If that works, then one of your numbers is a `str` and not a `int` – L3n95 Nov 20 '19 at 12:18
df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1) failing in this line-L3n95 – SS25 Nov 20 '19 at 12:24
in which line i have to apply reduce(lambda a,b: int(a)-int(b), x)? - L3n95 – SS25 Nov 20 '19 at 12:27
@SinghSonu replace the `return reduce(lambda a,b: a-b, x)` with `return reduce(lambda a,b: int(a)-int(b), x)` – L3n95 Nov 20 '19 at 12:32
"invalid literal for int() with base 10: 'A '", 'occurred at index 0')-getting this error – SS25 Nov 20 '19 at 12:33
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202722/discussion-between-l3n95-and-singh-sonu). – L3n95 Nov 20 '19 at 12:46

luigigi · Answer 3 · 2019-11-20T12:22:08.337

0

Try:

def calc(row):
    out = row[np.argmax(np.array(row.tolist()) > 0)]
    for c in row.values[np.argmax(np.array(row.tolist()) > 0)+1:]:
        out -= c
    if out == row.sum():
        return 0
    else:
        return out

df['output'] = df.drop(['Date','Item'], axis=1).apply(calc, axis=1)

Output:

    Date   Item  A.unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12      -7
1  11/11  A,B,C      10      10       5       0      -5
2  12/11      A      20       0       0       0       0

edited Nov 20 '19 at 12:22

answered Nov 20 '19 at 11:28

luigigi

4,146
1
13
30

@jezrael yes you are right, I edited my answer. thanks for pointing that out – luigigi Nov 20 '19 at 12:16

score 0 · Answer 4 · answered Nov 20 '19 at 12:25

Another solution using lambda, regex is

unit_columns = list(df.columns[2:])
regex = re.compile(re.escape('.Unit'), re.IGNORECASE)
unit_columns_replaced = [regex.sub('', a) for a in unit_columns]

def output(row):
    ItemN = row['Item'].split(",")
    if len(ItemN) < 2:
        return 0
    idxs = np.where(np.in1d(unit_columns_replaced, ItemN))[0]
    c_names = [unit_columns[idx] for idx in idxs]
    f_columns = row.filter(items=c_names)
    return 2 * f_columns[0] - f_columns.sum()


df['output'] = df.apply(lambda row: output(row), axis=1)
df

which gives output as

    Date    Item    A.unit  B.Unit  C.Unit  D.Unit  output
0   10/11   A,D     5   0   0   12  -7
1   11/11   A,B,C   10  10  5   0   -5
2   12/11   A   20  0   0   0   0

Calculate the difference of list elements if more than one element in the list is not zero

4 Answers4