0

have a data set which looks like this:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit   
10/11       A,D          5            0         0          12
11/11       A,B,C       10            10        5          0
12/11       A           20             0        0           0  

i want the output column so that whenever there are more than one element in the list, it will calculate the difference of the unit, and when single element is present it will display zero. so output will be:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit          output
    10/11       A,D          5            0         0          12           5-12=-7 
    11/11       A,B,C       10            10        5          0            10-10-5=-5
    12/11       A           20             0        0           0            0--since only one element is there

can anyone please tell me how to get the output column.

L3n95
  • 1,505
  • 3
  • 25
  • 49
SS25
  • 61
  • 7

4 Answers4

0

Solution working with no check Item column - it use first non 0 value per Unit columns and subtract by sum of values, also if only 1 value it set 0:

#all columns without first and second
df1 = df.iloc[:, 2:].mask(lambda x: x==0)
#alternative
#all columns with Unit in column names
#df1 = df.filter(like='Unit').mask(lambda x: x==0)
first = df1.bfill(axis=1).iloc[:, 0]
df['output'] = np.where(df1.count(axis=1) == 1, 0, first - df1.sum(axis=1) + first)
print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12    -7.0
1  11/11  A,B,C      10      10       5       0    -5.0
2  12/11      A      20       0       0       0     0.0

Solution with match by Item column - explode Item to rows, multiple by -1 and 0 if only one value and last aggregate sum, first and join:

df = df.assign(Item = df['Item'].str.split(',')).explode('Item').reset_index(drop=True)
df['new'] = df.lookup(df.index, df['Item'] + '.Unit')

df.loc[df.duplicated(subset=['Date']), 'new'] *=  -1
df.loc[~df.duplicated(subset=['Date'], keep=False), 'new'] =  0


d1 = dict.fromkeys(df.columns.difference(['Date','Item','new']), 'first')
fin = {**{'Item':','.join}, **d1, **{'new':'sum'}}
df = df.groupby('Date', as_index=False).agg(fin)

print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  new
0  10/11    A,D       5       0       0      12   -7
1  11/11  A,B,C      10      10       5       0   -5
2  12/11      A      20       0       0       0    0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Here is one solution. The first step is to create a function that does exactly what you want on one specific row:

from functool import reduce
def sum_function(x):
  if len(x[x != 0]) == 1:
    return 0
  else:
    return reduce(lambda a,b: a-b, x)

If there is only one element in the row that is not 0, then return 0. If there are more elements then subtract them all. And here is how you can apply that function to every row:

columns = ['A.unit', 'B.unit', 'C.unit', 'D.unit']
df.apply(lambda x: sum_function(x[columns]), axis=1)

The result is:

0   -7
1   -5
2    0

And you could add that as a new column:

df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1)
L3n95
  • 1,505
  • 3
  • 25
  • 49
  • What if i have more than 4 columns say 15 columns? what will be the change columns = ['A.unit', 'B.unit', 'C.unit', 'D.unit'] df.apply(lambda x: sum_function(x[columns]), axis=1) in this – SS25 Nov 20 '19 at 11:30
  • Yes just change `columns` to the columns you need. For example if all the columns you need have 'Unit' in it, you could use: `columns = df.columns[df.columns.str.contains('Unit')]` instead of listing them all by hand. – L3n95 Nov 20 '19 at 11:35
  • unsupported operand type(s) for -: 'str' and 'int'", 'occurred at index 0').getting this error while executing @L3n95 – SS25 Nov 20 '19 at 11:45
  • @SinghSonu At executing what? Are your "numbers" strings actually? – L3n95 Nov 20 '19 at 11:46
  • No they are int-L3n95 – SS25 Nov 20 '19 at 11:47
  • @SinghSonu in which line does it fail? Try `reduce(lambda a,b: int(a)-int(b), x)` If that works, then one of your numbers is a `str` and not a `int` – L3n95 Nov 20 '19 at 12:18
  • df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1) failing in this line-L3n95 – SS25 Nov 20 '19 at 12:24
  • in which line i have to apply reduce(lambda a,b: int(a)-int(b), x)? - L3n95 – SS25 Nov 20 '19 at 12:27
  • @SinghSonu replace the `return reduce(lambda a,b: a-b, x)` with `return reduce(lambda a,b: int(a)-int(b), x)` – L3n95 Nov 20 '19 at 12:32
  • "invalid literal for int() with base 10: 'A '", 'occurred at index 0')-getting this error – SS25 Nov 20 '19 at 12:33
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202722/discussion-between-l3n95-and-singh-sonu). – L3n95 Nov 20 '19 at 12:46
0

Try:

def calc(row):
    out = row[np.argmax(np.array(row.tolist()) > 0)]
    for c in row.values[np.argmax(np.array(row.tolist()) > 0)+1:]:
        out -= c
    if out == row.sum():
        return 0
    else:
        return out

df['output'] = df.drop(['Date','Item'], axis=1).apply(calc, axis=1)

Output:

    Date   Item  A.unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12      -7
1  11/11  A,B,C      10      10       5       0      -5
2  12/11      A      20       0       0       0       0
luigigi
  • 4,146
  • 1
  • 13
  • 30
0

Another solution using lambda, regex is

unit_columns = list(df.columns[2:])
regex = re.compile(re.escape('.Unit'), re.IGNORECASE)
unit_columns_replaced = [regex.sub('', a) for a in unit_columns]

def output(row):
    ItemN = row['Item'].split(",")
    if len(ItemN) < 2:
        return 0
    idxs = np.where(np.in1d(unit_columns_replaced, ItemN))[0]
    c_names = [unit_columns[idx] for idx in idxs]
    f_columns = row.filter(items=c_names)
    return 2 * f_columns[0] - f_columns.sum()


df['output'] = df.apply(lambda row: output(row), axis=1)
df

which gives output as

    Date    Item    A.unit  B.Unit  C.Unit  D.Unit  output
0   10/11   A,D     5   0   0   12  -7
1   11/11   A,B,C   10  10  5   0   -5
2   12/11   A   20  0   0   0   0
Prince Francis
  • 2,995
  • 1
  • 14
  • 22