2

I have a pivot table with multiple columns of data in a time series:

             A    B    C    D
11/1/2018    1    5    5    7
11/2/2018    2    6    6    8
11/3/2018    3    7    7    9

The values in the data columns are not important for this example. I would like to subtract the value in the "key" column (column A in this case) from a subset of columns: B & C in this case. I would then like to drop any columns not in the subset or the key column. Result would be:

             A    B    C
11/1/2018    1    4    4
11/2/2018    2    4    4
11/3/2018    3    4    4

I have subtracted columns in the past via code like this:

df['dif'] = df['B'] -df['A']

But this will add the "dif" column. I would like to replace column B with B-A values. Also, instead of passing the instructions one at a time (B-A, C-A), would like to pass the list something like "if column in list, subtract key column, else drop column."

Thanks

thesimplevoodoo
  • 153
  • 3
  • 11

2 Answers2

1

You can use apply to substract A from the subset columns that you choose and finally join again with A.

df['A'].to_frame().join(df[['B','C']].apply(lambda x: x - df['A']))

            A   B   C
11/1/2018   1   4   4
11/2/2018   2   4   4
11/3/2018   3   4   4
Franco Piccolo
  • 6,845
  • 8
  • 34
  • 52
  • thanks Franco. I appreciate your answer - just trying to determine which of these answers is **more** helpful/insightful – thesimplevoodoo Nov 08 '18 at 15:22
  • Welcome, I think both answers get the job done and are very similar, apply is a more general way and sub is a more specific way. Understanding apply will be more useful for you as you will be able to use it in more occasions, but if you are looking the perfect answer should be sub I guess.. – Franco Piccolo Nov 08 '18 at 15:26
  • can i ask, how would you change this answer if you didn't want to drop the columns not in your subset? and what if the subset was actually a list larger than whats in the df? EG (B,C,D,E) like if column in subset **and** df, then apply lambda, else leave column as is – thesimplevoodoo Nov 08 '18 at 18:35
1

pandas.DataFrame.sub with axis=0

When subtracting a Series from a DataFrame Pandas will align the columns of the DataFrame with the index of the Series by default. This is what happens when you use the - operator. However, when you use the pandas.DataFrame.sub method, you can override that default and specify that the DataFrame should align its index with the index of the Series.

def f(d, key, subset):
  return d[[key]].join(d[subset].sub(d[key], axis=0))

f(df, 'A', ['B', 'C'])

           A  B  C
11/1/2018  1  4  4
11/2/2018  2  4  4
11/3/2018  3  4  4
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • i'm a little confused here. it seems as tho you say align df w series index **by default**, and then **specify** that df align w index of series. these seem the same to me except you gain the **option to choose** how alignment happens. am i correctly grasping the significance of your statement? – thesimplevoodoo Nov 08 '18 at 15:17
  • I said the index attribute of the Series object aligns with the columns attribute of the DataFrame object by default. Use the `sub` method to change that default to make the index attribute of the Series object align with the index attribute of the DataFrame object. – piRSquared Nov 08 '18 at 15:23
  • does this imply the right date may not be subtracted from the right date? – thesimplevoodoo Nov 08 '18 at 16:00
  • No, this will do the correct thing. I'm debating whether to offer up a definitive question and answer to make the point. – piRSquared Nov 08 '18 at 16:02
  • That would be great - please let me know if you do - it seems as though you are saying both methods will provide a correct answer, so I'd like to understand what the risk is here. Trying to understand whats best both stylistically and why/when the relationship between the different attributes is important – thesimplevoodoo Nov 08 '18 at 16:23