1

I have two pandas dataframes a_df and b_df. a_df has columns ID, atext, and var1-var25, while b_df has columns ID, atext, and var1-var 25.

I want to add ONLY the corresponding vars from a_df and b_df and leave ID, and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df=a_df.add(b_df)

What could I do to achieve this?

profhoff
  • 1,017
  • 1
  • 13
  • 21

2 Answers2

2

Use filter:

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)

Alternatively, instead of subselecting columns from a_df, you could instead just drop the columns in absum_df, if you want to add all columns from a_df not in absum_df:

absum_df = pd.concat([a_df.drop(absum_df.columns axis=1), absum_df], axis=1)
root
  • 32,715
  • 6
  • 74
  • 87
1

You can subset a dataframe to particular columns:

var_columns = ['var-{}'.format(i) for i in range(1,26)]
absum_df=a_df[var_columns].add(b_df[var_columns])

Note that this will result in a dataframe with only the var columns. If you want a dataframe with the non-var columns from a_df, and the var columns being the sum of a_df and b_df, you can do

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
Acccumulation
  • 3,491
  • 1
  • 8
  • 12
  • so this is something always driving me crazy in python - suppose the var columns have names var1, var2, var3, ...,varn. HOW do you tell python this in a shorthand without typing all of them one by one? in the example above with "var_columns" is there a way to use some kind of shorthand instead of typing each column? sometimes there are HUNDREDS of cols labeled sequentially and it seems so inefficient to brute force type them! – profhoff May 02 '18 at 22:23
  • In the example from @root, I was able to use the like="var" and it perfectly summed all the corresponding "var" columns. But in the second example, I will have to type all the col names out. – profhoff May 02 '18 at 22:25
  • @profhoff I forgot to put the `format` in my code. See my edited version. – Acccumulation May 02 '18 at 22:43
  • Instead of using a list comprehension with string formatting to generate the columns, you can just do `var_columns = a_df.filter(like='var').columns` – root May 02 '18 at 22:57