add selected columns from two pandas dfs

Question

I have two pandas dataframes a_df and b_df. a_df has columns ID, atext, and var1-var25, while b_df has columns ID, atext, and var1-var 25.

I want to add ONLY the corresponding vars from a_df and b_df and leave ID, and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df=a_df.add(b_df)

What could I do to achieve this?

root · Answer 1 · 2018-05-02T22:41:35.433

2

Use filter:

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)

Alternatively, instead of subselecting columns from a_df, you could instead just drop the columns in absum_df, if you want to add all columns from a_df not in absum_df:

absum_df = pd.concat([a_df.drop(absum_df.columns axis=1), absum_df], axis=1)

edited May 02 '18 at 22:41

answered May 02 '18 at 22:04

root

32,715
6
74
87

The OP asked to "leave ID, and atext alone". Presumably, that means keep those columns from `a_df`. Your answer simply gets rid of those columns. – Acccumulation May 02 '18 at 22:18
right. it works, but forgets about my ID and atext. thank you for noting that. – profhoff May 02 '18 at 22:18
see the edit for adding additional columns after the fact – root May 02 '18 at 22:42

Acccumulation · Accepted Answer · 2018-05-02T22:44:14.330

1

You can subset a dataframe to particular columns:

var_columns = ['var-{}'.format(i) for i in range(1,26)]
absum_df=a_df[var_columns].add(b_df[var_columns])

Note that this will result in a dataframe with only the var columns. If you want a dataframe with the non-var columns from a_df, and the var columns being the sum of a_df and b_df, you can do

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])

edited May 02 '18 at 22:44

answered May 02 '18 at 22:15

Acccumulation

3,491
1
8
12

so this is something always driving me crazy in python - suppose the var columns have names var1, var2, var3, ...,varn. HOW do you tell python this in a shorthand without typing all of them one by one? in the example above with "var_columns" is there a way to use some kind of shorthand instead of typing each column? sometimes there are HUNDREDS of cols labeled sequentially and it seems so inefficient to brute force type them! – profhoff May 02 '18 at 22:23
In the example from @root, I was able to use the like="var" and it perfectly summed all the corresponding "var" columns. But in the second example, I will have to type all the col names out. – profhoff May 02 '18 at 22:25
@profhoff I forgot to put the `format` in my code. See my edited version. – Acccumulation May 02 '18 at 22:43
Instead of using a list comprehension with string formatting to generate the columns, you can just do `var_columns = a_df.filter(like='var').columns` – root May 02 '18 at 22:57

add selected columns from two pandas dfs

2 Answers2