Collapse overlapping columns in pandas dataframe

Question

I have a data frame that looks like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'P1':['ARF5','NaN','NaN'],'P2':['NaN','M6PR','NaN'],'P3':['NaN','NaN','NDUFAF7']})
>>> df
     P1    P2       P3
0  ARF5   NaN      NaN
1   NaN  M6PR      NaN
2   NaN   NaN  NDUFAF7

I have been trying to collapse it down to something like this:

        C1
0     ARF5
1     M6PR
2  NDUFAF7

The columns overlap, but I don't know to what degree. I also don't know how many columns this df will have on any given iteration, since it is part of a pipeline whose output I need to aggregate.

I think in principle I need the functionality of combine_first but for columns. I tried something like this:

df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df[:]])

or

df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df['P1'],df['P2'],df['P3']])

But I have some issues figuring this out. Thanks for the help!
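For reference, the reduce/combine_first idea from the question can work once two things are in place: `reduce` has to be imported from `functools`, and the string 'NaN' has to become a real missing value, because `combine_first` only fills genuine nulls. A minimal sketch, assuming the sample data above; it folds over every column, so the column count does not need to be known in advance:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'P1': ['ARF5', 'NaN', 'NaN'],
                   'P2': ['NaN', 'M6PR', 'NaN'],
                   'P3': ['NaN', 'NaN', 'NDUFAF7']})

# The sample stores the *text* 'NaN', which combine_first does not treat
# as missing, so convert it to a real NaN first.
clean = df.replace('NaN', np.nan)

# Fold combine_first over all columns, left to right: earlier columns
# take priority and later columns only fill the remaining gaps.
condensed = reduce(lambda x, y: x.combine_first(y),
                   (clean[c] for c in clean.columns))

print(condensed.tolist())  # ['ARF5', 'M6PR', 'NDUFAF7']
```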

Use `bfill` and `ffill`; it is usually faster than `combine_first`: `df.replace({"NaN": np.nan}).bfill(axis=1).ffill(axis=1).iloc[:, 0]` — sammywemmy, Aug 02 '21 at 08:31
Please feel free to comment if you think the question was wrongly closed. — Ch3steR, Aug 02 '21 at 08:38
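The bfill approach can be wrapped in a small helper that does not depend on column names or on how many columns there are. A minimal sketch: `collapse_columns` is a hypothetical name, and the extra `P4` column holding the placeholder 'OTHER' is made-up data added only to show overlapping values. Because `bfill(axis=1)` pulls values leftwards, position 0 ends up with the leftmost non-null value of each row, so where columns overlap the earlier column wins, and a row that is entirely missing stays NaN. The trailing `.ffill(axis=1)` does not change column 0, since forward-filling only moves values to the right.

```python
import numpy as np
import pandas as pd


def collapse_columns(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: the leftmost non-missing value of each row."""
    # bfill along axis=1 copies each row's next non-null value leftwards,
    # so the first column ends up holding the row's first non-null value.
    return frame.bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({'P1': ['ARF5', 'NaN', 'NaN'],
                   'P2': ['NaN', 'M6PR', 'NaN'],
                   'P3': ['NaN', 'NaN', 'NDUFAF7'],
                   # hypothetical extra column that overlaps rows 1 and 2
                   'P4': ['NaN', 'OTHER', 'OTHER']})

result = collapse_columns(df.replace('NaN', np.nan))
print(result.tolist())  # ['ARF5', 'M6PR', 'NDUFAF7']
```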

score 3 · Accepted Answer · answered Aug 02 '21 at 08:31


Use bfill on axis=1:

df['C1'] = df.replace('NaN', np.nan).bfill(axis=1)['P1']

>>> df
     P1    P2       P3       C1
0  ARF5   NaN      NaN     ARF5
1   NaN  M6PR      NaN     M6PR
2   NaN   NaN  NDUFAF7  NDUFAF7
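The `replace('NaN', np.nan)` step matters because the sample frame stores the text 'NaN' rather than a missing value, so pandas sees nothing to fill. A quick check, assuming the same sample data as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'P1': ['ARF5', 'NaN', 'NaN'],
                   'P2': ['NaN', 'M6PR', 'NaN'],
                   'P3': ['NaN', 'NaN', 'NDUFAF7']})

# The string 'NaN' is ordinary text, so no cell counts as missing
print(df.isna().sum().sum())  # 0

clean = df.replace('NaN', np.nan)
# After replace, the six placeholders are real missing values
print(clean.isna().sum().sum())  # 6

# Without the replace, bfill has nothing to fill and P1 is unchanged
print(df.bfill(axis=1)['P1'].tolist())     # ['ARF5', 'NaN', 'NaN']
print(clean.bfill(axis=1)['P1'].tolist())  # ['ARF5', 'M6PR', 'NDUFAF7']
```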


Corralien

