How to drop a pandas multi-level DataFrame column when all sub-columns are completely blank

Question

dataframe:

|--------------------------------------------------------------------|
|    Name        |    email          |  Phone no    |   Gender       |
|----------------|-------------------|--------------|----------------|
|legacy | target |legacy    | target |legacy|target |legacy | target |
|-------|--------|----------|--------|------|-------|-------|--------|
|Name1  |Name1   |n1@abc.com|        |      |       |       |        |
|Name2  |Name2   |          |        |      |   12  |       |        |
|--------------------------------------------------------------------|

Expected output:

|---------------------------------------------------|
|    Name        |    email          |  Phone no    |
|----------------|-------------------|--------------|
|legacy | target |legacy    | target |legacy|target |
|-------|--------|----------|--------|------|-------|
|Name1  |Name1   |n1@abc.com|        |      |       |
|Name2  |Name2   |          |        |      |   12  |
|---------------------------------------------------|

I am using the code below, but it also removes the "email target" and "Phone no legacy" columns.

df.dropna(how='all', axis=1, inplace=True)

However, I want to drop only the "Gender" column, as it is the only top-level column where both the legacy and target sub-columns are completely blank.

Could anyone please help me?

Thank you.

Anurag Dabas · Accepted Answer · 2021-06-16T06:45:14.270

Try this (assuming the empty cells are NaN):

m = df.isna().all().unstack(level=1)
cols = m[m.all(axis=1)].index.tolist()

Finally use get_level_values():

df=df.loc[:, ~df.columns.get_level_values(0).isin(cols)]

output of df:

         Name             email             Phone no
    legacy  target  legacy      target  legacy  target
0   Name1   Name1   n1@abc.com  NaN     NaN     NaN
1   Name2   Name2   NaN         NaN     NaN     12
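
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the steps above. The sample frame is rebuilt from the table in the question, and the assumption that blank cells are NaN is mine, as are the variable names `columns`, `all_blank` and `result`:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN.
columns = pd.MultiIndex.from_product(
    [["Name", "email", "Phone no", "Gender"], ["legacy", "target"]]
)
df = pd.DataFrame(
    [
        ["Name1", "Name1", "n1@abc.com", np.nan, np.nan, np.nan, np.nan, np.nan],
        ["Name2", "Name2", np.nan, np.nan, np.nan, 12, np.nan, np.nan],
    ],
    columns=columns,
)

# One boolean per (top-level, sub-column) pair: True if that column is all NaN.
all_blank = df.isna().all()

# Move the sub-column level into columns: one row per top-level label.
m = all_blank.unstack(level=1)

# Top-level labels for which every sub-column is entirely NaN.
cols = m[m.all(axis=1)].index.tolist()
print(cols)  # ['Gender']

# Keep every column whose top-level label is not in that list.
result = df.loc[:, ~df.columns.get_level_values(0).isin(cols)]
print(result)
```

Unlike `dropna(how='all', axis=1)`, which tests each sub-column on its own, this only drops a top-level group when all of its sub-columns are blank.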

edited Jun 16 '21 at 06:45

answered Jun 15 '21 at 17:30

Anurag Dabas


Thank you, Anurag. Now it removes the column regardless of whether its values are blank. When I tried df=df.loc[:, df.columns.get_level_values(0)!='email'], it removed email as well. – Sri Jun 15 '21 at 18:25
@Sri Hi! You want to remove the column named Gender, so it removes that one – Anurag Dabas Jun 15 '21 at 18:26
@Sri Because you are passing 'email' in `df.loc[:, df.columns.get_level_values(0)!='email']`, it removes email – Anurag Dabas Jun 15 '21 at 18:31
I want to remove a column only when both its legacy and target values are completely blank in all rows. Thanks for your help! – Sri Jun 15 '21 at 18:32
Thank you very much, Anurag! It works perfectly now and resolved my issue. Thanks again! – Sri Jun 16 '21 at 12:11

score 1 · Answer 2 · answered Jun 15 '21 at 18:40

Try this (assuming the empty cells are empty strings, ""):

m = (~df.eq("").all().groupby(level=0).all()).eq(True)
x = df.loc[:, m.index[m]]
print(x)

Prints:

    Name        Phone no          email       
  legacy target   legacy target  legacy target
0  Name1  Name1                  n1@abc       
1  Name2  Name2              12
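
A self-contained variant of the same idea, as a rough sketch only. It assumes blank cells are empty strings and rebuilds the sample frame from the question. The names `blank`, `drop`, `mask` and `x` are illustrative, and the final selection uses a boolean mask instead of label-based `.loc`, which is my substitution rather than part of the code above:

```python
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be "".
columns = pd.MultiIndex.from_product(
    [["Name", "email", "Phone no", "Gender"], ["legacy", "target"]]
)
df = pd.DataFrame(
    [
        ["Name1", "Name1", "n1@abc.com", "", "", "", "", ""],
        ["Name2", "Name2", "", "", "", "12", "", ""],
    ],
    columns=columns,
)

# True for each sub-column that contains only empty strings.
blank = df.eq("").all()

# Collapse to one boolean per top-level label: True only if every
# sub-column under that label is blank.
drop = blank.groupby(level=0).all()
to_drop = drop.index[drop].tolist()
print(to_drop)  # ['Gender']

# Boolean mask over the original columns, so their order is preserved.
mask = ~df.columns.get_level_values(0).isin(to_drop)
x = df.loc[:, mask]
print(x)
```

Selecting with a mask keeps the columns in their original order (Name, email, Phone no), whereas selecting by the grouped labels returns them in sorted order, as in the printed output above.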

Andrej Kesely


Thank you, Andrej! I am getting the error "IndexError: too many indices for array" on the line "x = df.loc[:, m.index[m]]" – Sri Jun 15 '21 at 20:35
