dataframe:

|--------------------------------------------------------------------|
|    Name        |    email          |  Phone no    |   Gender       |
|----------------|-------------------|--------------|----------------|
|legacy | target |legacy    | target |legacy|target |legacy | target |
|-------|--------|----------|--------|------|-------|-------|--------|
|Name1  |Name1   |n1@abc.com|        |      |       |       |        |
|Name2  |Name2   |          |        |      |   12  |       |        |
|--------------------------------------------------------------------|

Expected output:

|---------------------------------------------------|
|    Name        |    email          |  Phone no    |
|----------------|-------------------|--------------|
|legacy | target |legacy    | target |legacy|target |
|-------|--------|----------|--------|------|-------|
|Name1  |Name1   |n1@abc.com|        |      |       |
|Name2  |Name2   |          |        |      |   12  |
|---------------------------------------------------|

I am using the code below, but it also removes the "email target" and "Phone no legacy" columns.

df.dropna(how='all', axis=1, inplace=True)

However, I want to drop only the "Gender" column, because it is the only column where both the legacy and target fields are completely blank.
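
For reference, here is a minimal reproduction of the behaviour described above. It is only a sketch: it assumes the blank cells are NaN and that the columns form a two-level MultiIndex (top-level name, then legacy/target), neither of which is stated explicitly in the post. The frame is a hand-built reconstruction of the table shown earlier.

```python
import numpy as np
import pandas as pd

# Assumed layout: two-level column index, top-level name then legacy/target.
columns = pd.MultiIndex.from_product(
    [["Name", "email", "Phone no", "Gender"], ["legacy", "target"]]
)

# Assumed encoding: blank cells are NaN.
df = pd.DataFrame(
    [
        ["Name1", "Name1", "n1@abc.com", np.nan, np.nan, np.nan, np.nan, np.nan],
        ["Name2", "Name2", np.nan, np.nan, np.nan, 12, np.nan, np.nan],
    ],
    columns=columns,
)

# dropna evaluates each (top-level, sub-level) column on its own, so any
# single sub-column that is entirely NaN is dropped, not just "Gender".
dropped = df.dropna(how="all", axis=1)
print(dropped.columns.tolist())
```

Under these assumptions, only four sub-columns survive: both "Name" columns, "email legacy", and "Phone no target".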

Could anyone please help me?

Thank you.

Sri

2 Answers

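Try this (I'm assuming the empty cells are NaN):

# One row per top-level name; True where that sub-column is entirely NaN
m = df.isna().all().unstack(level=1)
# Top-level names whose sub-columns are all entirely NaN
cols = m[m.all(axis=1)].index.tolist()

Finally, use get_level_values() to keep only the columns whose top-level name is not in cols:

df = df.loc[:, ~df.columns.get_level_values(0).isin(cols)]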

Output of df:

         Name             email             Phone no
    legacy  target  legacy      target  legacy  target
0   Name1   Name1   n1@abc.com  NaN     NaN     NaN
1   Name2   Name2   NaN         NaN     NaN     12
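
The steps above can be tried end to end with the following sketch. The DataFrame is a hand-built reconstruction of the question's table and is an assumption, as is the NaN encoding of blank cells. The variable names all_nan and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data, with NaN for blank cells.
columns = pd.MultiIndex.from_product(
    [["Name", "email", "Phone no", "Gender"], ["legacy", "target"]]
)
df = pd.DataFrame(
    [
        ["Name1", "Name1", "n1@abc.com", np.nan, np.nan, np.nan, np.nan, np.nan],
        ["Name2", "Name2", np.nan, np.nan, np.nan, 12, np.nan, np.nan],
    ],
    columns=columns,
)

# Series indexed by (top-level, sub-level): True if that column is all NaN.
all_nan = df.isna().all()

# Reshape so each top-level name is a row with legacy/target as columns.
m = all_nan.unstack(level=1)

# Top-level names for which every sub-column is entirely NaN.
cols = m[m.all(axis=1)].index.tolist()

# Keep every column whose top-level name is not in that list.
result = df.loc[:, ~df.columns.get_level_values(0).isin(cols)]
print(cols)
print(result)
```

Because the mask is built from the original column index, the remaining columns keep their original order.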
Anurag Dabas
  • Thank you, Anurag. Now it removes the column regardless of whether its values are blank. When I tried df=df.loc[:, df.columns.get_level_values(0)!='email'], it removed email as well. – Sri Jun 15 '21 at 18:25
  • @Sri Hi! You want to remove the column named Gender, so it is removing that one. – Anurag Dabas Jun 15 '21 at 18:26
  • @Sri Because you are passing 'email' in `df.loc[:, df.columns.get_level_values(0)!='email']`, it's removing email. – Anurag Dabas Jun 15 '21 at 18:31
  • I want to remove a column only when both its legacy and target values are completely blank in every row. Thanks for your help! – Sri Jun 15 '21 at 18:32
  • Thank you very much, Anurag! It works perfectly now and resolved my issue. Thanks again! – Sri Jun 16 '21 at 12:11

Try this (I'm assuming the empty cells are empty strings, ""):

m = (~df.eq("").all().groupby(level=0).all()).eq(True)
x = df.loc[:, m.index[m]]
print(x)

Prints:

    Name        Phone no          email       
  legacy target   legacy target  legacy target
0  Name1  Name1                  n1@abc       
1  Name2  Name2              12               
Andrej Kesely
  • Thank you, Andrej! I am getting an error, "IndexError: too many indices for array", on the line "x = df.loc[:, m.index[m]]". – Sri Jun 15 '21 at 20:35
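
The two answers assume different encodings of a blank cell (NaN in one, "" in the other). A sketch that treats both as blank is shown below. The helper name drop_blank_groups and the sample data are hypothetical, and this is not offered as a diagnosis of the IndexError reported above. It filters with a boolean mask over the original columns, so the column order is preserved.

```python
import numpy as np
import pandas as pd


def drop_blank_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop top-level column groups whose
    sub-columns are all blank (NaN or empty string) in every row."""
    # True where a cell is NaN or an empty string.
    blank = df.isna() | df.eq("")
    # Per top-level name: True only if every sub-column is entirely blank.
    group_all_blank = blank.all().groupby(level=0).all()
    blank_names = group_all_blank.index[group_all_blank.to_numpy()]
    # Boolean mask over the original columns keeps their order intact.
    top_level = df.columns.get_level_values(0)
    return df.loc[:, ~top_level.isin(blank_names)]


# Assumed sample data mixing both kinds of blank cell.
columns = pd.MultiIndex.from_product(
    [["Name", "email", "Phone no", "Gender"], ["legacy", "target"]]
)
df = pd.DataFrame(
    [
        ["Name1", "Name1", "n1@abc.com", "", "", "", np.nan, ""],
        ["Name2", "Name2", "", "", np.nan, 12, "", np.nan],
    ],
    columns=columns,
)

result = drop_blank_groups(df)
print(result.columns.tolist())
```

If only one encoding occurs in the data, the corresponding half of the blank expression can be dropped without changing the rest.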