11

I would like to print all rows of a dataframe where I find the value '-' in any of the columns. Can someone please explain a way that is better than those described below?

This Q&A already explains how to do so by using boolean indexing but each column needs to be declared separately:

print df.ix[df['A'].isin(['-']) | df['B'].isin(['-']) | df['C'].isin(['-'])]

I tried the following but I get an error 'Cannot index with multidimensional key':

df.ix[df[df.columns.values].isin(['-'])]

So I used this code but I'm not happy with the separate printing for each column tested because it is harder to work with and can print the same row more than once:

import pandas as pd

d = {'A': [1,2,3], 'B': [4,'-',6], 'C': [7,8,'-']}
df = pd.DataFrame(d)

for i in range(len(d.keys())):  
    temp = df.ix[df.iloc[:,i].isin(['-'])]
    if temp.shape[0] > 0:
        print temp

Output looks like this:

   A  B  C
1  2  -  8

[1 rows x 3 columns]

   A  B  C
2  3  6  -

[1 rows x 3 columns]

Thanks for your advice.

Community
  • 1
  • 1
KieranPC
  • 8,525
  • 7
  • 22
  • 25

2 Answers2

8

Alternatively, you could do something like df[df.isin(["-"]).any(axis=1)], e.g.

>>> df = pd.DataFrame({'A': [1,2,3], 'B': ['-','-',6], 'C': [7,8,9]})
>>> df.isin(["-"]).any(axis=1)
0     True
1     True
2    False
dtype: bool
>>> df[df.isin(["-"]).any(axis=1)]
   A  B  C
0  1  -  7
1  2  -  8

(Note I changed the frame a bit so I wouldn't get the axes wrong.)

DSM
  • 342,061
  • 65
  • 592
  • 494
  • Thanks, I like this method and it worked on my data. Now I need to take time to understand exactly how it works. – KieranPC Jul 15 '14 at 11:39
5

you can do:

>>> idx = df.apply(lambda ts: any(ts == '-'), axis=1)
>>> df[idx]
   A  B  C
1  2  -  8
2  3  6  -

or

lambda ts: '-' in ts.values

note that in looks into the index not the values, so you need .values

behzad.nouri
  • 74,723
  • 18
  • 126
  • 124
  • Good to know this, thanks. I'll share a solution I found that skipped the problem before it happened (but only works for my specific case). I was able to avoid the '-' value from entering the data by using pd.read_csv(..., na_values=["-"]) which converts this to NaN. – KieranPC Jul 15 '14 at 12:07