How can one find all the columns in an SFrame that have at least one None value? One way would be to iterate through every column and check whether any value in it is None. Is there a better way to do this?
3 Answers
To find None values in an SFrame, use the SArray method num_missing (doc).
Solution
>>> col_w_none = [col for col in sf.column_names() if sf[col].num_missing()>0]
Example
>>> import graphlab as gl
>>> sf = gl.SFrame({'foo':[1,2,3,4], 'bar':[1,None,3,4]})
>>> print sf
+------+-----+
| bar | foo |
+------+-----+
| 1 | 1 |
| None | 2 |
| 3 | 3 |
| 4 | 4 |
+------+-----+
[4 rows x 2 columns]
>>> print [col for col in sf.column_names() if sf[col].num_missing()>0]
['bar']
Caveats
- It isn't optimal, since it doesn't stop iterating at the first None value.
- It won't detect NaN or empty strings.
>>> sf = gl.SFrame({'foo':[1,2,3,4], 'bar':[1,None,3,4], 'baz':[1,2,float('nan'),4], 'qux':['spam', '', 'ham', 'eggs']} )
>>> print sf
+------+-----+-----+------+
| bar | baz | foo | qux |
+------+-----+-----+------+
| 1 | 1.0 | 1 | spam |
| None | 2.0 | 2 | |
| 3 | nan | 3 | ham |
| 4 | 4.0 | 4 | eggs |
+------+-----+-----+------+
[4 rows x 4 columns]
>>> print [col for col in sf.column_names() if sf[col].num_missing()>0]
['bar']
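One way to address both caveats is to test each column with Python's any(), which stops at the first match, and to widen the notion of "missing" to cover NaN and empty strings. This is a minimal sketch, not a GraphLab API: it assumes each column is available as an ordinary Python iterable, and the helpers is_missing and columns_with_missing are hypothetical names introduced here for illustration.

```python
import math
from typing import Any, Iterable, List, Mapping


def is_missing(value: Any) -> bool:
    """Hypothetical helper: treat None, float NaN and '' as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == ""


def columns_with_missing(columns: Mapping[str, Iterable[Any]]) -> List[str]:
    """Return the names of columns holding at least one missing value.

    any() short-circuits, so each column is only scanned up to its
    first None/NaN/'' entry instead of being counted in full.
    """
    return [name for name, values in columns.items()
            if any(is_missing(v) for v in values)]


# Same data as the caveat example, as plain Python lists.
data = {
    'foo': [1, 2, 3, 4],
    'bar': [1, None, 3, 4],
    'baz': [1.0, 2.0, float('nan'), 4.0],
    'qux': ['spam', '', 'ham', 'eggs'],
}
print(columns_with_missing(data))  # ['bar', 'baz', 'qux']
```

With an SFrame, something like columns_with_missing({c: sf[c] for c in sf.column_names()}) should work, assuming iterating an SArray yields plain Python values (NaN as a float). Note that iterating in Python is likely slower than num_missing for columns where no missing value is found early.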

Adrien Renaud
Here is a Pandas solution:
In [50]: df
Out[50]:
keys values
0 1 1.0
1 2 2.0
2 2 3.0
3 3 4.0
4 3 5.0
5 3 NaN
6 3 7.0
In [51]: df.columns.to_series()[df.isnull().any()]
Out[51]:
values values
dtype: object
In [52]: df.columns.to_series()[df.isnull().any()].tolist()
Out[52]: ['values']
Explanation:
In [53]: df.isnull().any()
Out[53]:
keys False
values True
dtype: bool
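The same list can be built without to_series() by filtering the boolean Series returned by isnull().any() with itself and reading its index. A minimal sketch that rebuilds the keys/values frame from the session above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Same data as the IPython session above.
df = pd.DataFrame({'keys': [1, 2, 2, 3, 3, 3, 3],
                   'values': [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0]})

has_missing = df.isnull().any()          # one bool per column
cols_w_none = has_missing[has_missing].index.tolist()
print(cols_w_none)  # ['values']
```

Like num_missing in the SFrame answer, isnull() flags None and NaN but not empty strings.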

MaxU - stand with Ukraine
You can use isnull:
pd.isnull(df).sum() > 0
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1':['A', 'A', 'B','B'], 'col2': ['B','B','C','C'], 'col3': ['C','C','A','A'], 'col4': [11,12,13,np.nan], 'col5': [30,10,14,91]})
df
col1 col2 col3 col4 col5
0 A B C 11.0 30
1 A B C 12.0 10
2 B C A 13.0 14
3 B C A NaN 91
pd.isnull(df).sum() > 0
col1 False
col2 False
col3 False
col4 True
col5 False
dtype: bool
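Because pd.isnull(df).sum() counts the missing values in each column, the same Series can report how many values are missing rather than only whether any are. A sketch reusing the example frame above (missing_counts and missing_by_col are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': ['B', 'B', 'C', 'C'],
                   'col3': ['C', 'C', 'A', 'A'], 'col4': [11, 12, 13, np.nan],
                   'col5': [30, 10, 14, 91]})

missing_counts = pd.isnull(df).sum()     # number of missing values per column
# Keep only the columns with at least one missing value; int() gives plain ints.
missing_by_col = {col: int(n) for col, n in missing_counts.items() if n > 0}
print(missing_by_col)  # {'col4': 1}
```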

Joe T. Boka