Pandas: Best way to remove NaN from multiple columns and convert them to int

Question

Suppose I have below CSV data:

col1,col2,col3,label
,1,2,label1
3,,4,label2
5,6,7,label3

What is the best way to read this data and convert col1 and col2, which are read as float because of the missing values, to int?

I am able to use this to convert my filtered dataframe, which only has the numeric columns (col1, col2, col3). How can I modify the main dataframe itself while ignoring the label column, which is a string?

On a related note, I could also use the command below. Any idea how I could run it in a loop so that the variable name col%d is generated dynamically? I have 32 columns.

filter_df.col1 = filter_df.col1.fillna(0).astype(int)

To iterate over the 32 columns, you could use the getattr builtin: https://docs.python.org/3/library/functions.html#getattr – Tomasz Sabała Nov 09 '18 at 06:52
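A minimal sketch of such a loop, assuming the columns really are named col1 through colN (N is 3 for the sample data above, 32 in the question). It uses bracket indexing, `df[name]`, which reads the same column as `getattr(df, name)` and is the more common form when assigning. The names `csv_text` and `n_cols` are made up for the illustration.

```python
import io

import pandas as pd

# Sample data from the question, read from a string instead of a file
csv_text = """col1,col2,col3,label
,1,2,label1
3,,4,label2
5,6,7,label3
"""
df = pd.read_csv(io.StringIO(csv_text))

n_cols = 3  # assumption: 32 for the real data set

for i in range(1, n_cols + 1):
    name = f"col{i}"  # generates "col1", "col2", ...
    # Replace NaN with 0, then cast the whole column to int
    df[name] = df[name].fillna(0).astype(int)

print(df.dtypes)
```

The label column is never touched because its name is not generated by the loop.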

score 5 · Accepted Answer · answered Nov 09 '18 at 06:47

Use select_dtypes with np.number:

import numpy as np

print (filter_df)
   col1  col2  col3   label
0   NaN   1.0     2     NaN
1   3.0   NaN     4  label2
2   5.0   6.0     7  label3

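# Names of all numeric columns; the string label column is left out
cols = filter_df.select_dtypes(np.number).columns
# Fill NaN with 0, then cast the numeric columns to int
filter_df[cols] = filter_df[cols].fillna(0).astype(int)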

print (filter_df)
   col1  col2  col3   label
0     0     1     2     NaN
1     3     0     4  label2
2     5     6     7  label3
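For completeness, here is a self-contained sketch of the same idea applied to the CSV from the question, starting from `read_csv`. It assumes the data is available as a string (`csv_text` is an illustrative name; a file path would work the same way).

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question
csv_text = """col1,col2,col3,label
,1,2,label1
3,,4,label2
5,6,7,label3
"""
df = pd.read_csv(io.StringIO(csv_text))

# col1 and col2 are parsed as float64 because they contain NaN;
# col3 is already int64 and label holds strings
cols = df.select_dtypes(np.number).columns

# Fill the gaps and cast every numeric column to int in one step
df[cols] = df[cols].fillna(0).astype(int)

print(df)
print(df.dtypes)
```

Because the columns are chosen by dtype, the same two lines work for 3 or 32 numeric columns without listing their names.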

jezrael

Note to readers: `select_dtypes` will return a new DataFrame and select the columns from there. – cs95 Nov 09 '18 at 07:03
@coldspeed - sure, what is the problem with it? – jezrael Nov 09 '18 at 07:04
No problem, just mentioned to readers who feel memory efficiency is important :) – cs95 Nov 09 '18 at 07:05

score 4 · Answer 2 · answered Nov 09 '18 at 06:51

You can use fillna with downcast='infer'.

# Boolean mask of the float columns, i.e. the ones that contained NaN
m = df.dtypes == float
df.loc[:, m] = df.loc[:, m].fillna(0, downcast='infer')
print(df)
   col1  col2  col3   label
0     0     1     2     NaN
1     3     0     4  label2
2     5     6     7  label3
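For readers on a recent pandas release, where (as far as I know) the `downcast` argument of `fillna` is deprecated, a rough equivalent is to pick the float columns by dtype and cast them explicitly. The sketch below is a variant under those assumptions, not the answer's own code. It reads the question's sample CSV, builds the column list from `df.dtypes` rather than via `select_dtypes`, and assigns with `df[cols] = ...` so that the columns are replaced and their dtype can change. `csv_text` and `float_cols` are illustrative names.

```python
import io

import pandas as pd

# Sample data from the question
csv_text = """col1,col2,col3,label
,1,2,label1
3,,4,label2
5,6,7,label3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns that contained NaN were parsed as float64; pick them by dtype
float_cols = [name for name, dtype in df.dtypes.items() if dtype == float]

# Explicit cast in place of downcast='infer'
df[float_cols] = df[float_cols].fillna(0).astype(int)

print(df.dtypes)
```

Only col1 and col2 are selected here, since col3 has no missing values and is already read as an integer column.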

cs95
