print(np.shape(ar_fulldata_input_xx))

Output: (9027, 1443)

Now I use Imputer to impute the missing values of my dataframe ar_fulldata_input_xx as follows.

fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(ar_fulldata_input_xx))

Now I check the size of my imputed dataframe as follows.

print(np.shape(imputed_DF))

Output: (9027, 1442)

Why is the column size reduced by one?

Is there any way to find out which column is missing after imputation?

Before imputing, I ran the following lines to remove all columns that contain only NaN values or only 0 values.

ar_fulldata_input_xx = ar_fulldata_input_xx.loc[:, (ar_fulldata_input_xx != 0).any(axis=0)]

and

ar_fulldata_input_xx=ar_fulldata_input_xx.dropna(axis=1, how='all')
Stupid420
  • I thought it could have something to do with pandas index, but I tried to replicate it here and it works fine. – joaoavf Feb 19 '18 at 02:45
  • Is this a public dataset? Or is there any way you could share it? – joaoavf Feb 19 '18 at 02:45
  • @joaoavf No, it's not, but I can share it if that would be helpful. – Stupid420 Feb 19 '18 at 02:46
  • Hm, I am pretty sure pandas has something like the sklearn Imputer, would that work for you? Or for some reason you specifically need to use the Imputer? (in the second case it would help a lot to have the dataset) – joaoavf Feb 19 '18 at 02:49
  • @joaoavf I just want to fill the missing values in each column with the mean of that column. – Stupid420 Feb 19 '18 at 02:50
  • @joaoavf How can I share the dataset? It's pretty large. – Stupid420 Feb 19 '18 at 02:52
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165385/discussion-between-abdul-karim-khan-and-joaoavf). – Stupid420 Feb 19 '18 at 02:53
  • You may be interested in the `interpolate` function instead: `pd.DataFrame(ar_fulldata_input_xx).interpolate()` – cs95 Feb 19 '18 at 03:08
  • @cᴏʟᴅsᴘᴇᴇᴅ Will it `interpolate` with the mean of each column? – Stupid420 Feb 19 '18 at 03:10

1 Answer


You can do this in pandas with:

ndf = df.fillna(df.mean())

It seems that one of the columns was not importing its numeric values properly from the original file, which is likely why the Imputer did not behave as expected. The OP is looking into it.
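To track down a column like that, one option is to coerce everything to numeric and look for columns left with no usable values, since a column mean cannot be computed for those. The sketch below is only an illustration: the small `df` is a hypothetical stand-in for `ar_fulldata_input_xx`, and it assumes the problem column was read in as text.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ar_fulldata_input_xx; column "c" was read as text.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": ["x", "y", "z"],
})

# Coerce every column to numeric; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Columns that are entirely NaN after coercion have no mean to impute with.
all_nan = numeric.isna().all()
empty_cols = all_nan[all_nan].index.tolist()
print("Columns with no numeric values:", empty_cols)

# Fill the remaining gaps with each column's mean; the shape is preserved.
filled = numeric.fillna(numeric.mean())
print(filled.shape)
```

Unlike the Imputer, `fillna` keeps every column, so the all-NaN column stays in `filled` (still as NaN) and the column count does not change.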

joaoavf