Can't convert float to int in python DataFrame/Array

Question

I'm new to both Kaggle and Python and can't figure out how to convert this data set. For anyone familiar, I'm trying to reproduce the gender-based solution for the Titanic tutorial.

I have:

submission = pd.DataFrame({'PassengerId' : test_data.PassengerId, 'Survived' : final_prediction})
print(submission.head())

Which gives me:

   PassengerId  Survived
0          892  0.184130
1          893  0.761143
2          894  0.184130
3          895  0.184130
4          896  0.761143

Which I need to convert to:

   PassengerId  Survived
0          892         0
1          893         1
2          894         0
3          895         0
4          896         1

Again, not really knowing Python, I have tried some solutions like:

for x in np.nditer(final_prediction, op_flags=['readwrite']):
    x[...]=(1 if x[...] >= 0.50 else 0)
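The sketch below shows why this loop still yields floats. It assumes final_prediction is a 1-D float64 NumPy array and reuses the values printed above. Writing into an existing array cannot change its dtype, so the 1s and 0s are stored back as 1.0 and 0.0. The np.where call applies the same 0.5 threshold in a vectorized way and produces an integer array directly.

```python
import numpy as np

# Assumption: final_prediction is a 1-D float64 array; values copied from the question's output.
final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# The loop from the question, run on a copy. Assigning into an existing
# float64 array keeps its dtype, so 1 and 0 are stored as 1.0 and 0.0.
looped = final_prediction.copy()
with np.nditer(looped, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 1 if x[...] >= 0.50 else 0
print(looped)        # [0. 1. 0. 0. 1.]
print(looped.dtype)  # float64

# Vectorized alternative: np.where builds a new array from the integer
# literals 1 and 0, so the result has an integer dtype.
thresholded = np.where(final_prediction >= 0.5, 1, 0)
print(thresholded)   # [0 1 0 0 1]
```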

Which gives me floating-point values (they still show up in the CSV file as 0.0 and 1.0):

   PassengerId  Survived
0          892        0.
1          893        1.

And:

rounded_prediction = np.rint(final_prediction)

Gives me the same (i.e. 0., 1.)
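np.rint rounds each element to the nearest integer value, but it returns an array of the same floating-point dtype. That is why 0. and 1. still appear. The sketch below uses the sample values from the question and assumes a float64 input array:

```python
import numpy as np

# Sample values copied from the question's output (assumed float64).
final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# np.rint rounds to the nearest integer *value* but keeps the float dtype.
rounded_prediction = np.rint(final_prediction)
print(rounded_prediction)        # [0. 1. 0. 0. 1.]
print(rounded_prediction.dtype)  # float64
```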

The following:

int_prediction = final_prediction.astype(int)

Gives me all 0's
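The result is all 0s because astype(int) does not round. It truncates toward zero, so every prediction between 0 and 1 becomes 0. The sketch below uses the sample values from the question:

```python
import numpy as np

final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# Casting float -> int drops the fractional part (truncation toward zero),
# so 0.761143 becomes 0, not 1.
int_prediction = final_prediction.astype(int)
print(int_prediction)  # [0 0 0 0 0]
```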

Any ideas? Thanks!

score 0 · Accepted Answer · answered May 08 '18 at 13:53

So first of all, keep in mind that you want to use vectorized operations as much as possible, because they are much faster than looping over elements in Python. Instead of looping, pandas lets you convert a whole column in one call:

submission['Survived'] = submission['Survived'].astype(int)
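Here is a minimal sketch of this conversion on a DataFrame rebuilt from the values printed in the question. The rebuilt submission is a stand-in: the real one comes from test_data and final_prediction. The column's dtype changes from float to integer, but the cast alone truncates every prediction to 0:

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame, using its printed values.
submission = pd.DataFrame({
    'PassengerId': [892, 893, 894, 895, 896],
    'Survived': [0.184130, 0.761143, 0.184130, 0.184130, 0.761143],
})

print(submission['Survived'].dtype)  # float64

# Vectorized cast of the whole column; no Python loop involved.
submission['Survived'] = submission['Survived'].astype(int)
print(submission['Survived'].tolist())  # [0, 0, 0, 0, 0] (truncated, not rounded)
```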

Do note that this truncates values rather than rounding them, so in your case you might want to run:

submission['Survived'] += 0.5

before performing the above. This ensures that values of 0.5 and above become 1 when you convert to int, while values below 0.5 are truncated to 0.

Changing the dtype (the dtype of each column can be inspected with df.dtypes) is thus done with the astype() method of a Series or DataFrame.
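The sketch below puts the two steps together. It shifts the column by 0.5, casts it, and checks df.dtypes before and after. It uses a hypothetical stand-in DataFrame built from the question's printed values. It relies on all predictions being non-negative: for a negative value such as -0.7, adding 0.5 and truncating toward zero gives 0 instead of -1. DataFrame.astype also accepts a dict mapping column names to dtypes, which is used here so that only Survived is cast:

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame, using its printed values.
submission = pd.DataFrame({
    'PassengerId': [892, 893, 894, 895, 896],
    'Survived': [0.184130, 0.761143, 0.184130, 0.184130, 0.761143],
})
print(submission.dtypes)  # Survived is float64

# Shift by 0.5 so that truncation behaves like rounding for non-negative values.
submission['Survived'] += 0.5

# Cast only the Survived column; astype returns a new DataFrame.
submission = submission.astype({'Survived': int})
print(submission.dtypes)                # Survived is now an integer dtype
print(submission['Survived'].tolist())  # [0, 1, 0, 0, 1]
```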

There might be a more explicit way of saying that values should be rounded up or down, but with this simple data manipulation it should work ;)

My final attempt above was the closest. I needed to add the 0.5 to each element in the vector. And thanks for your notes about keeping things vectorized! `int_prediction = (final_prediction + 0.5).astype(int)` — Frankie, May 08 '18 at 17:56
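Frankie's version applies the same shift directly to the NumPy array before the DataFrame is built. The sketch below does that and then checks that the CSV output contains 0 and 1 rather than 0.0 and 1.0. The arrays are stand-ins for test_data.PassengerId and final_prediction. The index=False argument is an assumption about the desired file layout (no index column):

```python
import numpy as np
import pandas as pd

# Stand-ins for test_data.PassengerId and final_prediction (values from the question).
passenger_ids = np.array([892, 893, 894, 895, 896])
final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# Round half up (valid for non-negative values), then cast to integers.
int_prediction = (final_prediction + 0.5).astype(int)

submission = pd.DataFrame({'PassengerId': passenger_ids, 'Survived': int_prediction})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = submission.to_csv(index=False)
print(csv_text)
```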

score 0 · Answer 2 · answered May 08 '18 at 13:53


You need to round the values and then convert the result to int to drop the decimal point. This should work: np.rint(final_prediction).astype(int)
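The sketch below applies this approach to the question's sample values. It differs from the + 0.5 approach in one case: np.rint rounds halves to the nearest even integer. A prediction of exactly 0.5 therefore becomes 0 here, but 1 with (x + 0.5).astype(int):

```python
import numpy as np

final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.184130, 0.761143])

# Round to the nearest integer value, then cast to drop the decimal point.
int_prediction = np.rint(final_prediction).astype(int)
print(int_prediction)  # [0 1 0 0 1]

# Edge case at exactly 0.5: np.rint uses round-half-to-even.
half = np.array([0.5])
print(np.rint(half).astype(int))  # [0]
print((half + 0.5).astype(int))   # [1]
```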


Aparna Mahadevan
