Python sklearn kaggle/titanic tutorial fails on the last feature scale

Question

I was in the process of working through this tutorial: http://ahmedbesbes.com/how-to-score-08134-in-titanic-kaggle-challenge.html

It went without problems until I reached the last part of the middle section:

As you can see, the features range in different intervals. Let's normalize all of them in the unit interval. All of them except the PassengerId that we'll need for the submission

In [48]:
>>> def scale_all_features():

>>>     global combined

>>>     features = list(combined.columns)
>>>     features.remove('PassengerId')
>>>     combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)

>>>     print 'Features scaled successfully !'

In [49]:
>>> scale_all_features()

Features scaled successfully !

Despite typing it word for word into my Python script:

#Cell 48
GreatDivide.split()
def scale_all_features():

    global combined

    features = list(combined.columns)
    features.remove('PassengerId')
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)

    print 'Features scaled successfully !'

#Cell 49
GreatDivide.split()
scale_all_features()

It keeps giving me an error:

--------------------------------------------------48--------------------------------------------------
--------------------------------------------------49--------------------------------------------------
Traceback (most recent call last):
  File "KaggleTitanic[2-FE]--[01].py", line 350, in <module>
    scale_all_features()
  File "KaggleTitanic[2-FE]--[01].py", line 332, in scale_all_features
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4061, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4157, in _apply_standard
    results[i] = func(v)
  File "KaggleTitanic[2-FE]--[01].py", line 332, in <lambda>
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 651, in wrapper
    return left._constructor(wrap_results(na_op(lvalues, rvalues)),
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 592, in na_op
    result[mask] = op(x[mask], y)
TypeError: ("unsupported operand type(s) for /: 'str' and 'str'", u'occurred at index Ticket')

What's the problem here? All of the previous sections ran without errors, so if something were wrong it would have shown up by now, right?

The *Ticket* column contains strings, so it cannot be normalized by division. Try one-hot encoding it instead. – Nickil Maveli, Dec 15 '16 at 19:27
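
To illustrate the comment above, here is a minimal sketch of spotting the non-numeric columns and one-hot encoding one of them. The toy frame and its shortened Ticket labels are made up for illustration and are not the tutorial's data; real ticket strings are mostly unique, so they would probably need to be reduced to a few categories first. The sketch is written for Python 3.

```python
import pandas as pd

# Toy stand-in for `combined`; the values are invented for this example.
combined = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
    "Ticket": ["A5", "PC", "A5", "XXX"],  # hypothetical simplified labels
})

# Columns that are not numeric are the ones that break x / x.max().
non_numeric = combined.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# One-hot encode the string column: one 0/1 indicator column per distinct value.
dummies = pd.get_dummies(combined["Ticket"], prefix="Ticket", dtype=int)
combined = pd.concat([combined.drop(columns="Ticket"), dummies], axis=1)
print(combined.columns.tolist())
```

After this step every remaining column is numeric, so dividing by the column maximum no longer raises a TypeError.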

score 1 · Accepted Answer · answered Dec 15 '16 at 19:37

You can help ensure that the math transformation is applied only to numeric columns with the following:

numeric_cols = combined.columns[combined.dtypes != 'object']
combined.loc[:, numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()

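There is no need for that apply function: dividing a DataFrame by the Series returned by max() aligns on column labels, so each column is divided by its own maximum.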
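
A self-contained sketch of the same idea follows, assuming a small made-up frame in place of the tutorial's combined. It uses select_dtypes as one possible way to pick numeric columns. It also drops PassengerId explicitly, because that column is numeric and would otherwise be scaled along with the rest, which the tutorial wanted to avoid. The sketch is written for Python 3.

```python
import pandas as pd

# Toy stand-in for `combined`; the values are invented for this example.
combined = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
})

# Pick numeric columns by dtype, then keep the identifier out of the scaling.
numeric_cols = combined.select_dtypes(include="number").columns.drop("PassengerId")

# DataFrame / Series aligns on column labels: each column is divided by its own max.
combined[numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()

print(combined)
```

Note that this only sidesteps the error: string columns such as Ticket are left unscaled and still have to be encoded or dropped before a model can use them.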

Ted Petrou

Got it. I'll see if this works and get back to you, thanks. – Rich Dec 15 '16 at 20:24
Hey, everything *seems* to be working, thanks! I'll let you know if I run into any problems. – Rich Dec 16 '16 at 01:31
