
I was in the process of working through this tutorial: http://ahmedbesbes.com/how-to-score-08134-in-titanic-kaggle-challenge.html

It went without any problems until I reached the last part of the middle section:

As you can see, the features range over different intervals. Let's normalize all of them to the unit interval, all except PassengerId, which we'll need for the submission.

In [48]:
def scale_all_features():

    global combined

    features = list(combined.columns)
    features.remove('PassengerId')
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)

    print 'Features scaled successfully !'

In [49]:
scale_all_features()

Features scaled successfully !
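A small sketch of what that scaling step does, using a made-up all-numeric frame (the column names and values are illustrative stand-ins, not the real Titanic data). Dividing each column by its own maximum puts the largest value at 1.0; the result only lands in the unit interval [0, 1] when the column has no negative values.

```python
import pandas as pd

# Illustrative data only; columns are hypothetical stand-ins for Titanic features.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.925],
})

# Every column except PassengerId, as in the tutorial.
features = [c for c in df.columns if c != "PassengerId"]

# Divide each feature column by its own maximum.
scaled = df.copy()
scaled[features] = df[features] / df[features].max()

print(scaled)
```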

I typed it word for word into my Python script:

#Cell 48
GreatDivide.split()
def scale_all_features():

    global combined

    features = list(combined.columns)
    features.remove('PassengerId')
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)

    print 'Features scaled successfully !'

#Cell 49
GreatDivide.split()
scale_all_features()

It keeps giving me an error:

--------------------------------------------------48--------------------------------------------------
--------------------------------------------------49--------------------------------------------------
Traceback (most recent call last):
  File "KaggleTitanic[2-FE]--[01].py", line 350, in <module>
    scale_all_features()
  File "KaggleTitanic[2-FE]--[01].py", line 332, in scale_all_features
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4061, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4157, in _apply_standard
    results[i] = func(v)
  File "KaggleTitanic[2-FE]--[01].py", line 332, in <lambda>
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 651, in wrapper
    return left._constructor(wrap_results(na_op(lvalues, rvalues)),
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 592, in na_op
    result[mask] = op(x[mask], y)
TypeError: ("unsupported operand type(s) for /: 'str' and 'str'", u'occurred at index Ticket')

What's the problem here? All of the previous 49 sections ran without problems, so if something were wrong it would have shown up by now, right?

cchamberlain
Rich

1 Answer


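The error occurs because the Ticket column still holds strings (dtype object), so x/x.max() tries to divide one string by another. You can ensure that the math transformation only happens on numeric columns with the following: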

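# Select every non-string column, leaving PassengerId unscaled for the submission
numeric_cols = combined.columns[combined.dtypes != 'object'].drop('PassengerId')
# Divide each selected column by its own maximum
combined.loc[:, numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()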
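To see the failure and the fix side by side, here is a minimal reproduction on a toy frame. The three rows are illustrative; the only assumption carried over from the question is that Ticket is a string (object) column. The tutorial's apply raises a TypeError once it reaches Ticket, while restricting the division to non-object columns (and dropping PassengerId) succeeds.

```python
import pandas as pd

# Toy frame: "Ticket" holds strings (object dtype), as in the question's data.
combined = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Fare": [7.25, 71.28, 8.05],
    "Ticket": ["A/5 21171", "PC 17599", "373450"],
})

features = [c for c in combined.columns if c != "PassengerId"]

# The tutorial's version: x.max() on Ticket is a string, and str / str fails.
apply_failed = False
try:
    combined[features].apply(lambda x: x / x.max(), axis=0)
except TypeError as exc:
    apply_failed = True
    print("apply failed:", exc)

# Restrict the division to non-object columns, still skipping PassengerId.
numeric_cols = combined.columns[combined.dtypes != "object"].drop("PassengerId")
combined.loc[:, numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()
print(combined)
```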

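There is no need for apply here: combined[numeric_cols].max() is a Series of column maxima, and dividing a DataFrame by it already scales each column by its own maximum.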
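A quick check, on an illustrative all-numeric frame, that plain division by the column maxima gives the same result as the apply-based version. Dividing a DataFrame by a Series aligns the Series index with the DataFrame's columns, so each column is divided by its own maximum.

```python
import pandas as pd

# Illustrative numeric data only.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 8.05]})

# df.max() is a Series indexed by column name; division aligns on columns.
via_broadcast = df / df.max()

# The apply-based version from the tutorial, for comparison.
via_apply = df.apply(lambda x: x / x.max(), axis=0)

# Raises AssertionError if the two frames differ in values, dtypes, or labels.
pd.testing.assert_frame_equal(via_broadcast, via_apply)
print(via_broadcast)
```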

Ted Petrou