
I have a dataset that is 30k in size. It has a column titled "Native Country", and I want to create a new variable for every unique value in that column (the algorithm I am using can only handle numeric values, so I need to convert the text to binary form).

When I use the following:

Native Country = pd.get_dummies(dataset.Native Country , prefix='Native Country' )
Native Country.head()

I get the following error message:

SyntaxError: invalid syntax

Any suggestions, please?

Ilya V. Schurov
Jim

1 Answer


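Python identifiers cannot contain whitespace, so `Native Country` is not a valid variable name; use an underscore instead (for example, `native_country`). For the same reason, attribute access with `.` does not work for a column whose name contains whitespace. Use bracket notation, `dataset['Native Country']`, instead.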

In [1]: import pandas as pd

In [2]: dataset = pd.DataFrame({'Native Country': ['a', 'b', 'a']})

In [6]: native_country = pd.get_dummies(dataset['Native Country'], prefix='Native Country')

In [7]: native_country.head()
Out[7]:
   Native Country_a  Native Country_b
0                 1                 0
1                 0                 1
2                 1                 0
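
If you then want the indicator columns alongside the rest of your data, something like the following may work. This is only a sketch on toy data: the `age` column and the `encoded` name are made up for illustration, and `dtype=int` is passed because newer pandas versions may return boolean columns by default.

```python
import pandas as pd

# Toy data standing in for the real dataset; 'age' is a made-up column
dataset = pd.DataFrame({
    'Native Country': ['a', 'b', 'a'],
    'age': [39, 50, 38],
})

# Bracket access works for column names that contain whitespace
dummies = pd.get_dummies(dataset['Native Country'],
                         prefix='Native Country', dtype=int)

# Drop the original text column and attach the indicator columns
encoded = pd.concat([dataset.drop(columns='Native Country'), dummies], axis=1)

# Optional: replace spaces so every column name is a valid identifier
encoded.columns = [name.replace(' ', '_') for name in encoded.columns]

print(encoded.head())
```

After the rename, attribute access such as `encoded.Native_Country_a` works, although bracket access is still the safer habit.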
Ilya V. Schurov
  • Thank you. Great answer, and thanks for the explanation. – Jim Mar 18 '17 at 21:52
  • @Jim, you are welcome :) You can 'accept' answer by clicking on checkmark to the left. This gives you and me some reputation points. :) – Ilya V. Schurov Mar 18 '17 at 22:10
  • @Jim, please, also, note that it is better to format your code with four whitespace indent (can be done easily with `Alt+K`). I edited your question in this way and now it looks a bit better. – Ilya V. Schurov Mar 18 '17 at 22:12