Given a dataframe like this:

'John', 0.25
'Mary', 0.2
'Adam', 0.1
'Andrew', 0.6

I would like to generate a unique integer for every category in a given series. For example, for the data above, the output could look like this:

0, 0.25
1, 0.2
2, 0.1
3, 0.6

Ideally this would use only pandas or the standard library.

Bob

1 Answer

I think you can use pd.factorize, like this:

print(df)
          a     b
0    'John'  0.25
1    'Mary'  0.20
2    'Mary'  0.20
3    'Adam'  0.10
4    'Adam'  0.10
5    'Adam'  0.10
6  'Andrew'  0.60

print(pd.factorize(df.a))
(array([0, 1, 1, 2, 2, 2, 3]), 
 Index([u''John'', u''Mary'', u''Adam'', u''Andrew''], dtype='object'))

# factorize returns (codes, uniques); element 0 holds the integer codes
df['a'] = pd.factorize(df.a)[0]
print(df)

   a     b
0  0  0.25
1  1  0.20
2  1  0.20
3  2  0.10
4  2  0.10
5  2  0.10
6  3  0.60
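
For reference, here is a self-contained sketch of the same approach that can be run as is. The sample frame is rebuilt by hand, and the column name a_id and the dictionary id_to_name are names chosen only for this illustration. It keeps the original column instead of overwriting it, so the mapping back to the labels stays available.

```python
import pandas as pd

# Hypothetical sample data mirroring the frame shown above.
df = pd.DataFrame({
    "a": ["John", "Mary", "Mary", "Adam", "Adam", "Adam", "Andrew"],
    "b": [0.25, 0.20, 0.20, 0.10, 0.10, 0.10, 0.60],
})

# factorize returns (codes, uniques); codes are assigned in order of
# first appearance, so repeated labels share the same integer.
codes, uniques = pd.factorize(df["a"])

# Store the integer codes next to the original labels ("a_id" is an
# illustrative column name, not something pandas requires).
df["a_id"] = codes

# uniques[code] gives back the original label, so the encoding is reversible.
id_to_name = dict(enumerate(uniques))

print(df)
print(id_to_name)
```

Because the codes follow order of first appearance, the same label can receive a different integer if the rows are reordered, so it is safer to keep uniques around when the mapping has to be reused later.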
jezrael