I'm looking for a way to replicate the behaviour of Stata's encode command in pandas, i.e. to convert a categorical string column into a numeric column. Take this DataFrame:
import pandas as pd

x = pd.DataFrame({'cat': ['A', 'A', 'B'], 'val': [10, 20, 30]})
x = x.set_index('cat')
This results in:
     val
cat
A     10
A     20
B     30
I'd like to convert the cat column from strings to integers, mapping each unique string to an (arbitrary) integer 1-to-1. It would result in:
     val
cat
1     10
1     20
2     30
Or, just as good:
   cat  val
0    1   10
1    1   20
2    2   30
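A minimal sketch of one way the mapping described above might be done, using pandas' factorize function. It assumes cat is still an ordinary column (not yet the index) and that it is acceptable for the integers to follow order of first appearance. pd.factorize returns zero-based codes, so 1 is added here to match the desired output.

```python
import pandas as pd

x = pd.DataFrame({'cat': ['A', 'A', 'B'], 'val': [10, 20, 30]})

# pd.factorize returns a pair: an integer code per row, and the unique values.
# Codes start at 0 and follow order of first appearance, so add 1 to start at 1.
codes, uniques = pd.factorize(x['cat'])
x['cat'] = codes + 1

print(x)
```

Passing sort=True to pd.factorize would number the strings in sorted order instead, which is probably closer to Stata's encode. Calling x.set_index('cat') afterwards should give the indexed form shown first.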
Any suggestions?
Many thanks as always, Rob