I'm trying to find the equivalent of sklearn's LabelEncoder or OrdinalEncoder in Azure ML Studio. I understand the Convert to Indicator Values module performs one-hot encoding, but I can't find anything that does label encoding.

What I have is a column with six unique string values, and I need to represent that data as integers from 0 to 5.

Right now I'm using the Execute Python Script module for this, but I was wondering whether there's a built-in module that does it.
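
For reference, the Execute Python Script workaround can be sketched roughly as below. This is a minimal sketch, assuming the module's standard `azureml_main` entry point and a pandas DataFrame input; the column name `category_col` and the `_encoded` suffix are hypothetical placeholders, not names from my actual dataset.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Hypothetical column name; replace with the real string column.
    column = "category_col"

    # Work on a copy so the input frame is left untouched.
    out = dataframe1.copy()

    # Converting to a pandas categorical sorts the unique values, so the
    # integer codes follow sorted order, similar to sklearn's LabelEncoder.
    out[column + "_encoded"] = out[column].astype("category").cat.codes

    # The module expects a sequence of DataFrames as the return value.
    return out,
```

With six unique strings this yields codes 0 through 5. Missing values, if any, are coded as -1 by pandas.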

Judy T Raj
  • Small question - why do you want to represent the data as integers? Is this a hard requirement, or do you simply want to mark the feature as representing a category? – Vlad Iliescu May 12 '19 at 16:16

1 Answer

There is a Feature Hashing module that converts strings to integer-encoded features using the Vowpal Wabbit library. It builds a dictionary from the input text and converts each dictionary item into a hash value. So instead of a single string column, you will have your data in the following format:

Hashing feature 1   Hashing feature 2   Hashing feature 3
1                   0                   0
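
To illustrate why the output has several columns rather than one integer column, here is a minimal sketch of the general hashing idea. It is not the module's actual implementation: MD5 is used only as a stand-in hash, and `hash_features` and `n_features` are hypothetical names chosen for the example.

```python
import hashlib


def hash_features(value, n_features=8):
    """Map a string to a fixed-length indicator vector via hashing."""
    # Stand-in hash (an assumption); the real module relies on Vowpal Wabbit.
    digest = hashlib.md5(value.encode("utf-8")).digest()

    # Reduce the hash to a column index in the range [0, n_features).
    index = int.from_bytes(digest[:4], "little") % n_features

    # Set a 1 in the column the string hashes to, 0 elsewhere.
    vector = [0] * n_features
    vector[index] = 1
    return vector


row = hash_features("some category")
```

Note that two different strings can land in the same column (a hash collision), so this is not a strict one-to-one label encoding.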
Alibek Jakupov