
I have around 80 columns in total, about 20 of which are categorical and need to be label encoded. I checked the solution provided here, which suggested the Feature Hashing technique. However, feature hashing produces output similar to one-hot encoding, not label encoding.

Example:

Column1
RL
RL
RM
RL
RM
RM

After feature hashing, the output looks like one-hot encoding:

Column1-RL     Column1-RM
1              0
1              0
0              1
1              0
0              1
0              1

How can I do label encoding in Azure ML Studio instead, so that the output looks like this:

Column1
1
1
2
1
2
2
Sunag

1 Answer


Azure ML Studio offers three ways to build prediction models. Notebooks work like Jupyter notebooks. Automated ML (AutoML) builds the prediction model automatically according to predefined rules. Finally, the Designer represents each step of the workflow as a node and connects the nodes to one another through their inputs and outputs.

A label encoder is not available as a dedicated option in AutoML or the Designer. In a notebook you implement it yourself in code, as in the steps below. In AutoML, the encoding is performed internally once you upload the dataset and start running the model; the generated labels are then visible in the AutoML dataset output for validation.
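To make the idea concrete: label encoding simply maps each distinct category to an integer. The following is a minimal pure-Python sketch of that mapping, not a description of what AutoML does internally. Like scikit-learn's `LabelEncoder`, it numbers the categories in sorted order; the `label_encode` function and its `start` parameter are hypothetical names chosen here so that the question's 1-based output can be reproduced.

```python
def label_encode(values, start=0):
    """Map each distinct value to an integer code, in sorted order beginning at `start`."""
    # Sort the distinct categories so the mapping does not depend on row order
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories, start=start)}
    return [mapping[v] for v in values], mapping


column1 = ["RL", "RL", "RM", "RL", "RM", "RM"]
codes, mapping = label_encode(column1, start=1)
print(codes)    # [1, 1, 2, 1, 2, 2]
print(mapping)  # {'RL': 1, 'RM': 2}
```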

```python
import pandas as pd

df = pd.read_csv("filename.csv")

df.head()  # show the first 5 rows of the dataset
```


```python
df.dtypes  # data type of every column
```


We need to apply the label encoder to the object (string) column, here `Class`. First, list its distinct values:

```python
df['Class'].unique()  # distinct categories in the column
```


```python
df['Class'].value_counts()  # number of rows per category
```


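```python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()  # create a LabelEncoder instance

# Encode the "Class" column and store the integer codes in a new "target" column of the original dataframe
df['target'] = label_encoder.fit_transform(df['Class'])
```

Note that `LabelEncoder` assigns codes 0, 1, 2 and so on, in sorted order of the categories, so `RL` becomes 0 and `RM` becomes 1. Add 1 to the result if you need codes starting at 1 as in the question.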
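The question involves about 20 categorical columns rather than one. A minimal sketch, assuming you list those columns by name (the column names and data below are hypothetical placeholders), is to fit a separate `LabelEncoder` per column and keep each fitted encoder in a dictionary so its mapping can be reused later:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real ~80-column dataset
df = pd.DataFrame({
    "Column1": ["RL", "RL", "RM", "RL", "RM", "RM"],
    "Column2": ["Pave", "Grvl", "Pave", "Pave", "Grvl", "Pave"],
    "Column3": [8450, 9600, 11250, 9550, 14260, 14115],  # numeric, left untouched
})

# Hypothetical list: replace with the names of your ~20 categorical columns
categorical_cols = ["Column1", "Column2"]

encoders = {}  # one fitted LabelEncoder per column
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    # LabelEncoder yields 0-based codes; adding 1 gives the 1-based codes from the question
    df[col] = encoders[col].fit_transform(df[col]) + 1

print(df)
```

Keeping one encoder per column matters because each column has its own set of categories; reusing a single encoder would overwrite the mapping learned for the previous column. scikit-learn also provides `OrdinalEncoder`, which is intended for feature columns and encodes several columns in one call.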

To check whether the dataframe has been updated, look at the column data types again:

```python
df.dtypes  # data types of each column (feature), now including "target"
```


```python
df['target'].unique()  # each category is now represented by an integer
```


To get the count of each category:

```python
df['target'].value_counts()  # number of rows per encoded category
```
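Since the integer codes alone do not say which category they stand for, it can help to print the mapping the encoder learned. A small self-contained sketch, assuming the same `Class`/`target` column names used in this answer (the sample values are taken from the question), uses the fitted encoder's `classes_` attribute and its `inverse_transform` method:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample values from the question, placed in the answer's "Class" column
df = pd.DataFrame({"Class": ["RL", "RL", "RM", "RL", "RM", "RM"]})

label_encoder = LabelEncoder()
df["target"] = label_encoder.fit_transform(df["Class"])

# classes_ holds the categories in sorted order; each position is that category's code
mapping = {str(cat): code for code, cat in enumerate(label_encoder.classes_)}
print(mapping)  # {'RL': 0, 'RM': 1}

# inverse_transform converts integer codes back to the original labels
restored = label_encoder.inverse_transform(df["target"]).tolist()
print(restored)  # ['RL', 'RL', 'RM', 'RL', 'RM', 'RM']
```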


To run this on the Azure platform, create an Azure Machine Learning resource under your subscription and run the code above in a notebook there.

With AutoML, upload the dataset and run the model; after modelling, the encoded labels can be inspected in the AutoML dataset output.


Sairam Tadepalli