Questions tagged [label-encoding]

Label encoding converts the categorical labels in a machine learning data set into numeric form, so that algorithms that require numeric input can operate on them. It is an important preprocessing step for structured data sets in supervised learning.
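As a rough illustration of the idea, here is a minimal pure-Python sketch. The helper names (`fit_label_encoder`, `transform`, `inverse_transform`) are hypothetical and chosen for this example only; assigning codes in sorted label order is an assumption of the sketch, and library encoders may order or handle unseen labels differently.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer code, assigned in sorted order."""
    classes = sorted(set(labels))
    return {label: code for code, label in enumerate(classes)}


def transform(labels, mapping):
    """Encode labels; fail loudly on labels not seen while fitting."""
    unseen = sorted(set(labels) - set(mapping))
    if unseen:
        raise ValueError(f"previously unseen labels: {unseen}")
    return [mapping[label] for label in labels]


def inverse_transform(codes, mapping):
    """Turn integer codes back into the original labels."""
    reverse = {code: label for label, code in mapping.items()}
    return [reverse[code] for code in codes]


# Hypothetical sample data for the sketch.
cities = ["Oslo", "Berlin", "Napoli", "Oslo"]
mapping = fit_label_encoder(cities)
encoded = transform(cities, mapping)
decoded = inverse_transform(encoded, mapping)
```

Encoding new data with the same `mapping` keeps the codes consistent between training and prediction; a label that was absent at fit time raises an error rather than silently receiving a new code.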

119 questions
2 votes, 0 answers

le.transform() ValueError: y contains previously unseen labels: [1, 2, 3, 4]

I'm running some very basic code to create encoder classes and then use the same classes to encode a new dataframe. In this code I don't need np.save and np.load; however, in my actual implementation I will need to reload the encoder to…
Jo Bennet
2 votes, 0 answers

Spark: What is the best way to do label encoding on a feature of variable length?

For Spark, there is a StringIndexer in Spark ML that can do label encoding for a given column. However, it cannot directly handle the situation where the column is a variable-length (or multi-value) feature. For example,…
CyberPlayerOne
1 vote, 1 answer

I can't apply LabelEncoder to an array of bool

I am working on a machine learning project. I imported all the libraries. I took one column of the data (this column is an array of bool) and I want to apply LabelEncoder to it. Here is my whole code. data = pd.read_csv('odev_tenis.csv') le =…
1 vote, 2 answers

How do I filter columns with data_type=object?

    encoder = LabelEncoder()
    categorical_features = df.columns.tolist()
    for col in categorical_features:
        df[col] = encoder.fit_transform(df[col])
    df.head(20)

I want categorical_features to contain only the columns with datatype=object.
1 vote, 1 answer

Why is the index of Label Encoding not sequential?

These are my label values from df['Label'].value_counts():

    Benign                    4401366
    DDoS attacks-LOIC-HTTP     576191
    FTP-BruteForce             193360
    SSH-Bruteforce             187589
    DoS attacks-GoldenEye …
Dead
1 vote, 0 answers

How to encode the new df values with existing LabelEncoder

I am quite new to ML. Can anyone please help me? I am facing an issue while encoding and decoding the DF below using preprocessing.LabelEncoder() df.head() Col1 | Col2 | Col3 | Col4 | Col5 | Col6 0 | Minor | Yes | …
1 vote, 1 answer

LabelEncoding in Pandas on a column with list of strings across rows

I would like to LabelEncode a column in pandas where each row contains a list of strings. Since the same string/text carries the same meaning across rows, the encoding should respect that and ideally map it to a unique number. Imagine: import…
TwinPenguins
1 vote, 1 answer

Alternatives to LabelEncoder() for target variable while implementing in a pipeline

I am developing a classification base model. I have used ColumnTransformer and Pipeline for feature engineering and selection, model selection, and everything else. I wanted to encode my categorical target (dependent) variable to…
1 vote, 1 answer

Label encoding by value counts

I am trying to label encode my cities. However, I want the labels assigned according to how often each city occurs. Let's say Oslo has 500 rows, Berlin has 400 rows, and Napoli has 300 rows in the dataset. So label encoding will label those cities…
efc07
1 vote, 1 answer

Label Encoder and Inverse_Transform on SOME Columns

Suppose I have a dataframe like the following df = pd.DataFrame({'animal': ['Dog', 'Bird', 'Dog', 'Cat'], 'color': ['Black', 'Blue', 'Brown', 'Black'], 'age': [1, 10, 3, 6], …
1 vote, 1 answer

Feature selection and categorical variables

I work on a dataset which contains mainly binary variables. However, two of them are categorical with multiple values (strings). I want to apply feature selection using lasso, but I get an error: KeyError: could not convert string to float: Should I use…
1 vote, 0 answers

Dask-ml LabelEncoder.fit_transform() threw AttributeError: 'bool' object has no attribute 'astype'

So I tried to apply the LabelEncoder() function to the columns of my Dask dataframe that have object dtype: le = dm.LabelEncoder() #dm is dask-ml module for column in df.columns: if df[column].dtype == type(object): df[column]…
1 vote, 2 answers

Mapping categorical data from user input to its actual encoded value for prediction

A portion of my dataset looks like this (there are many other processor types in my actual data):

    df.head(4)
    Processor  Task  Difficulty  Time
    i3         34    3           6
    i7         34    3           4
    i3         50    1           6
    i5         25    2…
1 vote, 2 answers

Label Encoding using weights for string nominal variables for random forest classification

I have the NYC 311 complaint dataset. I want to build a random forest classifier which will take categorical input features about a complaint and determine the complaint type. The following are the input features of a given complaint record X =…
1 vote, 0 answers

return array(a, dtype, copy=False, order=order) ValueError: could not convert string to float: 'STRING' when building machine learning model

I'm getting the following error: return array(a, dtype, copy=False, order=order) ValueError: could not convert string to float: 'BOX72' (BOX72 is a value under column5). The error seems to come from the line with the code impute_knn.fit_transform(X). Here…