Questions tagged [one-hot-encoding]

One-Hot Encoding is a method to encode categorical variables as numerical data that Machine Learning algorithms can handle. It is most often used during feature engineering for an ML model. It converts each categorical value into a new column and assigns a binary value of 1 or 0 to those columns.

Also known as Dummy Encoding, One-Hot Encoding is a method to encode categorical variables that have no ordinal relationship as numerical data that Machine Learning algorithms can handle. One-hot encoding is the most widespread approach, and it works very well unless your categorical variable takes on a large number of unique values. It creates one new binary column for each possible value in the original data. These columns store ones and zeros for each row, indicating the categorical value of that row.
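As a minimal sketch of the idea described above, the following plain-Python function builds one binary column per distinct value. The function name `one_hot` and the sample list `colors` are hypothetical and chosen only for illustration; libraries such as pandas and scikit-learn provide their own implementations.

```python
def one_hot(values):
    """Return (categories, rows); each row holds a single 1 marking its category."""
    # One new column per distinct value, in sorted order.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this row's value
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical sample data
categories, encoded = one_hot(colors)
print(categories)
for row in encoded:
    print(row)
```

Every row contains exactly one 1, which is where the name "one-hot" comes from. The number of columns grows with the number of distinct values, which is why the approach becomes costly for high-cardinality variables.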

1224 questions
4
votes
2 answers

One Hot Encoding Multiple Categorical Data in a Column

Beginner here. I want to use one hot encoding on my data frame that has multiple categorical data in one column. My data frame looks something like this, although with more things in the column such that I can't just do it manually: Title …
razortight
  • 43
  • 4
4
votes
1 answer

Tensorflow One Hot Encoding - Could not find valid device for node

During my feature engineering the following error occurred. My featurelist has 21 sublists, each with 8537 values being either 0 or 1. When trying to run the One Hot Encoding via tensorflow it shows the error Could not find valid device for…
hux0
  • 207
  • 1
  • 4
  • 17
4
votes
2 answers

How to leave numerical columns out when using sklearn OneHotEncoder?

Environment:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
Sample data:
X_train = pd.DataFrame({'A': ['a1', 'a3', 'a2'],…
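One common way to encode only the categorical columns is scikit-learn's `ColumnTransformer`. The sketch below is an illustration under assumptions, not the asker's code: the frame `X` reuses the `'A'` values from the excerpt, while the numeric column `num` and its values are hypothetical, since the original sample data is truncated.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# 'A' matches the excerpt; 'num' is a hypothetical numeric column.
X = pd.DataFrame({"A": ["a1", "a3", "a2"], "num": [1.0, 2.0, 3.0]})

ct = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["A"])],
    remainder="passthrough",  # columns not listed above are passed through unchanged
    sparse_threshold=0,       # always return a dense array
)
out = ct.fit_transform(X)
print(out)
```

The encoded columns come first, followed by the passed-through columns. The transformer can also be used as a step inside a `Pipeline`, in front of an estimator.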
4
votes
1 answer

How to keep track of columns after encoding categorical variables?

I am wondering how I can keep track of the original columns of a dataset once I perform data preprocessing on it. In the below code df_columns would tell me that column 0 in df_array is A, column 1 is B and so forth... However, once I encode…
4
votes
2 answers

save and load one hot encoding for ML

I have been searching for two days now and it seems I cannot grasp the solution. For a machine learning regression model, I need a one-hot encoding of some columns. The training data and model fitting are happening on my local PC. After this the model…
DimiDev
  • 73
  • 1
  • 7
4
votes
1 answer

how to resolve memory error caused by Get_dummies

I am using Python and I have a dataset that has around 1 million records and around 50 columns. Some of these columns have many distinct values (for example, the IssueCode column can have 7000 different codes, and another column, SolutionCode, can have 1000 codes). I am…
asmgx
  • 7,328
  • 15
  • 82
  • 143
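For high-cardinality columns like the ones described in this question, one option worth trying is the `sparse=True` argument of `pandas.get_dummies`, which stores only the non-zero entries. The sketch below assumes a tiny hypothetical frame; only the column name `IssueCode` comes from the question, and whether this is enough for a million rows depends on the data and on what the downstream model accepts.

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the column name is from the question.
df = pd.DataFrame({"IssueCode": ["c1", "c2", "c1", "c3"]})

# sparse=True backs each dummy column with a SparseArray instead of a dense array.
dummies = pd.get_dummies(df, columns=["IssueCode"], sparse=True, dtype="uint8")
print(dummies.dtypes)
```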
4
votes
5 answers

How can I align pandas get_dummies across training / validation / testing?

I have 3 sets of data (training, validation and testing) and when I run: training_x = pd.get_dummies(training_x, columns=['a', 'b', 'c']) It gives me a certain number of features. But then when I run it across validation data, it gives me a…
Shamoon
  • 41,293
  • 91
  • 306
  • 570
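A frequently used way to handle the mismatch described in this question is to reindex the encoded validation and test frames to the training columns. The sketch below is one possible approach, not necessarily the accepted answer; the frames `train` and `valid` and their values are hypothetical, and only column `a` is encoded to keep it short.

```python
import pandas as pd

train = pd.DataFrame({"a": ["x", "y", "z"]})
valid = pd.DataFrame({"a": ["x", "w"]})  # 'w' is unseen in training; 'y' and 'z' are absent

train_x = pd.get_dummies(train, columns=["a"], dtype=int)

# Reindex to the training columns: missing categories become all-zero columns,
# and categories never seen in training are dropped.
valid_x = pd.get_dummies(valid, columns=["a"], dtype=int).reindex(
    columns=train_x.columns, fill_value=0
)
print(valid_x)
```

Note that a row whose category was unseen in training ends up as all zeros, which may or may not be acceptable for a given model.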
4
votes
3 answers

One Hot Encoding a single column

I am trying to use one hot encoder on the target column ('Species') in the Iris dataset. But I am getting the following error: ValueError: Expected 2D array, got 1D array instead: Reshape your data either using array.reshape(-1, 1) if your data has…
ARC
  • 87
  • 1
  • 2
  • 9
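The error message quoted in this question points at the usual cause: scikit-learn's `OneHotEncoder` expects a 2D input of shape (n_samples, n_features). A minimal sketch of the reshape it suggests follows; the array `species` is a hypothetical stand-in for the 'Species' column.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the 1D 'Species' column.
species = np.array(["setosa", "versicolor", "virginica", "setosa"])

encoder = OneHotEncoder()
# reshape(-1, 1) turns the 1D array into a single-feature 2D array.
encoded = encoder.fit_transform(species.reshape(-1, 1)).toarray()
print(encoder.categories_[0])
print(encoded)
```

With a pandas DataFrame, selecting the column with double brackets (for example `df[['Species']]`) also yields a 2D input.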
4
votes
2 answers

Transform table to one-hot encoding for many rows

I have a SQL table of the following format:

ID  Cat
1   A
1   B
1   D
1   F
2   B
2   C
2   D
3   A
3   F

Now, I want to create a table with one ID per row, and multiple Cat's in a row. My desired output looks as follows:

ID  A  B  C  D  E  F
1   …
Emil
  • 1,531
  • 3
  • 22
  • 47
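The question asks about SQL; the sketch below swaps in pandas instead, to illustrate the same reshaping once the rows are loaded into a DataFrame. The ID and Cat values are those shown in the excerpt, and the column list A to F follows the header of the desired output. The variable names `df` and `wide` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 1, 1, 1, 2, 2, 2, 3, 3], "Cat": list("ABDFBCDAF")}
)

wide = (
    pd.crosstab(df["ID"], df["Cat"])  # count of each Cat per ID
    .clip(upper=1)                     # cap counts at 1 to get 0/1 indicators
    .reindex(columns=list("ABCDEF"), fill_value=0)  # add categories with no rows, such as E
)
print(wide)
```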
4
votes
1 answer

How to One-Hot encode multiple columns at once in dataFrame using Keras.to_Categorical?

I want to one-hot encode multiple columns in my data frame at once using Keras to_categorical. How to do it? need_to_encode = ['Item_Fat_Content', 'Outlet_Location_Type', 'Outlet_Type', 'Outlet_Size', 'Item_Type_Combined', 'Outlet'] These are the…
4
votes
2 answers

Preserve column order while one-hot encoding using pandas.get_dummies

What is the best/most Pythonic way to one-hot encode categorical features in a Pandas data frame while preserving the original order of the columns from which the categories (new column names) are extracted? For example, if I have three columns in…
strangeloop
  • 751
  • 1
  • 9
  • 15
4
votes
4 answers

One-hot-encoding multiple columns in sklearn and naming columns

I have the following code to one-hot-encode 2 columns I have. # encode city labels using one-hot encoding scheme city_ohe = OneHotEncoder(categories='auto') city_feature_arr = city_ohe.fit_transform(df[['city']]).toarray() city_feature_labels =…
4
votes
1 answer

Selecting columns from 3D tensor according to a 1D tensor of indices (Tensorflow)

I'm looking for a way in tensorflow to, given two inputs: input1, a 3D tensor of shape (batch_size, x, y), and input2, a 1D tensor of shape (batch_size,) whose values are all in the range [0, y - 1] (inclusive), return a 2D tensor of shape…
4
votes
4 answers

Populate values for categorical data in their respective one-hot encoded columns

I have a CSV file which has hundreds of columns and rows. Two of the columns are of interest to me, and based on them I need to create new columns in that CSV file. Example: the columns of interest are as below, input.csv count description 1 Good …
sundarr
  • 385
  • 2
  • 8
4
votes
1 answer

Pytorch LSTM: Target Dimension in Calculating Cross Entropy Loss

I've been trying to get an LSTM (LSTM followed by a linear layer in a custom model) working in Pytorch, but was getting the following error when calculating the loss: Assertion `cur_target >= 0 && cur_target < n_classes' failed. I defined the loss…
LunarLlama
  • 229
  • 2
  • 11