Questions tagged [one-hot-encoding]

One-hot encoding is a method for converting categorical variables into numerical data that machine learning algorithms can work with. It is most commonly used during feature engineering for an ML model. It converts each category into a new column and assigns a binary value of 1 or 0 to those columns.

Also known as dummy encoding, one-hot encoding converts categorical variables that have no natural ordering into numerical data that machine learning algorithms can work with. It is the most widespread approach and works well unless the categorical variable takes on a large number of unique values. One-hot encoding creates new binary columns, one for each possible value in the original data. Each row stores a 1 in the column for its own category and a 0 in all the others.
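To make the description concrete, here is a minimal pure-Python sketch. `one_hot_encode` is a hypothetical helper name, not a library function; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row is a list of 0/1 flags."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one 1 per row
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
for color, row in zip(colors, encoded):
    print(color, row)
```

Each input value becomes one row with a single 1 in the column of its category, so "red" maps to `[0, 0, 1]` here.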

1224 questions
4 votes, 1 answer

one hot encoding of output labels

While I understand the need to one-hot encode features in the input data, how does one-hot encoding of output labels actually help? The TensorFlow MNIST tutorial encourages one-hot encoding of output labels. The first assignment in…
lazy python
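One common reason for one-hot labels, sketched here under the assumption of a softmax output layer with one unit per class: the target becomes a probability distribution of the same shape as the prediction, and cross-entropy against it reduces to the negative log-probability of the true class, with no artificial ordering between class ids. The helpers below are hypothetical, and the probabilities are made up.

```python
import math


def one_hot(label, num_classes):
    # Vector of zeros with a single 1 at the position of the true class.
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def cross_entropy(target, probs):
    # Sum over classes of -y_i * log(p_i); terms where y_i == 0 contribute nothing.
    return -sum(y * math.log(p) for y, p in zip(target, probs) if y > 0)


probs = [0.1, 0.7, 0.2]   # hypothetical softmax output for 3 classes
target = one_hot(1, 3)    # true class is 1 -> [0.0, 1.0, 0.0]
loss = cross_entropy(target, probs)
print(target, round(loss, 4))  # loss equals -log(0.7)
```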
4 votes, 1 answer

How do I use OneHotEncoder on a pandas series of lists?

I have a Pandas data frame which contains a series of lists. I would like to use SciKit-Learn's OneHotEncoder on this series. I keep getting a value error. My problem is reproduced as: import pandas as pd import numpy as np d = {'A': [[5,7], [3, 4,…
Michael
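When each cell holds a list, what is usually wanted is a multi-hot encoding (several 1s per row). `OneHotEncoder` expects one scalar per cell, which is a likely source of the error; scikit-learn's `MultiLabelBinarizer` is designed for list-valued input. The pure-Python sketch below shows the idea; `multi_hot` is a hypothetical helper, and the data is a shortened illustration because the question's own data is truncated.

```python
def multi_hot(list_column):
    # Collect every distinct element across all the lists.
    classes = sorted({item for lst in list_column for item in lst})
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for lst in list_column:
        row = [0] * len(classes)
        for item in lst:
            row[index[item]] = 1  # a row may contain several 1s
        rows.append(row)
    return classes, rows


column_a = [[5, 7], [3, 4], [7]]  # hypothetical data
classes, rows = multi_hot(column_a)
print(classes)  # [3, 4, 5, 7]
print(rows)     # [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
```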
4 votes, 2 answers

Concatenate encoded columns to original data frame using Scikit-learn and Pandas

I am trying to encode all the textual data in a .csv file to numeric using Python's Scikit-learn. I am using LabelEncoder and OneHotEncoder on the columns which are of datatype object. I am wondering how to concatenate the new encoded columns with…
moirK
4 votes, 1 answer

Save OneHotEncoder object in Python

Is there any way of saving a OneHotEncoder object in Python? The reason is that I used that object in preprocessing of training data and test data, and we are building an API containing the same trained model that will be fed real data from…
user3085459
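Fitted scikit-learn estimators, including `OneHotEncoder`, are commonly persisted with `pickle` or `joblib.dump`/`joblib.load`, ideally loading them under the same library version that saved them. The sketch below demonstrates the round trip with a hypothetical stand-in class (not scikit-learn's) so it runs with the standard library alone; the key point is that the fitted category list travels with the object.

```python
import pickle


class SimpleOneHotEncoder:
    """Hypothetical stand-in for a fitted encoder; not scikit-learn's class."""

    def fit(self, values):
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {c: i for i, c in enumerate(self.categories_)}
        out = []
        for v in values:
            row = [0] * len(self.categories_)
            row[index[v]] = 1
            out.append(row)
        return out


# Fit once on training data ...
encoder = SimpleOneHotEncoder().fit(["a", "b", "c"])

# ... serialize it (writing to a file opened in "wb" mode works the same way) ...
blob = pickle.dumps(encoder)

# ... and restore it inside the API process before transforming new data.
restored = pickle.loads(blob)
print(restored.transform(["c", "a"]))  # [[0, 0, 1], [1, 0, 0]]
```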
4 votes, 3 answers

Scikit: Convert one-hot encoding to encoding with integers

I need to convert one-hot encoding to categories represented by unique integers. So one-hot encoding created with the following code: from sklearn.preprocessing import OneHotEncoder enc = OneHotEncoder() labels = [[1],[2],[3]] enc.fit(labels) for…
dokondr
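Going back from one-hot to integers is an argmax over each row, followed by an optional lookup into the category list the encoder used. NumPy's `argmax(axis=1)` and `OneHotEncoder.inverse_transform` cover this in practice; the pure-Python sketch below assumes the columns are ordered like sorted categories `[1, 2, 3]`, and `argmax`/`decode` are hypothetical helpers.

```python
def argmax(row):
    # Index of the largest entry; for a one-hot row this is the position of the 1.
    return max(range(len(row)), key=row.__getitem__)


def decode(rows, categories):
    # Integer codes first, then map back to the original category values.
    codes = [argmax(row) for row in rows]
    return codes, [categories[c] for c in codes]


# Columns assumed ordered like the encoder's sorted categories [1, 2, 3].
encoded = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
codes, labels = decode(encoded, categories=[1, 2, 3])
print(codes)   # [0, 1, 2]
print(labels)  # [1, 2, 3]
```

Using argmax rather than searching for an exact 1 also works on soft predictions such as softmax outputs.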
4 votes, 1 answer

H2O GLM: interact only certain predictors

I'm interested in creating interaction terms in h2o.glm(). But I do not want to generate all pairwise interactions. For example, in the mtcars dataset...I want to interact 'mpg' with all the other factors such as 'cyl','hp', and 'disp' but I don't…
Raag Agrawal
4 votes, 3 answers

Create dummies from a column for a subset of data that doesn't contain all the category values in that column

I am handling a subset of a large data set. There is a column named "type" in the dataframe. The "type" column is expected to have values like [1,2,3,4]. In a certain subset, I find the "type" column only contains certain values like [1,4], like In…
jessie tio
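The usual fix is to fix the full category list up front instead of inferring it from the subset. In pandas this can be done by converting the column with `pd.Categorical(..., categories=[1, 2, 3, 4])` before calling `pd.get_dummies`; scikit-learn's `OneHotEncoder(categories=[[1, 2, 3, 4]])` achieves the same. A pure-Python sketch of the idea, assuming the expected types are known in advance (`dummies` is a hypothetical helper):

```python
EXPECTED_TYPES = [1, 2, 3, 4]  # full category list, assumed known up front


def dummies(values, categories):
    # Always emit one column per expected category, even if a value never occurs.
    return [[1 if v == c else 0 for c in categories] for v in values]


subset = [1, 4, 4, 1]  # hypothetical subset that lacks types 2 and 3
for row in dummies(subset, EXPECTED_TYPES):
    print(row)
# [1, 0, 0, 0]
# [0, 0, 0, 1]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
```

Every subset then yields the same four columns, so frames built from different subsets line up.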
4 votes, 3 answers

How can I one hot encode multiple variables with big data in R?

I currently have a dataframe with 260,000 rows and 50 columns where 3 columns are numeric and the rest are categorical. I wanted to one hot encode the categorical columns in order to perform PCA and use regression to predict the class. How can I go…
Nick
4 votes, 2 answers

One hot encoding categorical features - Sparse form only

I have a dataframe that has int and categorical features. The categorical features are of 2 types: numbers and strings. I was able to one-hot encode columns that were int and categorical columns that were numbers. I get an error when I try to one-hot encode…
Aman
4 votes, 1 answer

ValueError: Can't handle mix of multilabel-indicator and binary

I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyper-parameter optimisation. This is a multi-class problem, i.e. the target variable can have only one label chosen from a set of n classes. For instance,…
4 votes, 1 answer

Mixed one_hot and float input

I am trying to train a model with LSTM layers on time-series data consisting of a categorical (one_hot) action (call/fold/raise) and a time. For example, a time series of 3 rounds where the player called twice and then folded: #Call #0.5s # Call #0.3s #Fold,…
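A common way to mix categorical and continuous inputs is to concatenate them into one feature vector per timestep: the one-hot action followed by the float time. A batch of such sequences then has the `(batch, timesteps, features)` shape that Keras LSTM layers expect. This is a sketch; the action vocabulary comes from the question, while `encode_step` and the fold time of 0.8 s are hypothetical.

```python
ACTIONS = ["call", "fold", "raise"]  # action vocabulary from the question


def encode_step(action, seconds):
    # One-hot part for the categorical action ...
    vec = [1.0 if action == a else 0.0 for a in ACTIONS]
    # ... followed by the continuous time feature, giving 4 features per step.
    vec.append(seconds)
    return vec


# Call after 0.5 s, call after 0.3 s, then fold (fold time is hypothetical).
sequence = [("call", 0.5), ("call", 0.3), ("fold", 0.8)]
features = [encode_step(a, t) for a, t in sequence]
print(features)
# [[1.0, 0.0, 0.0, 0.5], [1.0, 0.0, 0.0, 0.3], [0.0, 1.0, 0.0, 0.8]]
```

If the times span a wide range, scaling them to a range comparable to the 0/1 flags usually helps training.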
4 votes, 2 answers

Python: One-hot encoding for huge data

I keep getting memory issues trying to encode string labels to one-hot encoding. There are around 5 million rows and around 10000 different labels. I have tried the following but keep getting memory errors: from sklearn import preprocessing lb =…
Mpizos Dimitris
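The arithmetic explains the memory errors: 5,000,000 rows × 10,000 labels is 5 × 10^10 cells, about 50 GB even at one byte per cell. Because each row holds a single 1, a sparse representation only needs one column index per row. The sketch below stores exactly that; `sparse_one_hot` and `dense_row` are hypothetical helpers. In the libraries, `scipy.sparse.csr_matrix`, `LabelBinarizer(sparse_output=True)`, and `OneHotEncoder` (sparse output by default) follow the same idea.

```python
def sparse_one_hot(labels):
    """Return (vocabulary, column_indices): row i has a single 1 at column_indices[i]."""
    vocabulary = {}
    column_indices = []
    for label in labels:
        # Assign the next free column the first time a label is seen.
        col = vocabulary.setdefault(label, len(vocabulary))
        column_indices.append(col)
    return vocabulary, column_indices


def dense_row(col, width):
    # Materialize a single row only when it is actually needed.
    row = [0] * width
    row[col] = 1
    return row


labels = ["cat", "dog", "cat", "bird"]
vocab, cols = sparse_one_hot(labels)
print(vocab)                           # {'cat': 0, 'dog': 1, 'bird': 2}
print(cols)                            # [0, 1, 0, 2]
print(dense_row(cols[3], len(vocab)))  # [0, 0, 1]

# Back-of-the-envelope for the question's sizes:
rows, classes = 5_000_000, 10_000
print(rows * classes)                  # 50000000000 cells if stored densely
```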
4 votes, 1 answer

Getting feature names after one-hot encoding

I have a dataset that I've recently transformed through one-hot encoding and used it to train a lasso logistic regression. I'm trying to get a list of the non-zero coefficients. I can get a list of the coefficients through sklearn but I'm not…
yogz123
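The coefficients come out in the same order as the one-hot columns, so pairing them with one name per column and filtering out zeros gives the list. In scikit-learn 1.0 and later, `OneHotEncoder.get_feature_names_out()` returns such names, and a fitted `LogisticRegression` exposes the weights as `coef_`. The sketch below uses a hypothetical encoder layout and made-up coefficients to show the pairing.

```python
def feature_names(columns_and_categories):
    # One name per one-hot column, in the same order the columns were produced.
    return [f"{col}_{cat}" for col, cats in columns_and_categories for cat in cats]


# Hypothetical encoder layout and lasso coefficients, for illustration only.
layout = [("color", ["blue", "green", "red"]), ("size", ["L", "S"])]
coefs = [0.0, 1.25, 0.0, -0.4, 0.0]

names = feature_names(layout)
nonzero = [(n, c) for n, c in zip(names, coefs) if c != 0.0]
print(nonzero)  # [('color_green', 1.25), ('size_L', -0.4)]
```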
4 votes, 3 answers

How to encode categorical features in sklearn?

I have a dataset with 41 features [from 0 to 40 columns], of which 7 are categorical. This categorical set is divided into two subsets: a subset of string type (the column-features 1, 2, 3) and a subset of int type, in binary form 0 or 1 (the…
4 votes, 1 answer

TensorFlow embedding lookup using one-hot encoding

I currently have one-hot encodings that I want to use embeddings for. However, when I call embed=tf.nn.embedding_lookup(embeddings, train_data) print(embed.get_shape()) embed data shape (11, 32, 729, 128) This shape should be (11, 32, 128) but…
Rik
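`tf.nn.embedding_lookup` treats every value in its ids argument as an index and appends the embedding dimension, so passing one-hot tensors of shape (11, 32, 729) plausibly explains the extra 729 axis. Converting to integer ids first (for example with `tf.argmax(train_data, axis=-1)`) should give ids of shape (11, 32) and embeddings of shape (11, 32, 128). The pure-Python sketch below, with a hypothetical 4-word vocabulary, shows why this is equivalent to multiplying the one-hot vector by the embedding matrix.

```python
def matmul_row(one_hot_row, embeddings):
    # one_hot_row @ embeddings: the weighted sum of embedding rows.
    dim = len(embeddings[0])
    return [sum(w * embeddings[i][d] for i, w in enumerate(one_hot_row)) for d in range(dim)]


# Hypothetical 4-word vocabulary with 2-dimensional embeddings.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
one_hot_row = [0, 0, 1, 0]

index = one_hot_row.index(1)   # convert one-hot -> integer id first
looked_up = embeddings[index]  # what an index-based lookup returns
print(looked_up)                                          # [0.5, 0.6]
print(matmul_row(one_hot_row, embeddings) == looked_up)   # True
```

The index-based lookup avoids the full multiplication, which is why embedding layers take integer ids rather than one-hot vectors.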