Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work
Questions tagged [feature-engineering]
481 questions
1
vote
1 answer
Development of a feature per row or from today's date
I have a problem. I want to predict when the customer will place another order in how many days if an order comes in.
I have already created my target variable next_purchase_in_days. This specifies in how many days the customer will place an order…

Test
- 571
- 13
- 32
1
vote
0 answers
Dataset with similar data points, but some of the targets are different
I have a df:
gender category subcategory item_brand item_NWT item_price
Women Outerwear Jacket J. Crew NWT 22.0
Women Outerwear Jacket Talbots NWT 50.0
Women Outerwear …

Alejandro
- 119
- 7
1
vote
1 answer
Use primitive_options on Featuretools to calc feature_matrix
I have a dataset with more than 30.000 rows like the picture below and need to generate some features with the featuretools library.
import pandas as pd
import featuretools as ft
# Read in the full dataset
df_data =…

Peter29
- 21
- 4
1
vote
1 answer
Create binary indicator dependent on previous row using Python and Pandas
I am coming from the following Excel table:
I want to create a binary indicator indicating cases where the departure airport is not equal the previous arrival airport - basically reconstructing what I did in Excel (Note: "WENN" is equal to "IF" in…

Paul1911
- 163
- 10
1
vote
0 answers
Feature importance without label for time-series data with large number of columns/features
I have a sample time-series dataset (23, 14291), which is a pivot table count for 24hrs count for some users; I'm trying to filter some of the columns/features which they don't have a time-series based nature and filter columns to reach meaningful…

Mario
- 1,631
- 2
- 21
- 51
1
vote
0 answers
Feature enginnering base on previous data for tennis ATP dataset
I have a dataset base on results of tennis matches.
I would like add a new feature which indicated the performance of a player based on its last 10 matches ( percentage of winning matches)
Here is an exctrat of my dataset
Date …

Gouzylla
- 21
- 2
1
vote
1 answer
Machine Learning feature-engine MeanEncoder gives error with Cancer dataset
I'm working with the wisconsin breast cancer dataset found here. Feature engineering is important in machine learning so a teacher of mine recommended the MeanEncoder part of a library found here. The dataframe looks like the following:
I did…

Omar Moodie
- 263
- 3
- 13
1
vote
1 answer
dplyr dynamically create lag and ma features
I am trying to create a process that takes in a dataframe and creates additional lagged and rolling window features (e.g. moving average). This is what I have so far.
# dummy dataframe
n <- 20
set.seed(123)
foo <- data.frame(
date =…

takmers
- 71
- 1
- 5
1
vote
0 answers
pandas.Series.unique() of a feature has 0 while pandas.Series of the same feature has no 0
I have a feature LotFrontage in dataset.
print(0 in dataset['LotFrontage']) #Prints True
print(0 in dataset['LotFrontage'].unique()) #Prints False
I think 0 should be present in the unique values of LotFrontage only if its present in LotFrontage.…

Ganesh Dagadi
- 74
- 9
1
vote
1 answer
tensorflow- how to keep column label name when one-hot-encoding?
When trying to get the label of a column in order to one hot encode it by using tensorflow:
import tensorflow as tf
import pandas as pd
import numpy as np
# some data
d={'column1':['a', 'b', 'c', 'd'], 'column2':['e', 'f', 'g', 'h'], 'column3':[1,…

AlSub
- 1,384
- 1
- 14
- 33
1
vote
1 answer
Features Created by FeatureTools Build Inconsistent Models
I have an imbalanced dataset which has 200 million data from class 0 and 8000 data from class 1. I followed two different approaches to build a model.
Randomly sample a new dataset which has a ratio of 1:4. Meaning 32000 from class 0 and 8000 from…

mcsahin
- 63
- 1
- 7
1
vote
0 answers
How to encode high cardinality feature?
I have a dataset with more 1500+ category in single feature. how to encode these feature ? I tried with target encoding but there is category mismatch in training and test dataset. For example in training dataset there is A,B,C category of feature X…

Tejas
- 33
- 5
1
vote
0 answers
Is there a way to use Android's Speech Recognition API to output true/false, even when the phone is off?
I'm trying to detect speech for a feature in a machine learning algorithm, but I'm having a hard time running the Speech Recognition API in the background and while the phone is off. I've tried to put the SpeechRecognizer in a service, however, the…

Aaron Walker
- 36
- 3
1
vote
1 answer
Strange output from category_encoders.TargetEncoder when there is only one row in the category (python)
I am trying to use category_encoders.TargetEncoder to encode a categorical feature. My target variable is a continuous number. However, the output from the target encoder is very strange and I could not interpret it. Could someone give me a hint on…

Yue Y
- 583
- 1
- 6
- 24
1
vote
1 answer
Aggregate features row-wise in dataframe
i am trying to create features from sample that looks like…

spynal
- 33
- 6