Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
1 vote, 1 answer

Development of a feature per row or from today's date

I have a problem. When an order comes in, I want to predict in how many days the customer will place their next order. I have already created my target variable next_purchase_in_days, which specifies in how many days the customer will place an order…
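Below is a minimal pandas sketch of one common way to build such a target alongside per-row features. It assumes a hypothetical order log with columns `customer_id` and `order_date`. The rows are sorted by customer and date, and the gap to the same customer's next order becomes the target. The latest order of each customer has no known next order yet, so its target is NaN. A fixed reference date stands in for "today" so the features do not change between runs.

```python
import pandas as pd

# Hypothetical order log; the column names are assumptions, not taken from the question.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2021-01-01", "2021-01-11", "2021-02-01", "2021-01-05", "2021-01-08"]
    ),
})

# Sort so that "next order" means the same customer's chronologically next order.
orders = orders.sort_values(["customer_id", "order_date"]).reset_index(drop=True)
by_customer = orders.groupby("customer_id")["order_date"]

# Target: days until the customer's next order (NaN for their latest order).
orders["next_purchase_in_days"] = (by_customer.shift(-1) - orders["order_date"]).dt.days

# Per-row feature: days since the customer's previous order (NaN for their first order).
orders["days_since_last_purchase"] = (orders["order_date"] - by_customer.shift(1)).dt.days

# Feature relative to a fixed reference date instead of "today", so results are reproducible.
reference_date = pd.Timestamp("2021-03-01")
orders["days_before_reference"] = (reference_date - orders["order_date"]).dt.days
print(orders)
```

Only rows with a non-NaN target can be used for training. The latest order per customer is exactly the row you would predict for.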
1 vote, 0 answers

Dataset with similar data points, but some of the targets are different

I have a df:

gender  category   subcategory  item_brand  item_NWT  item_price
Women   Outerwear  Jacket       J. Crew     NWT       22.0
Women   Outerwear  Jacket       Talbots     NWT       50.0
Women   Outerwear  …
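A sketch for finding rows that share identical feature values but carry different targets. It assumes `item_price` is the target, which the excerpt does not state. The first two rows mirror the excerpt; the third row is hypothetical and added only to create a conflict. Collapsing each duplicate group to its median is just one possible treatment.

```python
import pandas as pd

# First two rows follow the question's excerpt; the third is hypothetical.
df = pd.DataFrame({
    "gender": ["Women", "Women", "Women"],
    "category": ["Outerwear", "Outerwear", "Outerwear"],
    "subcategory": ["Jacket", "Jacket", "Jacket"],
    "item_brand": ["J. Crew", "Talbots", "J. Crew"],
    "item_NWT": ["NWT", "NWT", "NWT"],
    "item_price": [22.0, 50.0, 35.0],
})

target = "item_price"  # assumption: the price is the value being predicted
features = [c for c in df.columns if c != target]

# Number of distinct targets among rows with the same feature combination.
n_targets = df.groupby(features)[target].transform("nunique")
conflicts = df[n_targets > 1]
print(conflicts)

# One possible treatment: one row per feature combination, target = median.
deduped = df.groupby(features, as_index=False)[target].median()
print(deduped)
```

Whether to aggregate, keep or drop such rows depends on whether the disagreement is noise or points to a missing feature.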
1 vote, 1 answer

Use primitive_options on Featuretools to calc feature_matrix

I have a dataset with more than 30,000 rows like the picture below and need to generate some features with the featuretools library:
import pandas as pd
import featuretools as ft
# Read in the full dataset
df_data =…
Peter29
1 vote, 1 answer

Create binary indicator dependent on previous row using Python and Pandas

I am starting from the following Excel table. I want to create a binary indicator flagging cases where the departure airport is not equal to the previous arrival airport, basically reconstructing what I did in Excel (Note: "WENN" is equal to "IF" in…
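A minimal pandas sketch of the Excel logic. It compares each row's departure airport with the previous row's arrival airport using `shift(1)`. The column names `departure` and `arrival` and the airport codes are hypothetical. The first row has no previous arrival and is set to 0 here, which is a choice rather than a rule.

```python
import pandas as pd

# Hypothetical flight legs in chronological order; column names are assumptions.
flights = pd.DataFrame({
    "departure": ["FRA", "JFK", "LAX", "SFO"],
    "arrival":   ["JFK", "LAX", "ORD", "FRA"],
})

# Arrival airport of the previous row (NaN for the first row).
prev_arrival = flights["arrival"].shift(1)

# 1 where departure differs from the previous arrival, like
# IF(departure <> previous arrival, 1, 0) in Excel; 0 for the first row.
flights["gap"] = (flights["departure"].ne(prev_arrival) & prev_arrival.notna()).astype(int)
print(flights)
```

If the table mixes several aircraft or itineraries, `flights.groupby(<id column>)["arrival"].shift(1)` would keep the comparison within each group. The id column is whatever identifies a sequence in your data.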
1 vote, 0 answers

Feature importance without label for time-series data with a large number of columns/features

I have a sample time-series dataset of shape (23, 14291), which is a pivot table of 24-hour counts for some users. I'm trying to filter out the columns/features that don't have a time-series nature and filter columns to reach meaningful…
Mario
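One label-free heuristic is sketched below. It is a filtering rule, not a feature-importance method: drop constant columns, then keep columns whose lag-1 autocorrelation shows that each hour resembles the previous one. The sketch assumes hours as rows and users as columns. The column names, toy series and the 0.5 threshold are all hypothetical. With only about two dozen rows per column, these autocorrelation estimates are noisy.

```python
import numpy as np
import pandas as pd

hours = np.arange(24)

# Hypothetical wide table: rows are hours, columns are users.
wide = pd.DataFrame({
    "user_a": 10 + 5 * np.sin(2 * np.pi * hours / 24),  # smooth daily pattern
    "user_b": np.tile([1.0, -1.0], 12),                 # flips every hour, no smooth trend
    "user_c": np.zeros(24),                             # constant, carries no information
})

# Step 1: drop constant (zero-variance) columns.
wide = wide.loc[:, wide.std() > 0]

# Step 2: keep columns with high lag-1 autocorrelation.
# The 0.5 threshold is an arbitrary assumption; tune it on real data.
autocorr = wide.apply(lambda s: s.autocorr(lag=1))
selected = autocorr[autocorr > 0.5].index.tolist()
print(autocorr.round(2))
print(selected)
```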
1 vote, 0 answers

Feature engineering based on previous data for tennis ATP dataset

I have a dataset based on results of tennis matches. I would like to add a new feature that indicates the performance of a player based on their last 10 matches (percentage of matches won). Here is an extract of my dataset: Date …
Gouzylla
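A sketch of a rolling win rate, assuming the usual one-row-per-match layout with hypothetical `Winner` and `Loser` columns. The data is first reshaped to one row per player per match. The rolling mean is then computed on the shifted series, so each value uses only the player's previous matches and never the match being predicted. The player names and dates are illustrative.

```python
import pandas as pd

# Hypothetical match results, one row per match (column names assumed).
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
    "Winner": ["Nadal", "Federer", "Nadal", "Nadal"],
    "Loser": ["Federer", "Nadal", "Federer", "Djokovic"],
})
matches["match_id"] = matches.index

# Long format: one row per (match, player), won = 1 for a win and 0 for a loss.
wins = matches[["match_id", "Date", "Winner"]].rename(columns={"Winner": "player"}).assign(won=1)
losses = matches[["match_id", "Date", "Loser"]].rename(columns={"Loser": "player"}).assign(won=0)
long = pd.concat([wins, losses]).sort_values(["player", "Date", "match_id"]).reset_index(drop=True)

# Share of wins in the previous (up to) 10 matches; shift(1) excludes the current match.
long["win_pct_last10"] = (
    long.groupby("player")["won"]
    .transform(lambda s: s.shift(1).rolling(10, min_periods=1).mean())
)
print(long)
```

To get one feature for the winner and one for the loser, the result can be merged back onto `matches` on `match_id` and `player`.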
1 vote, 1 answer

Machine Learning feature-engine MeanEncoder gives error with Cancer dataset

I'm working with the Wisconsin breast cancer dataset found here. Feature engineering is important in machine learning, so a teacher of mine recommended the MeanEncoder from a library found here. The dataframe looks like the following: I did…
1 vote, 1 answer

dplyr dynamically create lag and ma features

I am trying to create a process that takes in a dataframe and creates additional lagged and rolling window features (e.g. moving average). This is what I have so far:
# dummy dataframe
n <- 20
set.seed(123)
foo <- data.frame(
  date =…
takmers
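The same dynamic pattern is sketched below in pandas rather than dplyr. A hypothetical helper, `add_lag_and_ma_features`, loops over column names, lag orders and window sizes, and names each new column after its source. The sketch assumes the frame is already sorted by date. The moving average here is trailing and excludes the current row, which is a design choice. Drop the `shift(1)` to include the current value. In dplyr, the lag part corresponds to `dplyr::lag()`.

```python
import pandas as pd


def add_lag_and_ma_features(df, cols, lags=(1, 2), windows=(3,)):
    """Return a copy of df with lag and trailing moving-average columns.

    Hypothetical helper: names like 'value_lag1' and 'value_ma3' are illustrative.
    Assumes df is already sorted by date.
    """
    out = df.copy()
    for col in cols:
        for k in lags:
            out[f"{col}_lag{k}"] = out[col].shift(k)
        for w in windows:
            # Mean of the previous w values; shift(1) keeps the current row out.
            out[f"{col}_ma{w}"] = out[col].shift(1).rolling(w).mean()
    return out


foo = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=6, freq="D"),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
features = add_lag_and_ma_features(foo, ["value"], lags=(1, 2), windows=(3,))
print(features)
```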
1 vote, 0 answers

pandas.Series.unique() of a feature has 0 while pandas.Series of the same feature has no 0

I have a feature LotFrontage in my dataset.
print(0 in dataset['LotFrontage'])           # Prints True
print(0 in dataset['LotFrontage'].unique())  # Prints False
I think 0 should be present in the unique values of LotFrontage only if it's present in LotFrontage.…
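The two checks test different things. For a pandas Series, `in` tests the index labels, not the values, much like a dict. `unique()`, by contrast, returns a NumPy array, and for an array `in` tests the values. That would explain the output above if `dataset` has the default integer index, since label 0 exists even though no value is 0. Here is a small sketch with a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column: no value is 0, but the default RangeIndex contains the label 0.
lot_frontage = pd.Series([65.0, 80.0, np.nan, 60.0], name="LotFrontage")

print(0 in lot_frontage)           # True: checks the index labels
print(0 in lot_frontage.values)    # False: checks the values
print(0 in lot_frontage.unique())  # False: unique() returns an array of values
print(lot_frontage.eq(0).any())    # False: explicit value check
```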
1 vote, 1 answer

TensorFlow: how to keep column label names when one-hot encoding?

When trying to get the label of a column in order to one-hot encode it using TensorFlow:
import tensorflow as tf
import pandas as pd
import numpy as np
# some data
d={'column1':['a', 'b', 'c', 'd'], 'column2':['e', 'f', 'g', 'h'], 'column3':[1,…
AlSub
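An alternative to encoding inside TensorFlow is to one-hot encode in pandas before handing the data over. `pd.get_dummies` names every new column `<original column>_<category>`, so the label survives in `encoded.columns`. This swaps in a different tool from the one in the question. The `column3` values are assumed, because the question's data is truncated after `[1,`.

```python
import pandas as pd

# Same shape as the question's data; column3's values are assumed (truncated there).
d = {
    "column1": ["a", "b", "c", "d"],
    "column2": ["e", "f", "g", "h"],
    "column3": [1, 2, 3, 4],
}
df = pd.DataFrame(d)

# One-hot encode the string columns; new columns are named "<column>_<category>".
encoded = pd.get_dummies(df, columns=["column1", "column2"], prefix_sep="_", dtype=int)
print(encoded.columns.tolist())
```

If `encoded.to_numpy()` is then passed to a model, the list in `encoded.columns` maps each matrix column back to its original column and category.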
1 vote, 1 answer

Features Created by FeatureTools Build Inconsistent Models

I have an imbalanced dataset with 200 million rows from class 0 and 8,000 rows from class 1. I followed two different approaches to build a model. First, randomly sample a new dataset with a ratio of 1:4, meaning 32,000 from class 0 and 8,000 from…
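A sketch of the first approach, random undersampling: keep every minority row and draw four majority rows per minority row. The helper name `undersample`, the column names and the toy data are hypothetical. The sketch assumes class 1 is the minority, as in the question.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col, majority_per_minority=4, seed=42):
    """Keep all class-1 rows and a random sample of class-0 rows.

    Hypothetical helper; assumes a binary label where class 1 is the minority.
    """
    minority = df[df[label_col] == 1]
    majority = df[df[label_col] == 0]
    n_majority = min(len(majority), majority_per_minority * len(minority))
    sampled = majority.sample(n=n_majority, random_state=seed)
    # Shuffle the combined rows so the classes are not stored in blocks.
    return pd.concat([minority, sampled]).sample(frac=1, random_state=seed)


# Toy stand-in for the real data: 1,000 negatives and 20 positives.
toy = pd.DataFrame({"x": np.arange(1020), "label": [0] * 1000 + [1] * 20})
balanced = undersample(toy, "label")
print(balanced["label"].value_counts())
```

A model trained at 1:4 but applied to data at roughly 1:25,000 will overestimate class-1 probabilities. Evaluating both approaches on the same untouched, original-distribution test set keeps their comparison fair.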
1 vote, 0 answers

How to encode a high-cardinality feature?

I have a dataset with 1500+ categories in a single feature. How should I encode this feature? I tried target encoding, but there is a category mismatch between the training and test datasets. For example, in the training dataset feature X has categories A, B, C…
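One common way to handle categories that appear only in the test set is a smoothed target encoding that falls back to the global mean. Each category's encoding is (n · mean_c + m · global_mean) / (n + m), where n is the category's row count and m is a smoothing weight. The sketch below is a plain-pandas version under that assumption. The helper names and toy data are hypothetical.

```python
import pandas as pd


def fit_target_encoding(categories, target, smoothing=10.0):
    """Return (mapping, global_mean) for a smoothed mean encoding (hypothetical helper)."""
    global_mean = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    # Rare categories are pulled toward the global mean; frequent ones keep their own mean.
    mapping = (stats["count"] * stats["mean"] + smoothing * global_mean) / (stats["count"] + smoothing)
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    # Categories never seen in training (e.g. D in the test set) get the global mean.
    return categories.map(mapping).fillna(global_mean)


train = pd.DataFrame({"X": ["A", "A", "B", "B", "C"], "y": [1, 0, 1, 1, 0]})
test = pd.DataFrame({"X": ["A", "D"]})

mapping, prior = fit_target_encoding(train["X"], train["y"], smoothing=2.0)
test["X_encoded"] = apply_target_encoding(test["X"], mapping, prior)
print(test)
```

For the training rows themselves, computing the encoding out-of-fold reduces target leakage.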
1 vote, 0 answers

Is there a way to use Android's Speech Recognition API to output true/false, even when the phone is off?

I'm trying to detect speech for a feature in a machine learning algorithm, but I'm having a hard time running the Speech Recognition API in the background and while the phone is off. I've tried to put the SpeechRecognizer in a service; however, the…
1 vote, 1 answer

Strange output from category_encoders.TargetEncoder when there is only one row in the category (python)

I am trying to use category_encoders.TargetEncoder to encode a categorical feature. My target variable is a continuous number. However, the output from the target encoder is very strange and I could not interpret it. Could someone give me a hint on…
Yue Y
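The category_encoders `TargetEncoder` documentation describes blending each category's mean with the overall mean (the prior). The blend weight follows an S-shaped curve in the category's row count. The curve reaches 0.5 at `min_samples_leaf` and is flattened by `smoothing`. The plain-Python sketch below assumes the usual sigmoid form of that weight. It is not the library's code, and defaults and edge cases differ between versions. It shows why a one-row category ends up far from its own target value.

```python
import math


def smoothed_target_mean(category_mean, n, prior, min_samples_leaf=1, smoothing=1.0):
    """Blend a category mean with the prior using a sigmoid weight on the row count n.

    Assumed form of the smoothing; check the docs of your category_encoders version.
    """
    weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
    return prior * (1.0 - weight) + category_mean * weight


prior = 100.0  # hypothetical overall mean of the continuous target
# One row with target 10.0: n == min_samples_leaf gives weight 0.5, halfway to the prior.
print(smoothed_target_mean(10.0, n=1, prior=prior))  # 55.0
# A well-populated category keeps (almost exactly) its own mean.
print(smoothed_target_mean(10.0, n=50, prior=prior))
```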
1 vote, 1 answer

Aggregate features row-wise in dataframe

I am trying to create features from a sample that looks like…
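The sample in the question is truncated. The sketch below therefore assumes a hypothetical layout of one row per entity with several numeric measurement columns. It computes row-wise aggregates with `axis=1`, which reduces across columns within each row and skips missing values. The column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical sample: one row per entity, several numeric measurement columns.
df = pd.DataFrame({
    "id": ["r1", "r2"],
    "m1": [1.0, 4.0],
    "m2": [3.0, None],
    "m3": [5.0, 6.0],
})
value_cols = ["m1", "m2", "m3"]  # assumption: the columns to aggregate per row
values = df[value_cols]

# axis=1 aggregates across the columns of each row; NaN values are skipped.
row_stats = pd.DataFrame({
    "row_mean": values.mean(axis=1),
    "row_min": values.min(axis=1),
    "row_max": values.max(axis=1),
    "row_std": values.std(axis=1),
    "row_count": values.notna().sum(axis=1),
})
df = pd.concat([df, row_stats], axis=1)
print(df)
```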