Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
Questions tagged [feature-engineering]
481 questions
2
votes
2 answers
python tsfresh - what is the column_id argument used for?
tsfresh needs input data in a specific column. I initially assumed that column_id is just row_index but I fear it's wrong.
I have sensor data (pressure, temperature, and humidity sensors) captured at 10-second intervals. Thus it's 4…

joel.wilson
- 8,243
- 5
- 28
- 48
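As a rough illustration of the role an id column plays in long-format time-series data: in tsfresh, `column_id` names the column that says which rows belong to the same time series, so features are computed once per id rather than once per row. The sketch below mimics that grouping with plain pandas only. The `device` column, the sensor values, and the chosen aggregates are hypothetical and are not taken from the question.

```python
import pandas as pd

# Hypothetical long-format readings: two devices, three time steps each.
readings = pd.DataFrame({
    "device": ["a", "a", "a", "b", "b", "b"],  # plays the role of column_id
    "time": [0, 10, 20, 0, 10, 20],            # seconds between samples
    "pressure": [1.0, 1.2, 1.1, 2.0, 2.1, 2.3],
})

# One row of features per id (per device), not one per original row.
features = (
    readings.sort_values(["device", "time"])
    .groupby("device")["pressure"]
    .agg(["mean", "min", "max"])
)
print(features)
```

If every row were given its own id, each "series" would contain a single point, which is why a row index is usually not a suitable id.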
2
votes
1 answer
Custom Aggregation Primitives With Additional Arguments?
The transform primitive works fine with additional arguments. Here is an example
def string_count(column, string=None):
    '''
    ..note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count…

Jeff Hernandez
- 2,063
- 16
- 20
2
votes
0 answers
Create a multi-hot SparseTensor from a categorical feature array column in a CSV in TensorFlow
This is a typical way of handling sparse features (such as some ID features) in recommendation systems. I'm looking for a convenient way to prepare the data for a TensorFlow pipeline.
I have searched extensively but have yet to find a good solution.
Below is…

fengda
- 39
- 3
2
votes
2 answers
Python large dataset feature engineering workflow using dask hdf/parquet
There is already a nice question about it on SO, but the best answer is now 5 years old, so I think there should be better option(s) in 2018.
I am currently looking for a feature engineering pipeline for a larger-than-memory dataset (using suitable…

Florian Mutel
- 1,044
- 1
- 6
- 13
2
votes
1 answer
Converting a dictionary to binary in Python
I have a dictionary with customer IDs as keys and movie IDs as values. Even if a customer has watched the same movie many times, I want it counted only once.
Here I need to convert my dictionary to binary data.
In all the rows I need the customers…

pylearner
- 1,358
- 2
- 10
- 26
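One possible reading of the question above is a customer-by-movie 0/1 matrix in which repeat views collapse to a single 1. A minimal sketch under that assumption, in plain Python; the `watched` dictionary and its IDs are hypothetical, since the question's data is not shown:

```python
# Hypothetical input: customer ID -> movie IDs watched (repeats allowed).
watched = {
    "c1": ["m1", "m2", "m1"],
    "c2": ["m3"],
    "c3": ["m2", "m3", "m3"],
}

# Sorted vocabulary of all movie IDs gives a stable column order.
movies = sorted({m for ms in watched.values() for m in ms})

# One row per customer: 1 if the movie was watched at least once, else 0.
binary = {}
for customer, ms in watched.items():
    seen = set(ms)  # duplicates collapse here
    binary[customer] = [1 if m in seen else 0 for m in movies]

print(movies)
print(binary)
```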
2
votes
0 answers
Duplicating pandas.get_dummies columns from train to test data
I have two dataframes, train and test. They both have the same exact column names which contain categorical string features.
I'm trying to map these features to dummy variables in the training set, train a regression model, then do the same exact…

Austin
- 6,921
- 12
- 73
- 138
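One common way to keep dummy columns consistent between two frames is to encode each frame and then reindex the test columns against the training columns. The sketch below assumes that approach; the `color` column and its values are hypothetical and not from the question.

```python
import pandas as pd

# Hypothetical data: test lacks "red"/"blue" and has an unseen "purple".
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})

train_d = pd.get_dummies(train, columns=["color"], dtype=int)
test_d = pd.get_dummies(test, columns=["color"], dtype=int)

# Align test to the training layout: columns missing from test are
# filled with 0, and categories unseen in training are dropped.
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)

print(test_d)
```

Dropping unseen categories silently is a design choice; whether that is acceptable depends on the model and the data.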
1
vote
3 answers
How to pass only necessary features to pipeline after SelectKBest
I have a regular tabular dataset; 100 features from the database are added.
I want to push it into a regular sklearn.pipeline in which there will be preprocessing, encoding, some custom transformers, etc.
The penultimate estimator would be…

Nikitosiwe
- 33
- 6
1
vote
0 answers
Idiomatic way to update pandas dataframe when computing features for new row
I have a pandas dataframe with rows of time-series data.
I want to define a function compute_features(*args, **kwargs) that I can use to compute some features (20+ cols) on an existing dataframe of time-series data (5 cols) for a machine learning…

harryjulian
- 29
- 1
- 3
1
vote
1 answer
How to do a MinMax scale on a Snowflake column and still maintain the overall sum of the column?
I currently have a challenge with Snowflake, where I have a PRICE column like the one below. The goal is to "scale" these values but keep the original sum intact; that is, I do not need to respect the proportions on the scale, but the lowest value should…

Paulo Masnik
- 13
- 4
1
vote
3 answers
X has 1 features, but LinearRegression is expecting 10 features as input
I've seen similar questions asked here, but they all seem to be caused by a different problem. I've tried reshaping and making sure it's a 2D array, but I keep getting this error. Here is my code:
import pandas as pd
import numpy as np
import…

Johnny
- 59
- 5
1
vote
1 answer
Can we use Feature Engineering tools without any IDENTIFIER?
My target feature (frame strength) is not a unique value. I have train and test datasets. How can I approach using Ft? My datasets' features are temperature, hive size, some percentile values, some entropy, different Pixel, Frame size, etc.
I tried to…

HMI
- 11
- 1
1
vote
1 answer
Insert value in new column after comparing the same dataframe with itself
I have a dataframe
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2,12,1,2,3],
'date' : ['2020-03-01', '2020-01-02', '2020-01-03',
'2020-01-04', '2019-01-05',…

user3585510
- 131
- 2
- 10
1
vote
1 answer
Using package to perform a rolling window function with a group by
Could you use a window function on groups, something in feature engine? I have been reading the docs and trying to find some clarity on how to do this. It seems like something that should exist, but I can't find how it's implemented.
import…

user4933
- 1,485
- 3
- 24
- 42
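For reference, a grouped rolling window can be sketched in plain pandas, which is a substitute technique here and not the feature engine package the question asks about. The `store` and `sales` columns and the window size are hypothetical.

```python
import pandas as pd

# Hypothetical data: rows are already in time order within each store.
df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b", "b"],
    "sales": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Rolling mean over the last 2 rows, computed separately per store so
# that a window never spans two different groups.
df["sales_roll_mean_2"] = df.groupby("store")["sales"].transform(
    lambda s: s.rolling(window=2, min_periods=1).mean()
)

print(df)
```

Using `transform` keeps the result aligned with the original row index, so it can be assigned straight back as a new column.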
1
vote
0 answers
Replacing One-Hot Encoding with Ordinal Encoding
As an example, I have a dataset of available games.
Game A graphics presets are: Low, Medium, High, Ultra
Game B graphics presets are: Minimum, Balanced, Maximum
Game C graphics presets are: Ultra
One-hot encoding does not correctly position…

MischievousChild
- 129
- 10
1
vote
1 answer
How to apply mode function for some columns using agg method with groupby when aggregating using different functions for each column
I have a dataframe of numerical and categorical columns which I am trying to group by certain columns and aggregate.
I am trying to apply the mode function on categorical columns in a pandas dataframe and other statistical functions like sum, min, etc. on…

R.A
- 97
- 1
- 12
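A minimal sketch of mixing a mode with built-in aggregations in a single `groupby(...).agg(...)` call, using pandas named aggregation. The column names, the data, and the `first_mode` helper are hypothetical; on ties the helper simply takes the first value that `Series.mode` returns.

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric column.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "color": ["red", "red", "blue", "green", "green"],
    "amount": [1, 2, 3, 10, 20],
})

def first_mode(s):
    """Return the most frequent value; first one if there are ties."""
    m = s.mode()
    return m.iloc[0] if not m.empty else None

# A different function per column, all in one aggregation call.
summary = df.groupby("group").agg(
    color_mode=("color", first_mode),
    amount_sum=("amount", "sum"),
    amount_min=("amount", "min"),
)

print(summary)
```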