Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
2 votes, 2 answers

Python tsfresh: what is the column_id argument used for?

tsfresh needs input data in a specific column layout. I initially assumed that column_id is just the row index, but I fear that's wrong. I have sensor data (pressure, temperature, and humidity sensors) captured at 10-second intervals. Thus it's 4…
joel.wilson
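As far as tsfresh's documented behavior goes, column_id names the column that says which time series (entity) each row belongs to; features are computed once per id rather than once per row, so using a plain row index would make every row its own one-point series. The minimal stdlib sketch below only mimics that grouping semantics. It assumes long-format rows of (id, time, value); the ids "machine_1"/"machine_2" and the helper `extract_simple_features` are hypothetical and are not part of tsfresh.

```python
from collections import defaultdict
from statistics import mean

# Long-format sensor rows: (id, time, temperature). The id marks which series a row
# belongs to (the role of tsfresh's column_id); time plays the role of column_sort.
rows = [
    ("machine_1", 0, 20.0),
    ("machine_1", 10, 21.0),
    ("machine_2", 0, 30.0),
    ("machine_1", 20, 22.0),
    ("machine_2", 10, 31.0),
]

def extract_simple_features(rows):
    """Hypothetical helper: group rows by id, sort each group by time,
    and compute one feature dict per id (not per row)."""
    series = defaultdict(list)
    for series_id, t, value in rows:
        series[series_id].append((t, value))
    features = {}
    for series_id, points in series.items():
        values = [v for _, v in sorted(points)]  # order by time
        features[series_id] = {
            "mean": mean(values),
            "maximum": max(values),
            "length": len(values),
        }
    return features

features = extract_simple_features(rows)
print(features["machine_1"])  # {'mean': 21.0, 'maximum': 22.0, 'length': 3}
```

With tsfresh itself the corresponding call would be along the lines of `extract_features(df, column_id="id", column_sort="time")`, returning one feature row per id.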
2 votes, 1 answer

Custom Aggregation Primitives With Additional Arguments?

The transform primitive works fine with additional arguments. Here is an example: def string_count(column, string=None): ''' ..note:: this is a naive implementation used for clarity ''' assert string is not None, "string to count…
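A common pattern for parameterized primitives in Featuretools is to store the extra argument on the instance in `__init__` and have `get_function` return a closure that uses it. The plain-Python sketch below only mimics that pattern and does not import Featuretools; the class `StringCount` is hypothetical, and a real custom primitive would subclass the library's `AggregationPrimitive` and declare its input and return types (check the docs for your installed version).

```python
class StringCount:
    """Hypothetical stand-in for a parameterized aggregation primitive.

    The extra argument lives on the instance, and get_function returns a
    closure, so the aggregation itself still receives only the column values.
    """

    name = "string_count"

    def __init__(self, string=None):
        if string is None:
            raise ValueError("string to count needs to be defined")
        self.string = string

    def get_function(self):
        def string_count(values):
            # Naive substring count across every value in the group
            return sum(str(v).count(self.string) for v in values)
        return string_count

count_the = StringCount(string="the").get_function()
print(count_the(["the cat", "the other dog", "a bird"]))  # 3 ("other" also contains "the")
```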
2 votes, 0 answers

Create a multi-hot SparseTensor from a categorical feature array column in a CSV in TensorFlow

This is a typical way of handling sparse features (such as some ID features) in recommendation systems. I'm looking for a convenient way to prepare the data for a TensorFlow pipeline. I have searched a lot but have not yet found a good solution. Below is…
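`tf.sparse.SparseTensor` is built from three pieces: `indices`, `values`, and `dense_shape`. The stdlib sketch below computes those pieces for a multi-hot matrix, assuming each CSV cell holds a `|`-separated list of integer category IDs and that the vocabulary size is known in advance; the separator, `NUM_CATEGORIES`, and the helper name are hypothetical.

```python
# Hypothetical CSV cells: variable-length lists of category IDs (repeats allowed).
cells = ["3|7|7", "", "1|3"]
NUM_CATEGORIES = 10  # assumed vocabulary size

def to_multi_hot_coo(cells, num_categories, sep="|"):
    """Return (indices, values, dense_shape) for a multi-hot matrix with one
    row per cell. Duplicate IDs in a cell are set once, and indices come out
    in row-major order."""
    indices, values = [], []
    for row, cell in enumerate(cells):
        ids = sorted({int(tok) for tok in cell.split(sep) if tok})
        for category in ids:
            if not 0 <= category < num_categories:
                raise ValueError(f"category {category} out of range")
            indices.append([row, category])
            values.append(1.0)
    return indices, values, [len(cells), num_categories]

indices, values, dense_shape = to_multi_hot_coo(cells, NUM_CATEGORIES)
print(indices)      # [[0, 3], [0, 7], [2, 1], [2, 3]]
print(dense_shape)  # [3, 10]
```

These lists could then be passed as `tf.sparse.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)`; emitting indices in row-major order avoids needing `tf.sparse.reorder`.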
2 votes, 2 answers

Python large dataset feature engineering workflow using dask hdf/parquet

There is already a good question about this on SO, but the best answer is now five years old, so I think there should be better option(s) in 2018. I am currently looking for a feature engineering pipeline for a larger-than-memory dataset (using suitable…
Florian Mutel
2 votes, 1 answer

Converting a dictionary to binary in Python

I have a dictionary whose keys are customer IDs and whose values are movie IDs. Even though a customer has watched the same movie many times, I want it counted as one. I need to convert my dictionary to binary data. In all the rows I need the customers…
pylearner
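A minimal stdlib sketch of the conversion, assuming the dictionary maps each customer ID to a list of watched movie IDs (with repeats); the sample data is hypothetical. Repeated views collapse to a single 1 because each customer's movies are turned into a set first.

```python
# Hypothetical input: customer ID -> watched movie IDs (repeats allowed)
watched = {
    "c1": [10, 20, 10, 10],
    "c2": [20],
    "c3": [30, 10],
}

def to_binary_matrix(watched):
    """Return (movie_ids, rows): one 0/1 row per customer, where 1 means the
    customer watched that movie at least once."""
    movie_ids = sorted({m for movies in watched.values() for m in movies})
    rows = {}
    for customer, movies in watched.items():
        seen = set(movies)  # repeated views collapse to a single 1
        rows[customer] = [1 if m in seen else 0 for m in movie_ids]
    return movie_ids, rows

movie_ids, rows = to_binary_matrix(watched)
print(movie_ids)   # [10, 20, 30]
print(rows["c1"])  # [1, 1, 0]
```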
2 votes, 0 answers

Duplicating pandas.get_dummies columns from train to test data

I have two dataframes, train and test. They both have the exact same column names, which contain categorical string features. I'm trying to map these features to dummy variables in the training set, train a regression model, then do the same exact…
Austin
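The core idea behind keeping train and test dummies aligned is to learn the category vocabulary from the training data only and reuse it for the test data. The library-free sketch below illustrates that; the column name and sample rows are hypothetical. Categories unseen in training encode to all zeros, and training categories absent from the test set still get a column.

```python
def fit_categories(train_rows, column):
    """Learn the category vocabulary for one column from the training data only."""
    return sorted({row[column] for row in train_rows})

def one_hot(rows, column, categories):
    """Encode rows against a fixed vocabulary so train and test share columns."""
    names = [f"{column}_{c}" for c in categories]
    encoded = [[1 if row[column] == c else 0 for c in categories] for row in rows]
    return names, encoded

train = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]
test = [{"color": "blue"}, {"color": "purple"}]  # "purple" never seen in train

categories = fit_categories(train, "color")
names, train_x = one_hot(train, "color", categories)
_, test_x = one_hot(test, "color", categories)
print(names)   # ['color_blue', 'color_green', 'color_red']
print(test_x)  # [[1, 0, 0], [0, 0, 0]]
```

In pandas, one common idiom with the same effect is `pd.get_dummies(test).reindex(columns=train_dummies.columns, fill_value=0)`, where `train_dummies` is the result of `pd.get_dummies(train)`.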
1 vote, 3 answers

How to pass only necessary features to pipeline after SelectKBest

I have a regular tabular dataset; 100 features from the database are added. I want to push it into a regular sklearn.pipeline that will include preprocessing, encoding, some custom transformers, etc. The penultimate estimator would be…
1 vote, 0 answers

Idiomatic way to update pandas dataframe when computing features for new row

I have a pandas dataframe with rows of time-series data. I want to define a function compute_features(*args, **kwargs) that I can use to compute some features (20+ cols) on an existing dataframe of time-series data (5 cols) for a machine learning…
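One way to avoid recomputing every feature column when a single row arrives is to keep a fixed-size window of recent values and compute features for the new row only, collecting the results as dicts and building a DataFrame once at the end (repeated row-by-row appends to a DataFrame are slow, and `DataFrame.append` was removed in pandas 2.0). The stdlib sketch below assumes a single numeric `value` per row; the class name, window size, and the two features are hypothetical.

```python
from collections import deque

class IncrementalFeatures:
    """Hypothetical helper: keeps the last `window` values and computes
    features for each newly appended row only."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)  # oldest values drop off automatically
        self.rows = []  # feature dicts, e.g. for pd.DataFrame(self.rows) later

    def add(self, value):
        prev = self.recent[-1] if self.recent else None
        self.recent.append(value)
        features = {
            "value": value,
            "rolling_mean": sum(self.recent) / len(self.recent),
            "change": None if prev is None else value - prev,
        }
        self.rows.append(features)
        return features

tracker = IncrementalFeatures(window=3)
for v in [1.0, 2.0, 3.0, 4.0]:
    latest = tracker.add(v)
print(latest)  # {'value': 4.0, 'rolling_mean': 3.0, 'change': 1.0}
```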
1 vote, 1 answer

How to min-max scale a Snowflake column and still maintain the overall sum of the column?

I currently have a challenge in Snowflake: I have a PRICE column like the one below. The goal is to "scale" these values but keep the original sum intact. I do not need to respect the proportions on the scale, but the lowest value should…
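One possible approach, sketched in Python under the assumption that mapping the lowest price to 0 is acceptable: min-max scale to [0, 1], then multiply by `sum(original) / sum(scaled)` so the total matches the original. Ordering is preserved, but proportions change, which the question says is fine. The function name and sample prices are hypothetical.

```python
def minmax_scale_preserving_sum(values):
    """Min-max scale to [0, 1], then rescale so the total equals the original sum.

    The lowest value maps to 0 and the ordering is preserved; the relative
    proportions between values change.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # all equal: nothing to scale, sum already intact
    scaled = [(v - lo) / (hi - lo) for v in values]
    factor = sum(values) / sum(scaled)
    return [s * factor for s in scaled]

prices = [10.0, 20.0, 30.0, 40.0]
result = minmax_scale_preserving_sum(prices)
print(round(sum(result), 6))  # 100.0
```

In Snowflake the same arithmetic could be expressed with window aggregates such as `MIN(PRICE) OVER ()`, `MAX(PRICE) OVER ()`, and `SUM(PRICE) OVER ()`. If the lowest value must stay above 0, a different offset than the minimum would be needed.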
1 vote, 3 answers

X has 1 features, but LinearRegression is expecting 10 features as input

I've seen similar questions asked here, but they all seem to be caused by a different problem. I've tried reshaping and making sure it's a 2D array, but I keep getting this error. Here is my code: import pandas as pd import numpy as np import…
1 vote, 1 answer

Can we use Feature Engineering tools without any IDENTIFIER?

My target feature (frame strength) is not a unique value. I have train and test datasets. How should I approach using Ft? My datasets' features are temperature, hive size, some percentile values, some entropy values, different pixel values, frame size, etc. I tried to…
HMI
1 vote, 1 answer

Insert a value in a new column after comparing the same dataframe with itself

I have a dataframe: # Create a sample dataframe df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2,12,1,2,3], 'date' : ['2020-03-01', '2020-01-02', '2020-01-03', '2020-01-04', '2019-01-05',…
user3585510
1 vote, 1 answer

Using a package to perform a rolling window function with a group by

Could you use a window function on groups, perhaps something in feature engine? I have been reading the docs and trying to find some clarity on how to do this; it seems like something that should exist, but I can't find how it's implemented. import…
user4933
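The semantics of a grouped rolling window are easy to state without any library: each group keeps its own fixed-size buffer, and a row's result depends only on earlier rows of the same group. The stdlib sketch below assumes rows are (group, value) pairs already sorted by time within each group; the function name and data are hypothetical. Like pandas' default for an integer window, it yields None until a group has a full window.

```python
from collections import defaultdict, deque

def grouped_rolling_mean(rows, window):
    """Rolling mean of `value` computed separately within each group.

    rows: iterable of (group, value), time-ordered within each group.
    Returns one result per input row, in input order.
    """
    buffers = defaultdict(lambda: deque(maxlen=window))  # one buffer per group
    out = []
    for group, value in rows:
        buf = buffers[group]
        buf.append(value)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out

rows = [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 20), ("a", 4)]
print(grouped_rolling_mean(rows, window=2))
# [None, None, 1.5, 2.5, 15.0, 3.5]
```

In plain pandas, `df.groupby("group")["value"].transform(lambda s: s.rolling(2).mean())` gives the same per-row alignment.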
1 vote, 0 answers

Replacing One-Hot Encoding with Ordinal Encoding

As an example, I have a dataset of available games:
Game A graphics presets are: Low, Medium, High, Ultra
Game B graphics presets are: Minimum, Balanced, Maximum
Game C graphics presets are: Ultra
One-hot encoding does not correctly position…
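One possible approach, offered as an illustration rather than a definitive answer: encode each preset by its relative position within its own game's ordered list, so that presets from games with different scales land on a shared [0, 1] range. The preset lists come from the question; the function name is hypothetical, and placing a single-preset game such as Game C at 1.0 is an assumption.

```python
# Per-game preset order from the question, lowest -> highest
PRESETS = {
    "Game A": ["Low", "Medium", "High", "Ultra"],
    "Game B": ["Minimum", "Balanced", "Maximum"],
    "Game C": ["Ultra"],
}

def relative_rank(game, preset):
    """Map a preset to its position within that game's own scale, in [0, 1].

    Assumption: a game with only one preset is treated as its top setting (1.0).
    """
    order = PRESETS[game]
    if len(order) == 1:
        return 1.0
    return order.index(preset) / (len(order) - 1)

print(relative_rank("Game A", "Medium"))    # 0.3333333333333333
print(relative_rank("Game B", "Balanced"))  # 0.5
```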
1 vote, 1 answer

How to apply the mode function to some columns using the agg method with groupby when aggregating with different functions for each column

I have a dataframe of numerical and categorical columns, which I am trying to group by certain columns and aggregate. I am trying to apply the mode function to the categorical columns of a pandas dataframe and other statistical functions like sum, min, etc. on…
R.A
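The key point is that a per-column aggregation map can hold any callable, including a mode function. The stdlib sketch below shows that shape; the column names and rows are hypothetical. Ties in `mode` go to the value encountered first, which is documented `Counter.most_common` behavior.

```python
from collections import Counter, defaultdict

def mode(values):
    """Most common value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]

def group_aggregate(rows, key, aggs):
    """Group dict rows by `key` and apply a separate aggregation per column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        k: {col: fn([m[col] for m in members]) for col, fn in aggs.items()}
        for k, members in groups.items()
    }

rows = [
    {"store": "x", "category": "toy", "sales": 5},
    {"store": "x", "category": "book", "sales": 3},
    {"store": "x", "category": "toy", "sales": 2},
    {"store": "y", "category": "food", "sales": 7},
]
result = group_aggregate(rows, "store", {"category": mode, "sales": sum})
print(result["x"])  # {'category': 'toy', 'sales': 10}
```

The pandas analogue would be `df.groupby("store").agg({"category": lambda s: s.mode().iloc[0], "sales": "sum"})`; note that `Series.mode` returns all tied modes sorted, so `.iloc[0]` picks the smallest rather than the first encountered.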