Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
1
vote
1 answer

Convert list of dicts with dicts as values to ML features

I want to transform the output of Google Vision API facial recognition into a feature set for a ML classifier. For each training instance I get a list of predicted faces which is represented as a list of dictionaries where the values are themselves…
1
vote
0 answers

Using statistics to extract missing variables in a given dataset?

Which statistical approach in data science is best suited to introducing new features for a given dataset?
Mohammad Saad
  • 1,935
  • 10
  • 28
1
vote
0 answers

Featuretools taking too long to build features without using CPU cores

I'm using featuretools Deep Feature Synthesis to build features for a dataset of 40k rows and 200 columns. I chose about 40 transformation primitives, as you can see in the code below: feature_matrix, feature_defs = ft.dfs(entityset=es,…
1
vote
3 answers

Pandas groupby apply function with an array of functions

I have a dataset like this (for example purposes): df = pd.DataFrame({ 'Store' : [100, 100, 100, 100, 101, 101, 101, 101], 'Product' : [5, 3, 10, 1, 3, 11, 2, 5], 'Category' : ['A', 'B', 'C', 'A', 'B', 'A', 'C', 'A'], 'Sales' : [100, 235,…
Ricky
  • 635
  • 2
  • 5
  • 20
1
vote
0 answers

How are feature interactions calculated?

How are feature interactions calculated for a pandas dataframe using python? Are there any packages/libraries to calculate feature interactions?
Ailurophile
  • 2,552
  • 7
  • 21
  • 46
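One common reading of "feature interaction" is the elementwise product of each pair of numeric columns. The sketch below illustrates that reading in plain Python; the helper `pairwise_interactions` and the column names are hypothetical and not taken from the question. scikit-learn's `PolynomialFeatures(interaction_only=True)` generates comparable product terms.

```python
from itertools import combinations


def pairwise_interactions(columns):
    """Return product features for every pair of numeric columns.

    `columns` maps a column name to a list of numbers (all the same length).
    """
    interactions = {}
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        # Elementwise product of the two columns.
        interactions[f"{name_a}*{name_b}"] = [a * b for a, b in zip(col_a, col_b)]
    return interactions


# Hypothetical toy data, not taken from the question.
data = {"x1": [1, 2, 3], "x2": [4, 5, 6], "x3": [7, 8, 9]}
features = pairwise_interactions(data)
```

Products are only one kind of interaction; ratios or differences may suit some problems better.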
1
vote
1 answer

How to get trans_primitives of highest entity in featuretools?

In the classic mock customer dataset example in featuretools, how do I derive trans_primitives such as month, day, and year from the transaction_time attribute of the transactions entity? import featuretools as ft es =…
Milind Dalvi
  • 826
  • 2
  • 11
  • 20
1
vote
0 answers

How to calculate the body size of candle for OHLC for comparison

I need help understanding how to calculate, in Python code, the candle body from OHLC data, and I would like to make the following classifications from the OHLC. STRONG BUY: if the next candle's body is outside the last candle's body AND the next candle's body > 2x times…
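The body of a candle is usually taken as the absolute difference between its open and close prices. The sketch below computes that size, the body's lower and upper edges, and the ratio between two consecutive bodies. The function names and price values are hypothetical, and the classification rules are left out because they are truncated in the excerpt.

```python
def candle_body(open_price, close_price):
    """Absolute size of the candle body: |close - open|."""
    return abs(close_price - open_price)


def body_bounds(open_price, close_price):
    """Lower and upper edge of the body, regardless of candle direction."""
    return min(open_price, close_price), max(open_price, close_price)


# Hypothetical OHLC rows as (open, high, low, close); values are illustrative only.
last_candle = (100.0, 106.0, 99.0, 104.0)
next_candle = (104.0, 115.0, 103.0, 113.0)

last_body = candle_body(last_candle[0], last_candle[3])
next_body = candle_body(next_candle[0], next_candle[3])

# Ratio of consecutive body sizes; guard against a zero-size (doji) body.
body_ratio = next_body / last_body if last_body else float("inf")
```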
1
vote
2 answers

Extract Datetime information from a string in a DataFrame column

I have an Edition column whose data follows an uneven pattern: some values have ',' followed by the date and some have a ',-' pattern. df.head() 17 Paperback,– 1 Nov 2016 18 Mass Market Paperback,– 1 Jan 1991 19 …
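A regular expression that targets the date itself avoids having to handle each separator variant. The sketch below is plain Python and assumes every date has the "day, abbreviated month, year" shape seen in the two visible rows, and that month names parse under an English locale. `extract_date` is a hypothetical helper.

```python
import re
from datetime import datetime

# One or two digits, a three-letter month abbreviation, a four-digit year.
DATE_PATTERN = re.compile(r"(\d{1,2} [A-Za-z]{3} \d{4})")


def extract_date(edition):
    """Return the date embedded in an edition string, or None if absent."""
    match = DATE_PATTERN.search(edition)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d %b %Y").date()


# The two rows visible in the excerpt.
editions = ["Paperback,– 1 Nov 2016", "Mass Market Paperback,– 1 Jan 1991"]
dates = [extract_date(e) for e in editions]
```

On a DataFrame column, the same pattern could probably be passed to `Series.str.extract` and the result to `pd.to_datetime`.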
1
vote
0 answers

CatBoost Post-Training Feature Information

I would like to understand how I can access information about numerical and categorical features after training a CatBoost model. For the sake of example, here's some toy code: import pandas as pd from catboost import CatBoostClassifier,…
Alex R.
  • 1,397
  • 3
  • 18
  • 33
1
vote
1 answer

When should Data Binning be used in data processing?

In data pre-processing, data binning is a technique for converting the continuous values of a feature into categorical ones. For example, the values of an age feature in a dataset are sometimes replaced with one of several intervals such…
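As a concrete illustration of binning, the sketch below maps ages to interval labels with the standard-library `bisect` module. The bin edges and labels are hypothetical, since the intervals in the excerpt are elided. `pandas.cut` offers the same operation for a DataFrame column.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels; the question's own intervals are elided.
EDGES = [18, 35, 60]
LABELS = ["<18", "18-34", "35-59", "60+"]


def bin_age(age):
    """Map a continuous age to the label of the interval that contains it."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return LABELS[bisect_right(EDGES, age)]


ages = [5, 18, 34, 35, 72]
binned = [bin_age(a) for a in ages]
```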
1
vote
0 answers

Why doesn't xgboost.get_booster().get_score() return any value for one of the variables?

A few questions about feature importance in xgboost in python: I'm trying to print the feature importance using xgboost.get_booster().get_score(). However, the function sometimes doesn't return anything for some variables. Does that mean the score…
1
vote
2 answers

Complicated list column to column string matching and deriving another column

Dataframes: df1: ind_lst [agriculture_dairy, analytics] [architecture_planning, advertising_pr_events, analytics] df2: ind score advertising_pr_events 3.672947168 agriculture_dairy 3.368266582 airlines_aviation_aerospace 3.60798955 analytics…
user14281567
1
vote
1 answer

Use FeatureTools to aggregate monthly data from daily

I'm trying to use FeatureTools to create a dataset for use in customer churn analysis. I have a raw dataset of orders that includes fields like: customer_id, order_id, order_month, order_datetime, order_cost. I'd like to create a dataset that returns…
kevin.w.johnson
  • 1,684
  • 3
  • 18
  • 37
1
vote
1 answer

How to recreate new columns with column names from one column & column values from the other

I have 2 columns with list values in my data frame as shown below: salary.labels salary.percentages ['Not Impacted', 'Salary Not Paid', 'Salary Cut', 'Variables Impacted', 'Appraisal Delayed'] [29, 0.9, 2.2, 11.3, 56.6] ['Not Impacted', 'Salary…
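One way to think about this reshaping is to pair each label with its percentage, so that every row becomes a mapping from new column name to value. The sketch below does this in plain Python for the one complete row shown in the excerpt; `row_to_columns` is a hypothetical helper.

```python
def row_to_columns(labels, values):
    """Pair each label with its value so labels can serve as column names."""
    return dict(zip(labels, values))


# The one complete row visible in the excerpt.
labels = ['Not Impacted', 'Salary Not Paid', 'Salary Cut',
          'Variables Impacted', 'Appraisal Delayed']
percentages = [29, 0.9, 2.2, 11.3, 56.6]

record = row_to_columns(labels, percentages)
```

A list of such records, one per row, can be passed to `pd.DataFrame` to obtain one column per distinct label.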
1
vote
1 answer

Is it redundant to use df.copy() when writing a function for feature engineering?

I'm wondering if there's any benefit to writing this pattern def feature_eng(df): df1 = df.copy() ... return df1 as opposed to this pattern def feature_eng(df): ... return df
Daniel Tan
  • 135
  • 1
  • 2
  • 10
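Whether the copy is redundant depends on what the elided body does. The sketch below assumes the body assigns a new column, which is a guess since the excerpt shows only "...". Under that assumption, the version without `df.copy()` modifies the caller's DataFrame. The column names `a`, `b`, and `total` are hypothetical.

```python
import pandas as pd


def feature_eng_in_place(df):
    """Adds a column directly on the caller's DataFrame."""
    df["total"] = df["a"] + df["b"]
    return df


def feature_eng_copy(df):
    """Works on a copy, leaving the caller's DataFrame untouched."""
    df1 = df.copy()
    df1["total"] = df1["a"] + df1["b"]
    return df1


raw = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

out_copy = feature_eng_copy(raw)
copy_left_raw_untouched = "total" not in raw.columns

out_in_place = feature_eng_in_place(raw)
in_place_mutated_raw = "total" in raw.columns
```

If the body only uses operations that return new objects, the explicit copy costs extra memory without changing the result.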