Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
3 votes, 1 answer

How to join the ft.dfs result to the test set?

I know featuretools has the ft.calculate_feature_matrix method, but it calculates the features from the test data. I need to compute the features on the train data and then join them to the test data, rather than computing the same features on the test data. For example: train data: id sex score 1 f 100 2 f…
3 votes, 1 answer

Calculating same features with multiple training windows in Featuretools

Featuretools already supports handling multiple cutoff times: https://docs.featuretools.com/automated_feature_engineering/handling_time.html In [20]: temporal_cutoffs = ft.make_temporal_cutoffs(cutoffs['customer_id'], ....: …
Georg Heiler
3 votes, 1 answer

KMeans clustering unbalanced data

I have a set of data with 50 features (c1, c2, c3 ...), with over 80k rows. Each row contains normalised numerical values (ranging 0-1). It is actually a normalised dummy variable, whereby some rows have only a few features, 3-4 (i.e. 0 is assigned if…
3 votes, 1 answer

Learning names of spammers

Currently, spam waves are flooding the internet, especially when sporting events happen. As I strongly suspect that the spammers' usernames are computer generated, I thought it might be interesting to try learning spammer names…
user3001
2 votes, 1 answer

Is there a way to access feature names/labels from a keras model alone?

I'm trying to retrieve feature names from a Keras model in a generalized way. I want to load a pretrained model and obtain its feature names, like this: labels = model.get_feature_names() I'm looking for something that works with any Keras model,…
2 votes, 0 answers

Can anyone explain the "Formula Value" of v1, v2 in PredictionValuesChange feature importance in CatBoost? What is Formula Value?

What is the "Formula Value" of v1, v2 in PredictionValuesChange feature importance in CatBoost? What does Formula Value mean, and by what criteria is it calculated?
2 votes, 2 answers

Pandas New Column from Conditions Across Multiple Columns

This is stumping me with pandas. I have a data frame with 5.8M rows and a date index. I have 5 columns A, B, C, D & E and would simply like to create a new column F_Score based on the simple math below: F_Score=0 if A >= B: F_Score = 1.0 else: …
2 votes, 1 answer

Vertex AI feature store vs BigQuery

I was trying to figure out the key differences between using the GCP Vertex AI feature store and saving preprocessed features to BigQuery and loading them whenever necessary. I still cannot understand why I would choose the first option, rather than the…
2 votes, 1 answer

TextVectorization and Autoencoder for feature extraction of text

I'm trying to solve the following problem: I need to train an autoencoder to extract useful data from text. I will use the trained autoencoder in another model to extract features. The goal is to teach the autoencoder to compress the…
2 votes, 1 answer

Select the features with positive contribution to each class using SHAP values

I am trying to get the features which are important for a class and have a positive contribution (having red points on the positive side of the SHAP plot). I can get the shap_values and plot the shap summary for each class (e.g. class 2 here) using…
Sali
2 votes, 0 answers

Calculating the average of the (week, weekend, weekdays, month or year) from any (days, weeks, months or years) ago for every TS value

The original DF has only the index (Datetime[ns]) and the objective variable. I have first added a few columns extracting information from the 'dates' index df['dayofweek'] = df['date'].dt.dayofweek df['quarter'] = df['date'].dt.quarter df['month']…
2 votes, 2 answers

feature-engine: cross-validation gives error when wrapping OneHotEncoder in SklearnTransformerWrapper

Issue: I am using the feature-engine library, and am finding that when I create an sklearn Pipeline that uses the SklearnTransformerWrapper to wrap a OneHotEncoder, I get the following error when trying to run cross-validation: ValueError: Input…
sparc_spread
2 votes, 0 answers

How to deal with external regressors in time series recipes?

In time series forecasting external regressors can make a big difference. Currently I want to track the effects of external regressors, using the modeltime framework. However, I could not find any helpful information on this topic so far. I only…
2 votes, 1 answer

Implementing sklearn PCA on limited number of variables in a pipeline

I'm setting up a machine learning pipeline to classify some data. One source of the data is a very good candidate for PCA and makes up the last n dimensions of the dataset. I would like to use PCA on these variables but not the preceding variables.…
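
One common way to approach this kind of setup is scikit-learn's ColumnTransformer, which applies a transformer to a chosen subset of columns and passes the rest through. The following is only a minimal sketch under stated assumptions, not the asker's pipeline: the synthetic data, the column counts (n_total, n_last), the number of components, and the LogisticRegression classifier are all hypothetical placeholders.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical data: 10 columns, of which the last 4 are the PCA candidates.
rng = np.random.default_rng(0)
n_total, n_last = 10, 4
X = rng.normal(size=(200, n_total))
y = (X[:, 0] > 0).astype(int)

# Indices of the last n_last columns; only these go through PCA.
pca_columns = list(range(n_total - n_last, n_total))

reduce_tail = ColumnTransformer(
    transformers=[("pca", PCA(n_components=2), pca_columns)],
    remainder="passthrough",  # keep the preceding columns unchanged
)

pipeline = Pipeline([
    ("reduce_tail", reduce_tail),
    ("clf", LogisticRegression()),  # placeholder classifier
])
pipeline.fit(X, y)

# The PCA components come first in the output, followed by the
# passed-through columns in their original order.
X_reduced = pipeline.named_steps["reduce_tail"].transform(X)
```

Note that the transformed columns are placed before the passed-through ones, so the column order differs from the input.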
2 votes, 1 answer

How to use K means clustering to visualise learnt features of a CNN model?

Recently I was going through the paper "Intriguing Properties of Contrastive Losses" (https://arxiv.org/abs/2011.02803). In the paper (section 3.2) the authors try to determine how well the SimCLR framework has allowed the ResNet50 Model to learn…