Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
1
vote
1 answer

How to INCLUDE a certain pre-processing step in a model for TensorFlow Serving

I have built a model with different features. For preprocessing I mainly used feature_columns, for instance to bucketize geo information or to embed categorical data with a large number of distinct values. Additionally, I had to…
1
vote
1 answer

How do I add external features to my pipeline?

There is a similar question asked here on SO many years back but there was no answer. I have the same question. I would like to add in new column(s) of data, in my case 3 columns for dummy variables, to a sparse matrix (from TfidfVectorizer), before…
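One common way to append extra columns to a sparse matrix is `scipy.sparse.hstack`. The sketch below is only an illustration of that idea: a small hand-written `csr_matrix` stands in for the TfidfVectorizer output, and the three dummy columns are hypothetical values, not data from the question.

```python
import numpy as np
from scipy import sparse

# Stand-in for the TfidfVectorizer output: 4 documents x 5 terms, stored sparse.
tfidf = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0, 0.8],
    [0.3, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.4],
]))

# Three hypothetical dummy-variable columns, one row per document.
dummies = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])

# hstack appends the new columns without densifying the TF-IDF block.
combined = sparse.hstack([tfidf, sparse.csr_matrix(dummies)], format="csr")
print(combined.shape)
```

The row order of the extra columns has to match the row order of the documents passed to the vectorizer, since `hstack` aligns rows purely by position.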
1
vote
1 answer

NumPy error while implementing feature engineering functions on array

In the code below, I use the autofeat library for feature engineering. When I ran it in Google Colab, it raised the error shown below. import autofeat as af from sklearn.datasets import load_boston data = load_boston() x =…
Samar Pratap Singh
1
vote
2 answers

Creating new features with a certain percentile of price

I am working on a forex classification problem and need help creating the features detailed below. I have shared my code and attached a picture for visual reference. Feature: opensimilarclose (1 if open = close plus or…
1
vote
1 answer

Pandas: How to create a new column in a DataFrame and fill it based on other existing columns

I have a data frame representing some restaurants and their names. What I want to do is add a column is_chain to my initial DataFrame df that indicates whether the restaurant is a food chain or not. This new column takes 0 or 1. The value 1…
Lynn
1
vote
1 answer

Pandas: calculate the std of total column value per "year"

I have a data frame representing customers' check-ins (visits) at restaurants. year is simply the year when a check-in at a restaurant happened. What I want to do is add a column std_checkin to my initial DataFrame df that represents the…
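A per-group standard deviation that stays aligned with the original rows is usually written with `groupby(...).transform(...)`. The sketch below assumes a hypothetical numeric column named `checkins`; the excerpt is truncated, so what std_checkin should actually be computed from is an assumption here.

```python
import pandas as pd

# Hypothetical check-in data; "checkins" is an assumed column name.
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016],
    "checkins": [10, 20, 30, 5, 15],
})

# transform() returns one value per original row, so the result lines up with df.
# pandas uses the sample standard deviation (ddof=1) by default.
df["std_checkin"] = df.groupby("year")["checkins"].transform("std")
print(df)
```

Using `transform` rather than `agg` avoids a separate merge step, because every row receives the statistic of its own year.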
1
vote
1 answer

Pandas: Correctly filter DataFrame columns on multiple conditions

I have a data frame representing customers' ratings of restaurants. star_rating is the customer's rating in this data frame. What I want to do is add a column nb_favorables_mention to the same data frame that represents the total number of…
Lynn
1
vote
1 answer

How do I create features in featuretools for rows with the same id and a time index?

I have a DataFrame like this: data = {'Customer':['C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C3', 'C3'], 'NumOfItems':[3, 2, 4, 5, 5, 6, 10, 6, 14], 'PurchaseTime':["2014-01-01", "2014-01-02", "2014-01-03","2014-01-01", "2014-01-02",…
1
vote
0 answers

Will a machine learning model work with X as a sparse matrix?

I had to one-hot encode 7 features, which produced a sparse matrix. My questions are: Since I cannot see the actual data behind the sparse matrix, I had to scale the features first because the indexes got messed up. Is there any way around it by…
1
vote
1 answer

How to create a preprocessing pipeline that includes built-in scikit-learn transformers and custom transformers, one of which is for feature engineering?

I am using this dataset: https://www.kaggle.com/shahir/protein-data-set Summary: I am struggling to create a preprocessing pipeline with built-in and custom transformers, including one that would add additional attributes to…
1
vote
1 answer

How to do feature engineering on an anonymous dataset?

I have a task to perform classification on a data set in which the feature names are anonymous, like var1, var2, etc. There is a mix of continuous and one-hot encoded features. I am a bit stuck on how to make new features out of the existing features and…
1
vote
0 answers

How to convert a tree-structured feature to a vector feature

I have a data set with several features. One of those features is categorical, but its values have a tree structure. For example, if this categorical feature has the values a, b, c, d, e, f, g, h, I, j, k, then the following image reveals the tree…
Rui
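One possible encoding for a tree-structured categorical value, sketched here as an illustration only, is a multi-hot vector that marks the node together with all of its ancestors. The tree in the question is only available as an image, so the parent map below is entirely hypothetical.

```python
# Hypothetical parent map; the real tree is in the question's image.
PARENT = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b", "f": "c"}
NODES = sorted(PARENT)  # fixed column order: a, b, c, d, e, f


def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def encode(node):
    """Multi-hot vector with a 1 for the node and for each ancestor."""
    on_path = set(path_to_root(node))
    return [1 if n in on_path else 0 for n in NODES]


print(encode("d"))  # marks a, b and d
print(encode("e"))  # marks a, b and e
```

With this encoding, siblings such as d and e share the entries of their common ancestors, so the overlap between two vectors reflects how close the values are in the tree, which a plain one-hot encoding would not capture.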
1
vote
0 answers

PySpark Feature Transformation: QuantileTransformer with uniform distribution of the output

Link to the scikit-learn documentation: link. Essentially, it normalizes the data so that each data point falls into a bucket between 0 and 1 (percentile rank?), and I assume each of these buckets would have an equal number of data…
TazA
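The "percentile rank" reading in the question can be illustrated with a plain empirical CDF: each value is mapped to the fraction of training values that are less than or equal to it. This is a simplified standard-library sketch of the idea, not scikit-learn's implementation (which interpolates between precomputed quantiles), and the helper names `fit_reference` and `to_uniform` are made up for illustration.

```python
from bisect import bisect_right


def fit_reference(values):
    """Sort the training values once; they act as the empirical CDF."""
    return sorted(values)


def to_uniform(reference, x):
    """Map x to [0, 1] as the fraction of reference values <= x."""
    return bisect_right(reference, x) / len(reference)


ref = fit_reference([100, 1, 3, 2, 4])
print([to_uniform(ref, v) for v in [1, 2, 3, 4, 100]])
```

Because only ranks matter, the outlier 100 ends up next to 4 in the output, and on distinct training values the transformed points are evenly spaced, which is the uniform output distribution the question describes.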
1
vote
0 answers

Impact of negative correlation on categorical data?

PS: I am a data science student, and I was wondering about the impact of correlation on categorical data. Let's say I have 2 features, such as Ticket Class with 1, 2, 3 (class 3 is lower than class 1) as a category and Seat Numbers as A, B, C, D, E, F & N (where N…
sam
1
vote
1 answer

Feature Selection in Machine Learning Question

I am trying to predict y, a column of 0s and 1s (classification), using features (X). I'm using ML models like XGBoost. One of my features, in reality, is highly predictive, let's call it X1. X1 is a column of -1/0/1. When X1 = 1, 80% of the time y…