Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

481 questions
4
votes
1 answer

How to use OrdinalEncoder() to set custom order?

I have a column in my Used cars price prediction dataset named "Owner_Type". It has four unique values which are ['First', 'Second', 'Third', 'Fourth']. Now the order that makes the most sense is First > Second > Third > Fourth as the price…
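A minimal sketch of one way to set a custom order, assuming scikit-learn's OrdinalEncoder and its categories parameter. The column name and the four values come from the excerpt above; the toy rows and the Owner_Type_encoded column name are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy rows (hypothetical); only the column name and values come from the question.
df = pd.DataFrame({"Owner_Type": ["First", "Third", "Second", "Fourth", "First"]})

# `categories` takes one list per encoded column; a value's position in the
# list becomes its integer code. Listing from lowest to highest rank gives
# "First" the largest code, i.e. First > Second > Third > Fourth.
order = [["Fourth", "Third", "Second", "First"]]
encoder = OrdinalEncoder(categories=order)

# fit_transform expects 2-D input, hence the double brackets.
df["Owner_Type_encoded"] = encoder.fit_transform(df[["Owner_Type"]]).ravel()
print(df)
```

Reversing the inner list flips the direction of the encoding if the opposite ordering is preferred.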
4
votes
2 answers

Pandas: How to calculate the average of a groupby

I have a CSV file containing a few attributes, one of which is the star rating of different restaurants, etoiles (French for "stars"). Here annee is the year in which the rating was made. Note: I don't know how to share a Jupyter notebook table…
Lynn
  • 121
  • 8
  • 25
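A minimal sketch of averaging over a groupby in pandas. The etoiles and annee column names come from the excerpt above; the restaurant column, the sample rows, and the choice of annee as the grouping key are assumptions, since the original table is not shown.

```python
import pandas as pd

# Hypothetical sample data; only `etoiles` and `annee` appear in the question.
df = pd.DataFrame({
    "restaurant": ["A", "A", "B", "B", "B"],
    "annee": [2018, 2019, 2018, 2019, 2019],
    "etoiles": [3, 4, 2, 5, 3],
})

# One row per group: the mean star rating for each year.
mean_by_year = df.groupby("annee")["etoiles"].mean()

# Same statistic broadcast back onto the original rows, useful as a feature.
df["etoiles_mean_annee"] = df.groupby("annee")["etoiles"].transform("mean")

print(mean_by_year)
print(df)
```

The first form is handy for reporting, while transform keeps the original row count so the result can be used directly as a new column.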
4
votes
1 answer

How to choose or optimize the labels so that we get better multiclass classification results?

Recently I was working on a Kaggle project "Prudential Life Insurance Assessment" where competitors talk about changing the labels to get a better metric. In that particular competition, the target has 8 classes (1-8), but one of the guy…
BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169
4
votes
1 answer

Tensorflow One Hot Encoding - Could not find valid device for node

During my feature engineering, the following error occurred. My featurelist has 21 sublists, each with 8537 values that are either 0 or 1. When trying to run the One Hot Encoding via tensorflow it shows the error Could not find valid device for…
hux0
  • 207
  • 1
  • 4
  • 17
4
votes
2 answers

How to use dateparser to detect dates in strings?

I want to use dateparser to detect which cell contains a date. I have a broad range of different date formats: Fr, 21.02.2020 // 20.02.2020 // 21.02 // 21-02-2020 // January, 21 2020 // 21-Jan-2020 // 21/02/20 and I am sure there will still come a…
hux0
  • 207
  • 1
  • 4
  • 17
4
votes
4 answers

R: How to generate a column with row values based on the nearest N row's values

I'm looking for a way to code a column based on information in the N rows preceding a given row. The dataset is sorted. In short, I want to create a column called oneweeksince that returns TRUE if the victims column is greater than 0 (or !NA) for…
Union find
  • 7,759
  • 13
  • 60
  • 111
4
votes
1 answer

Featuretools categorical handling

Featuretools offers integrated functionality to handle categorical variables variable_types={"product_id": ft.variable_types.Categorical} https://docs.featuretools.com/loading_data/using_entitysets.html However should these be strings or…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
3
votes
2 answers

ESP8266 Wifi configuration

I have a NodeMCU module, as shown in the photo. I followed the tutorial at this link: Tutorial link. I have a problem with the Wi-Fi configuration: according to the code I found online, it should be reachable in the browser at 192.168.4.1, as shown below.…
3
votes
2 answers

Databricks Notebook 8.3 (Apache Spark 3.1.1, Scala 2.12) | pyspark | Parquet write exception | Multiple failures in stage materialization

This is production code that ran fine until last week. Then this Parquet write error showed up and has not been resolved since. While writing to AWS S3 in Parquet format, I tried several values for dataframe.repartitions(300): 300, 500, 2400, 6000. But no luck.…
3
votes
2 answers

How do I get feature importances for decision tree pipeline that has preprocessing and classification steps?

I'm trying to fit a Decision Tree model on the UCI Adult dataset. I built the following pipeline to do so: nominal_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex',…
3
votes
1 answer

Problem with negative numbers in sklearn.feature_selection.SelectKBest feature scoring module

I was trying auto feature engineering and selecting, so for that, I used the Boston house price dataset available in sklearn. from sklearn.datasets import load_boston import pandas as pd data = load_boston() x = data.data y= data.target y =…
3
votes
4 answers

How to filter a column by greater than considering an index

I have a data frame representing customers' ratings of restaurants. star_rating is the customer's rating in this data frame. What I want to do is add a column nb_fave_rating to the same data frame that represents the total number of…
Lynn
  • 121
  • 8
  • 25
3
votes
1 answer

Sagemaker - Random Cut Forest - Feature Normalization? Pre-Processing?

I am having trouble understanding the RCF algorithm, particularly what input data it expects and what pre-processing should be completed beforehand. For example, I have the following data/features (with example values) for about 500K records: …
theStud54
  • 705
  • 1
  • 8
  • 19
3
votes
1 answer

Target Encoding : Fill NaN generated in expanding mean encoded values

I am working on a multi-class classification problem with five classes in the target column. I have generated features for categorical variables using expanding mean encoding (target encoding). The method is based on encoding categorical variable…
joel
  • 1,156
  • 3
  • 15
  • 42
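A minimal sketch of expanding mean encoding and one possible way to fill the NaN values it produces. The excerpt does not show the asker's code, so the column names (cat, target, cat_enc) and the data are hypothetical, and the target is simplified to a 0/1 indicator for a single class; a five-class problem would presumably repeat this once per class. Filling with the global mean is only one option and is an assumption here, not the accepted answer.

```python
import pandas as pd

# Hypothetical data: `target` is a 0/1 indicator for one of the classes.
df = pd.DataFrame({
    "cat": ["a", "b", "a", "a", "b", "c"],
    "target": [1, 0, 0, 1, 1, 1],
})

grp = df.groupby("cat")["target"]

# Sum and count of the target over the *previous* rows of the same category,
# so a row never sees its own label.
prev_sum = grp.cumsum() - df["target"]
prev_cnt = grp.cumcount()

# First occurrence of each category has no history: 0 / 0 yields NaN.
df["cat_enc"] = prev_sum / prev_cnt

# Fill those NaN values with a prior. The global mean is used here for
# brevity; note that it is computed from all rows, including later ones.
global_mean = df["target"].mean()
df["cat_enc"] = df["cat_enc"].fillna(global_mean)
print(df)
```

In a train/test setting the prior would normally be computed from the training rows only.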
3
votes
0 answers

Specifying interesting_variables with featuretools does not work

I'm currently working through the feature tools docs using my own data. So far everything worked fine but I got stuck at adding interesting variables. For some reason, I can't make it work and I am not sure why. The example in the doc works just…