Highest Voted 'feature-engineering' Questions

4

votes

1 answer

How to use OrdinalEncoder() to set custom order?

I have a column in my Used cars price prediction dataset named "Owner_Type". It has four unique values which are ['First', 'Second', 'Third', 'Fourth']. Now the order that makes the most sense is First > Second > Third > Fourth as the price…

asked May 09 '22 at 11:05

Vipul Sarode

43
1
4
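For the OrdinalEncoder question above, one way to set a custom order is to pass it through the `categories` parameter of scikit-learn's `OrdinalEncoder`. This is a minimal sketch, not the accepted answer. The sample rows are made up. Giving 'First' the largest code is an assumption based on the stated order First > Second > Third > Fourth.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up sample rows; only the column name and its four values come from the question.
df = pd.DataFrame({"Owner_Type": ["First", "Third", "Fourth", "Second", "First"]})

# Categories are listed from lowest to highest code (0, 1, 2, 3).
# Assumption: 'First' should receive the largest code.
owner_order = ["Fourth", "Third", "Second", "First"]

encoder = OrdinalEncoder(categories=[owner_order])

# fit_transform expects 2-D input, hence the double brackets.
df["Owner_Type_encoded"] = encoder.fit_transform(df[["Owner_Type"]]).ravel()

print(df)
```

Reversing `owner_order` would give 'First' the code 0 instead; either direction preserves the ordering.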

4

votes

2 answers

Pandas: How to calculate the average of a groupby

I have a CSV file containing a few attributes, and one of them is the star ratings of different restaurants, etoiles ("stars" in French). Here annee means the year when the rating was made. Note: I don't know how to share a Jupyter notebook table…

python pandas jupyter-notebook feature-engineering

asked Sep 23 '20 at 17:03

Lynn

121
8
25
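For the groupby-average question above, a minimal sketch with pandas. It assumes the goal is the mean of `etoiles` per `annee`. The excerpt is truncated, so the grouping key is a guess, and the data below is invented for illustration.

```python
import pandas as pd

# Invented sample data; only the column names 'annee' and 'etoiles' come from the question.
ratings = pd.DataFrame(
    {
        "annee": [2018, 2018, 2019, 2019, 2019],
        "etoiles": [1, 3, 2, 2, 5],
    }
)

# Average star rating per year: group by 'annee', then take the mean of 'etoiles'.
mean_by_year = ratings.groupby("annee")["etoiles"].mean()

print(mean_by_year)
```

Calling `.reset_index()` on the result turns it back into a regular two-column DataFrame if that is easier to merge or plot.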

4

votes

1 answer

How to choose or optimize the labels so that we get better multiclass classification results?

Recently I was working on the Kaggle project "Prudential Life Insurance Assessment", where the competitors talk about changing the labels to get a better metric. In that particular competition, the target has 8 classes (1-8), but one of the guy…

python pandas machine-learning xgboost feature-engineering

asked Jun 20 '20 at 19:49

BhishanPoudel

15,974
21
108
169

4

votes

1 answer

Tensorflow One Hot Encoding - Could not find valid device for node

During my feature engineering the following error occurred. My featurelist has 21 sublists, each with 8537 values that are either 0 or 1. When I try to run the one-hot encoding via TensorFlow, it shows the error Could not find valid device for…

python tensorflow keras one-hot-encoding feature-engineering

asked May 29 '20 at 13:34

hux0

207
1
4
17

4

votes

2 answers

How to use dateparser to detect dates in strings?

I want to use dateparser to detect which cell contains a date. I have a broad range of different date formats: Fr, 21.02.2020 // 20.02.2020 // 21.02 // 21-02-2020 // January, 21 2020 // 21-Jan-2020 // 21/02/20 and I am sure there will still come a…

python datetime parsing feature-engineering dateparser

asked Apr 29 '20 at 06:29

hux0

207
1
4
17

4

votes

4 answers

R: How to generate a column with row values based on the nearest N row's values

I'm looking for a way to code a column based on information in the N rows preceding a given row. The dataset is sorted. In short, I want to create a column called oneweeksince that returns TRUE if the victims column is greater than 0 (or !NA) for…

r dataframe dplyr feature-engineering

asked Mar 26 '19 at 02:54

Union find

7,759
13
60
111

4

votes

1 answer

Featuretools categorical handling

Featuretools offers integrated functionality to handle categorical variables variable_types={"product_id": ft.variable_types.Categorical} https://docs.featuretools.com/loading_data/using_entitysets.html However should these be strings or…

python pandas feature-extraction feature-engineering featuretools

asked Sep 23 '18 at 05:48

Georg Heiler

16,916
36
162
292

3

votes

2 answers

ESP8266 Wifi configuration

I have a NodeMCU module, as shown in the photo. I followed the tutorial from this link: Tutorial link. I have a problem with the WiFi configuration: according to the code I found online, it should be reachable in the browser at 192.168.4.1, as shown below.…

arduino wifi microcontroller esp8266 feature-engineering

asked Feb 11 '23 at 03:52

AYW

31
2

3

votes

2 answers

Databricks Notebook 8.3 (Apache Spark 3.1.1, Scala 2.12) | pyspark | Parquet write exception | Multiple failures in stage materialization

This is production code that ran fine until last week. Then this Parquet write error showed up and has not been resolved since. While writing to AWS S3 in Parquet format, I tried several dataframe.repartition() values: 300, 500, 2400, 6000. But no luck.…

apache-spark pyspark databricks feature-engineering aws-databricks

asked Dec 27 '21 at 02:49

Michelle_G

33
1
4

3

votes

2 answers

How do I get feature importances for decision tree pipeline that has preprocessing and classification steps?

I'm trying to fit a Decision Tree model on the UCI Adult dataset. I built the following pipeline to do so: nominal_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex',…

python scikit-learn pipeline decision-tree feature-engineering

asked Oct 18 '21 at 02:25

chesslad

31
3

3

votes

1 answer

Problem with negative numbers in sklearn.feature_selection.SelectKBest feature scoring module

I was trying auto feature engineering and selecting, so for that, I used the Boston house price dataset available in sklearn. from sklearn.datasets import load_boston import pandas as pd data = load_boston() x = data.data y= data.target y =…

python-3.x scikit-learn feature-extraction sklearn-pandas feature-engineering

asked Oct 29 '20 at 17:42

Samar Pratap Singh

471
1
10
29

3

votes

4 answers

How to filter a column by greater than considering an index

I have a data frame representing customers' ratings of restaurants. star_rating is the customer's rating in this data frame. What I want to do is add a column nb_fave_rating to the same data frame that represents the total number of…

python pandas dataframe feature-engineering

asked Sep 25 '20 at 21:41

Lynn

121
8
25

3

votes

1 answer

Sagemaker - Random Cut Forest - Feature Normalization? Pre-Processing?

I am having trouble understanding the RCF algorithm, particularly what data it expects and what pre-processing should be completed. For example, I have the following data/features (with example values) for about 500K records: …

scikit-learn amazon-sagemaker feature-engineering

asked Dec 05 '19 at 19:04

theStud54

705
1
8
19

3

votes

1 answer

Target Encoding : Fill NaN generated in expanding mean encoded values

I am working on a multi-class classification problem with five classes in the target column. I have generated features for categorical variables using expanding mean encoding (target encoding). The method is based on encoding categorical variable…

python machine-learning data-science feature-engineering

asked Jan 27 '19 at 04:37

joel

1,156
3
15
42

3

votes

0 answers

Specifying interesting_variables with featuretools does not work

I'm currently working through the Featuretools docs using my own data. So far everything has worked fine, but I got stuck at adding interesting variables. For some reason, I can't make it work and I am not sure why. The example in the doc works just…

python machine-learning feature-engineering featuretools

asked Oct 05 '18 at 21:52

FaV1

65
5
