Highest Voted 'feature-engineering' Questions

2

votes

1 answer

During calculation of "distance average" in knn imputation method for replacing NaN value in particular column

I encountered this problem while implementing the KNN imputation method for handling missing data from scratch. I created a dummy dataset and found the nearest neighbors for rows that contain missing values. Here is my dataset: A B C D …

asked Aug 24 '21 at 04:20

CS Vyas

49
6

2

votes

0 answers

Putting weights on values of a categorical feature

Suppose we have the following dataset: df = pd.DataFrame({'feature 1':['a','b','c','d','e'], 'feature 2':[1,2,3,4,5],'y':[1,0,0,1,1]}). As we can see, feature 1 is categorical. In the usual tree-based models, such as XGBoost or CatBoost, the values under…

classification xgboost categorical-data feature-engineering catboost

asked Mar 25 '21 at 23:31

Wiliam

1,078
10
21

2

votes

3 answers

In a pandas column, how to find the max number of consecutive rows that a particular value occurs?

Let's say we have the following df with a column called names. df = pd.DataFrame({ 'names':['Alan', 'Alan', 'John', 'John', 'Alan', 'Alan','Alan', np.nan, np.nan, np.nan, np.nan, np.nan, 'Christy', 'Christy','John']}) >>> df names 0 …

python pandas dataframe feature-engineering

asked Mar 04 '21 at 21:32

elixir

173
2
13
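A minimal sketch of one way to approach the question above, using the DataFrame given in the excerpt. The helper name max_consecutive is hypothetical, and this is an illustration rather than any of the posted answers: it labels each run of matching rows and takes the length of the longest one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'names': ['Alan', 'Alan', 'John', 'John', 'Alan', 'Alan', 'Alan',
              np.nan, np.nan, np.nan, np.nan, np.nan,
              'Christy', 'Christy', 'John']
})


def max_consecutive(series, value):
    """Length of the longest unbroken run of `value` (hypothetical helper)."""
    # Boolean mask of rows that match; NaN needs isna() because NaN != NaN.
    is_match = series.isna() if pd.isna(value) else series.eq(value)
    # A new run starts wherever the mask differs from the previous row.
    run_id = is_match.ne(is_match.shift(fill_value=False)).cumsum()
    # Summing the mask per run gives the run length (0 for non-matching runs).
    return int(is_match.groupby(run_id).sum().max())


alan_run = max_consecutive(df['names'], 'Alan')
nan_run = max_consecutive(df['names'], np.nan)
```

The same helper can be called once per distinct value if a summary for every name is needed.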

2

votes

0 answers

Embed row of data from dataframe into single vector or array values

Is there any way or process to embed each row of data into a vector or array of shape (1,)? My intention is to turn each row's information into a representative input feature, so that I…

python pandas numpy feature-engineering

asked Feb 19 '21 at 05:09

Yeo Keat

143
1
9

2

votes

1 answer

How to create binary variable for each individual based on value in other variable?

I have a data set consisting of 4 individuals. Each individual is measured over a different time period. In R: df = data.frame(cbind("id"=c(1,1,1,2,2,3,3,3,3,4,4), "t"=c(1,2,3,1,2,1,2,3,4,1,2), "x1"=c(0,1,0,1,0,0,1,0,1,0,0))) and I want to create…

r dplyr tidyverse feature-engineering

asked Jan 14 '21 at 12:28

pikachu

690
1
6
17

2

votes

2 answers

deep feature synthesis depth for transformation primitives | featuretools

I am trying to use the featuretools library to make new features on a simple dataset, however, whenever I try to use a bigger max_depth, nothing happens... Here is my code so far: # imports import featuretools as ft # creating the EntitySet es =…

data-science python-3.6 feature-engineering featuretools

asked Dec 25 '20 at 14:08

MartinM

209
2
10

2

votes

2 answers

Variable creation - Inferring age

I have a grouped dataframe; Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C') OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil') Odometer <- c(1000, 1000,…

r variables inference feature-engineering data-wrangling

asked Nov 12 '20 at 16:17

Brad

580
4
19

2

votes

2 answers

Convert a column of list of dictionaries to a column list such that the values are derived from the key "name" under each dictionary in the list

The input column holds lists of dictionaries, and the number of dictionaries per list is not fixed. INPUT column: Facilities [{'name': 'Work from home', 'icon': 'WFH.svg'}] [{'name': 'Gymnasium', 'icon': 'Gym.svg'}, {'name': 'Cafeteria', 'icon': 'Cafeteria.svg'},…

python data-cleaning feature-extraction data-extraction feature-engineering

asked Nov 02 '20 at 17:26

sachin kumar s

99
3
12
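A short sketch of the transformation described above, assuming each cell already holds a Python list of dictionaries (not a string). The sample rows are shortened from the excerpt, and the output column name facility_names is an assumption made for illustration.

```python
import pandas as pd

# Shortened sample: the second row is truncated in the question excerpt,
# so only the dictionaries that are visible there are used.
df = pd.DataFrame({
    'Facilities': [
        [{'name': 'Work from home', 'icon': 'WFH.svg'}],
        [{'name': 'Gymnasium', 'icon': 'Gym.svg'},
         {'name': 'Cafeteria', 'icon': 'Cafeteria.svg'}],
    ]
})

# Pull the value under the "name" key out of every dictionary in each list.
df['facility_names'] = df['Facilities'].apply(
    lambda items: [d['name'] for d in items]
)
```

If the column was read from a CSV file, the cells are likely strings and would first need parsing, for example with ast.literal_eval.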

2

votes

1 answer

Pandas: count identical values in columns but from different index

I have a data frame representing customers' ratings of restaurants. rating_year is the year the rating was made, first_year is the year the restaurant opened and last_year is the last business year of a restaurant. What I want to do is calculate…

python pandas dataframe jupyter-notebook feature-engineering

asked Sep 25 '20 at 20:13

Lynn

121
8
25

2

votes

1 answer

Pandas for binary classification

I have been using Pandas for data processing before training a binary classifier. One of the things I could not find was a function that tells me, given a value of a certain feature, let's say Age (people who are for example 60 years old), which percentage…

pandas feature-engineering

asked Aug 18 '20 at 12:35

erni

57
7

2

votes

0 answers

Boxcox transformation with tree-based models(XGBoost to be specific)

I have a question regarding the Box-Cox transformation (or log transformation). I am working on a data set that has lots of skewed features. Now when I apply the Box-Cox transformation, I get quite a nice distribution, but the thing is correlation…

xgboost feature-engineering

asked Aug 13 '20 at 09:14

CheeseBurger

175
5

2

votes

1 answer

Using the column operator to check if pass or fail

I'm not sure how I can use the operators column to return a pandas Series that determines whether a certain row's activity will pass or fail based on its passing score, operator and actual. Dataset Sample: data={"ID": [1,1,2,2], …

python python-3.x pandas feature-engineering

asked Aug 11 '20 at 10:56

Maku

1,476
10
21
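A hedged sketch of one way to apply a per-row comparison operator. Only the ID values come from the excerpt; the column names passing_score, operator and actual, their values, and the OPS mapping are assumptions made for illustration, since the sample data is truncated in the question.

```python
import operator

import pandas as pd

# Map operator strings to functions from the standard library (assumed symbols).
OPS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
}

# Hypothetical data; only the ID column matches the excerpt.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'passing_score': [70, 5, 80, 3],
    'operator': ['>=', '<=', '>', '<'],
    'actual': [75, 7, 80, 1],
})

# Evaluate "actual <operator> passing_score" row by row.
df['result'] = [
    'pass' if OPS[op](actual, target) else 'fail'
    for op, actual, target in zip(df['operator'], df['actual'], df['passing_score'])
]
```

A dictionary lookup keeps the set of allowed operators explicit and avoids evaluating strings as code.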

2

votes

1 answer

Create one new column in pandas dataframe comprised of previous year stats for each player in the dataframe

(python) I currently have a pandas dataframe that looks something like this: player | year | points | ----------------------------------------------- LeSean McCoy | 2012 | 199.3 …

python-3.x pandas dataframe feature-engineering

asked Aug 04 '20 at 17:37

ekselan

137
1
10
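One possible sketch for attaching previous-year stats, assuming "previous year" means the calendar year before the current row's year. Only the first row comes from the excerpt; the other rows and the prev_points column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample; only the LeSean McCoy 2012 row appears in the question.
df = pd.DataFrame({
    'player': ['LeSean McCoy', 'LeSean McCoy', 'Player B', 'Player B'],
    'year': [2012, 2013, 2012, 2014],
    'points': [199.3, 250.0, 100.0, 120.0],
})

# Shift each season forward by one year so it lines up with the following season.
prev = (
    df[['player', 'year', 'points']]
    .assign(year=lambda d: d['year'] + 1)
    .rename(columns={'points': 'prev_points'})
)

# A left merge keeps every original row; seasons with no prior year get NaN.
df = df.merge(prev, on=['player', 'year'], how='left')
```

Compared with groupby('player')['points'].shift(), the self-merge does not treat a gap in seasons (2012 to 2014 for the hypothetical Player B) as consecutive years.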

2

votes

1 answer

Python featuretools difference by data group

I'm trying to use featuretools to calculate time-series functions. Specifically, I'd like to subtract current(x) from previous(x) by a group-key (user_id), but I'm having trouble in adding this kind of relationship in the entityset. df =…

python data-science feature-extraction feature-engineering featuretools

asked Jul 25 '20 at 20:06

Ruslan

911
2
11
28

2

votes

2 answers

Pandas - most recent match relative to current row

I would like to add a new column to my dataframe that contains the most recent 'revenue' value where 'promotion' == 1, excluding the current row. The dataframe will always be sorted by 'day' in descending order. For rows near the bottom of the…

python pandas dataframe feature-engineering

asked Jun 23 '20 at 03:04

thatguythatdoesstuff

47
1
4
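A minimal sketch of the lookup described above, assuming the frame is sorted by 'day' in descending order as stated. The sample values and the last_promo_revenue column name are hypothetical.

```python
import pandas as pd

# Hypothetical data, sorted by day in descending order.
df = pd.DataFrame({
    'day': [5, 4, 3, 2, 1],
    'promotion': [0, 1, 0, 1, 0],
    'revenue': [50, 40, 30, 20, 10],
})

# Keep revenue only on promotion rows; everything else becomes NaN.
promo_revenue = df['revenue'].where(df['promotion'].eq(1))

# Earlier days sit below the current row, so shift up by one to exclude the
# current row, then back-fill to pull the nearest earlier promotion upward.
df['last_promo_revenue'] = promo_revenue.shift(-1).bfill()
```

Rows with no earlier promotion, such as those near the bottom of the frame, are left as NaN.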
