Highest Voted 'feature-engineering' Questions

1 vote, 0 answers

How to feed key-value features (aggregated data) to LSTM?

I have the following time-series aggregated input for an LSTM-based model: x(0): {y(0,0): {a(0,0), b(0,0)}, y(0,1): {a(0,1), b(0,1)}, ..., y(0,n): {a(0,n), b(0,n)}} x(1): {y(1,0): {a(1,0), b(1,0)}, y(1,1): {a(1,1), b(1,1)}, ..., y(1,n): {a(1,n),…

asked May 17 '20 at 23:39 by Maximus

1 vote, 1 answer

Pandas qcut applied to new data results in NaN

I am binning features for a modelling project and ran into this problem. This example learns bins from a dataframe that does not contain 11, which results in NaN when the bins are applied to a new dataframe that does contain 11. Obviously this will happen, but I wonder if there…

python pandas dataframe feature-engineering

asked May 14 '20 at 09:23 by noodle cold
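
A minimal sketch of the behaviour described in this question, using made-up values (the numbers are illustrative, not the asker's data): quantile edges learned with pd.qcut on training data that lacks 11 turn 11 into NaN when reused with pd.cut. Replacing the two outer edges with minus and plus infinity is one common workaround, shown here as an assumption about what the asker wants rather than as the accepted answer.

```python
import numpy as np
import pandas as pd

# Illustrative training data that happens not to contain the value 11.
train = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
new = pd.Series([2, 5, 11])

# Learn quartile edges on the training data only.
_, edges = pd.qcut(train, q=4, retbins=True)

# Reusing the raw edges: 11 lies above the last edge, so it becomes NaN.
naive = pd.cut(new, bins=edges, include_lowest=True)

# Widening the outer edges keeps the inner cut points but sends
# out-of-range values to the first or last bin instead of NaN.
open_edges = edges.copy()
open_edges[0], open_edges[-1] = -np.inf, np.inf
robust = pd.cut(new, bins=open_edges, labels=False)
```

With labels=False the result is the integer bin index; leaving the labels at their default would instead return interval categories whose outer bounds are infinite.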

1 vote, 0 answers

Event-driven approach to updating dependency files needed for calculating features in a production system

I have a production-system use case where my controller code depends on some external files (metadata for some relevant business logic; 3-5 JSON files totalling about 1 GB) which get updated frequently to create…

amazon-web-services spring-boot architecture event-handling feature-engineering

asked Apr 25 '20 at 06:31 by here_to_learn

1 vote, 1 answer

Hash trick in sklearn FeatureHasher

Wanting to understand "the hashing trick", I've written the following test code:
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
test = pd.DataFrame({'type': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
h =…

python machine-learning scikit-learn feature-engineering

asked Apr 19 '20 at 10:03 by Roni Gadot
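
To make the hashing trick concrete, here is a small sketch built on the question's test dataframe. The rest of the asker's code is elided, so n_features=4 is an assumed, deliberately small setting: eight categories hashed into four columns must share columns, so collisions are guaranteed.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

# Assumed setting: 4 output columns for 8 distinct categories, so at least
# two categories must land in the same column (a hash collision).
h = FeatureHasher(n_features=4, input_type='string')

# With input_type='string' every sample must be an iterable of strings,
# so each category is wrapped in a one-element list.
X = h.transform([[value] for value in test['type']]).toarray()

# Each row has a single non-zero entry (+1 or -1, since alternate_sign=True
# by default); its position is the hashed column for that category.
column_of = np.abs(X).argmax(axis=1)
for category, column in zip(test['type'], column_of):
    print(category, '->', column)
```

The printed mapping shows which categories collide; increasing n_features makes collisions rarer but never impossible.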

1 vote, 2 answers

Preserving order information in a single feature

The following is one column of a dataset that I'm trying to feature engineer:
+---+-----------------------------+
|Id |events_list                  |
+---+-----------------------------+
|1  |event1,event3,event2,event1 …

machine-learning dataset feature-engineering

asked Apr 15 '20 at 12:51 by Shlomi Schwartz

1 vote, 0 answers

When creating a similarity feature for ham vs. spam texts, should I include each spam text's similarity with itself in the average spam similarity?

I want to improve my model by adding a new feature column to my dataset of ham and spam texts. I have already computed the square cosine similarity matrix between all the texts; its diagonal entries are all 1 (= cos 0). I extract all the…

nlp feature-extraction feature-engineering

asked Apr 08 '20 at 23:43 by yshi50
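
A minimal sketch of the feature this question describes, on hypothetical random vectors standing in for TF-IDF or embedding features: each text's mean cosine similarity to the spam texts, with the self-similarity of 1 on the diagonal left out so spam rows are not inflated relative to ham rows. Whether to exclude it is the asker's judgment call; this only shows how to do so.

```python
import numpy as np

# Hypothetical toy data: 5 texts, the first 3 labelled spam.
rng = np.random.default_rng(0)
X = rng.random((5, 8))  # stand-in for TF-IDF / embedding vectors
is_spam = np.array([True, True, True, False, False])

# Row-normalise so that Xn @ Xn.T is the cosine similarity matrix.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = Xn @ Xn.T  # diagonal entries are 1 (= cos 0)

# Mean similarity of every text to the spam texts, excluding itself:
# sum over spam columns, subtract the self-term for spam rows, and divide
# by the number of *other* spam texts (needs at least two spam texts).
spam_sum = S[:, is_spam].sum(axis=1)
self_term = np.where(is_spam, np.diag(S), 0.0)
n_other = is_spam.sum() - is_spam.astype(int)
mean_sim_to_spam = (spam_sum - self_term) / n_other
```

If the labels used here are training labels, the same feature for unseen texts should be computed against the training spam texts only.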

1 vote, 1 answer

Featuretools: Using features calculated in train data on new data

I was wondering how to use features developed at training time for prediction on new data. The dataset in question is the appointment-cancellation dataset from "Predict appointment no show" (GitHub). Consider the feature locations.PERCENT_TRUE(no_show):…

python-3.x feature-extraction feature-engineering featuretools

asked Mar 17 '20 at 10:23 by Arun

1 vote, 1 answer

Handling a missing value in machine learning

I was analyzing a dataset with the following columns: [id, location, tweet, target_value]. I want to handle the missing values in the location column for some rows, so I thought of extracting the location from the tweet column of that row (if tweet…

machine-learning statistics data-science missing-data feature-engineering

asked Mar 07 '20 at 19:48 by Deepak Chaudhary

1 vote, 1 answer

How do I convert the topics of each item in a dataset into a feature vector, given that each item can have more than one topic?

I have a dataset which contains English statements. Each statement has been assigned a number of topics that the statement is about. The topics could be economy, sports, politics, business, science, etc. Each statement can have more than one topic.…

python machine-learning feature-extraction feature-selection feature-engineering

asked Feb 23 '20 at 11:13 by Saad Farooq
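
For topics where each statement can carry several labels, a multi-hot encoding (one 0/1 column per topic) is a standard choice. The sketch below uses scikit-learn's MultiLabelBinarizer on hypothetical topic lists built from the topic names mentioned in the question.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical topic lists; each statement may have several topics.
topics = [
    ["economy", "politics"],
    ["sports"],
    ["business", "economy", "science"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(topics)  # one 0/1 column per topic (multi-hot)

# mlb.classes_ gives the column order, sorted alphabetically.
print(list(mlb.classes_))
print(Y)
```

Fitting the binarizer once and calling mlb.transform on new statements keeps the column order fixed; topics not seen during fitting are ignored with a warning.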

1 vote, 1 answer

What is the proper way of using featuretools for single table data?

Assume that I have a dataset consisting of a single table, for instance the Titanic dataset on Kaggle. What is the proper way of using featuretools to get the most benefit from it, given that featuretools is designed specifically for relational data? Now, by…

data-science feature-selection feature-engineering featuretools

asked Feb 21 '20 at 18:21 by Graphics Engineer

1 vote, 0 answers

Broadcast error when using autofeat for automated feature engineering

When trying to use autofeat (https://github.com/cod3licious/autofeat) to automatically generate new features, I receive the following error: operands could not be broadcast together with shapes (963,) (962,). Simple code: model =…

data-science kaggle feature-engineering

asked Feb 21 '20 at 16:50 by Graphics Engineer

1 vote, 1 answer

Compute a given operation for all pairwise combinations of variables in R

From a given dataframe:
# Create dataframe with 4 variables and 10 obs
set.seed(1)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))
I would like to compute a subtraction between all pairwise column combinations, but only keeping one…

r feature-engineering

asked Feb 17 '20 at 10:37 by PeCaDe

1 vote, 1 answer

How to create new variables by multiple ids in featuretools?

I have a dataset that has one row per member and per transaction, and the purchase could have come from different stores ('brand_id'). I want to use featuretools to make output that would have one row per member, with an aggregate of…

python pandas group-by feature-engineering featuretools

asked Feb 13 '20 at 20:47 by Nate Thompson
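
For comparison, the same "one row per member, one column per brand" reshaping can be sketched in plain pandas with pivot_table. This is not featuretools, and because the aggregate in the question is elided, the amount column and the sum/count aggregations are hypothetical.

```python
import pandas as pd

# Hypothetical transactions: one row per member per transaction.
tx = pd.DataFrame({
    "member_id": [1, 1, 1, 2, 2],
    "brand_id": ["A", "B", "A", "B", "B"],
    "amount": [10.0, 5.0, 7.5, 3.0, 4.0],  # hypothetical column
})

# One row per member, one column per (aggregation, brand) pair;
# member/brand combinations with no transactions are filled with 0.
per_member = tx.pivot_table(
    index="member_id",
    columns="brand_id",
    values="amount",
    aggfunc=["sum", "count"],
    fill_value=0,
)
print(per_member)
```

The resulting columns form a two-level index such as ("sum", "A"), which can be flattened into names like sum_A before modelling.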

1 vote, 1 answer

Is it a bad idea to use the cluster ID from clustering text data using K-means as feature to your supervised learning model?

I am building a model that will predict the lead time of products flowing through a pipeline. I have a lot of different features; one is a string containing a few words about the purpose of the product (often abbreviations, name of the application…

machine-learning nlp cluster-analysis supervised-learning feature-engineering

asked Feb 09 '20 at 15:14 by kspr
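
If the cluster ID is used as a feature, one detail that matters is fitting the text vectorizer and K-means on the training text only and reusing them for new products, so the IDs mean the same thing at prediction time. A minimal sketch with hypothetical product strings and an assumed n_clusters=2:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical short product-purpose strings.
train_text = ["billing app api", "billing app ui", "hr portal login", "hr portal sso"]
new_text = ["billing api v2"]

# TF-IDF followed by K-means; n_clusters=2 is an assumption for the toy data.
text_clusterer = make_pipeline(
    TfidfVectorizer(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

# Fit on training text only, then reuse the fitted pipeline on new text.
train_cluster_id = text_clusterer.fit_predict(train_text)
new_cluster_id = text_clusterer.predict(new_text)
print(train_cluster_id, new_cluster_id)
```

The cluster ID is an arbitrary integer label, so it is usually one-hot encoded rather than fed to the supervised model as a number.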

1 vote, 0 answers

KeyError: 'Entity c does not exist in dfs'

When I try to run this code:
ftr_mtrx_custmr, features_defs = ft.dfs(entities=entities, relationships=relationship, target_entity="transactions")
I get this error, 490…

feature-extraction feature-engineering

asked Feb 09 '20 at 03:55 by Ron
