Questions tagged [featuretools]

Featuretools is an open-source Python library for automated feature engineering on tabular, relational datasets using a technique called Deep Feature Synthesis.

221 questions
2
votes
0 answers

Compute future features with featuretools

I'm trying to use featuretools to generate a feature matrix to train on past data and predict some future data. So this is my setup: import featuretools as ft import pandas as pd df_hotel = pd.DataFrame({ 'hotel_id': [1, 2], }) df_bookings =…
gsmafra
  • 2,434
  • 18
  • 26
2
votes
0 answers

How to speed up featuretools dfs execution?

I am running featuretools to create new features and have created the entitysets from an existing dataframe. The training dataframe has ~233K records and 81 columns, which are split into 3 entities and passed as an input argument to the es.dfs command…
Ganesh Bhat
  • 295
  • 7
  • 20
2
votes
1 answer

Lexical Error while running featuretoolsR

In an effort to test featuretools, I installed featuretoolsR through RStudio, and installed numpy and featuretools in Python. However, on trying to create an entity, the following error occurs: #…
Avik Das
  • 21
  • 2
2
votes
1 answer

How to use Featuretools to create features for a single table with no immediate features?

I used the answer from @willk, but it raises an error (see willk's answer here). I cannot comment on his answer because I don't have enough reputation (over 50). So my question is how to make the code below work? Or please…
UpcaseM
  • 37
  • 5
2
votes
0 answers

Writing a dask bag of data frame to disk (Generating 2 million features with dask and featuretools)

I'm very new to both Dask and Featuretools, so I'm having a lot of difficulty combining them to parallelize feature engineering. Short version (solving an immediate problem): I have a dask bag dfs of pandas DataFrames and want to output them as csv with…
2
votes
1 answer

How to create interesting values using value combinations from multiple features/columns

I am fairly new to featuretools, and trying to understand if and how one can add interesting values to an entity set generated using multiple features. For example, I have an entity set with two entities: customers and transactions. Transactions can…
LLT101618
  • 23
  • 2
2
votes
0 answers

Is there a way to get the percentage for each level of a categorical variable in an entityset?

Right now, for a categorical variable with levels A, B, and C, I can only get the mode for each user id. I'd also like to get the percentage of values for each level for each user id. For example, using encode_features, I get that user1 has the…
Ray
  • 53
  • 6
2
votes
1 answer

How to use Featuretools to create features from multiple columns in single dataframe by column values?

I'm trying to predict results of football matches based on earlier results. I'm running Python 3.6 on Windows and using Featuretools 0.4.1. Let's say I have the following dataframe representing history of results. Original DataFrame Using the…
Mtale
  • 33
  • 5
2
votes
1 answer

No module named 'featuretools.features' error pip install in Jupyter notebook

When I try to install anything from featuretools.features using pip in a Jupyter notebook, I get this error: ModuleNotFoundError: No module named 'featuretools.features' Everything else I'm importing from featuretools is working, so I'm not sure…
Ray
  • 53
  • 6
2
votes
1 answer

FeatureTools: Dealing with many-to-many relationships

I have a dataframe of purchases with multiple columns, including the three below: PURCHASE_ID (index of purchase) WORKER_ID (index of worker) ACCOUNT_ID (index of account) A worker can have multiple accounts associated to them, and an account…
LEJ
  • 1,868
  • 4
  • 16
  • 24
2
votes
1 answer

How to get a list of column names in featuretools

How can I get a list of column names in featuretools? In pandas data frames I just type dataframe.columns, which returns a list of column names. However, I tried to do it on an entity set and failed. Should I convert the entity set to data…
2
votes
1 answer

"IndexError: Too many levels" when running Featuretools dfs after upgrade

Featuretools' dfs() method fails to run on my entity set after upgrading from v0.1.21 to v0.2.x and v0.3.0. The error is raised when the Pandas backend tries to calculate the aggregate features _calculate_agg_features(). In particular: --> 442…
J. Kinley
  • 21
  • 1
  • 3
2
votes
1 answer

Calculate features at multiple training windows in Featuretools

I have a table with customers and transactions. Is there a way to get features filtered for the last 3/6/9/12 months? I would like to automatically generate features: number of trans in last 3 months .... number of trans in last 12…
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155
2
votes
1 answer

Select amount of past data when calculating features

I'm wondering if there is a way to automatically select the amount of past data when calculating features. For example, I might want to predict when a customer is going to make their next purchase, so it would be good to know a count of purchases or…
tsp
  • 43
  • 3
2
votes
1 answer

cutoff time and training window at featuretools

Suppose I have two datasets (corresponding to two entities in my entityset): First one: customers (cust_id, name, birthdate, customer_since) Second one: bookings (booking_id, service, chargeamount, booking_date) Now I want to create a dataset with…
Flo
  • 233
  • 2
  • 6