Questions tagged [featuretools]

Featuretools is an open source Python library for automated feature engineering on tabular relational datasets, using a technique called Deep Feature Synthesis.

221 questions
0
votes
1 answer

Using training_window in the featuretools dfs on the nasa turbofan example returns empty features

I am trying some experiments using the Remaining Useful Life prediction example on the Turbofan Engine Degradation Simulation Data Set from NASA. I want to use a small number of data points before the cut-off time to create features and for that I…
0
votes
1 answer

How to normalize an entity that has multiple values for one feature in featuretools?

Below is an example: buy_log_df = pd.DataFrame( [ ["2020-01-02", 0, 1, 2, 2], ["2020-01-02", 1, 1, 1, 3], ["2020-01-02", 2, 2, 1, 1], ["2020-01-02", 3, 3, 3, 1], ], columns=['date', 'sale_id',…
user3595632
0
votes
1 answer

Are `normalize_entity()` and `add_relationships()` logically the same in featuretools?

Example: buy_log_df = pd.DataFrame( [ ["2020-01-01", 0, 1, 2, 2, 200], ["2020-01-02", 1, 1, 1, 3, 100], ["2020-01-02", 2, 2, 1, 1, 100], ["2020-01-03", 3, 3, 3, 1, 300], ], columns=['date', 'sale_id',…
user3595632
0
votes
1 answer

Featuretools import error in azure databricks

I would like to test the featuretools functionality in Azure Databricks notebooks. However, I am getting the module error below: ModuleNotFoundError: No module named 'featuretools' Source code for featuretools as…
0
votes
1 answer

Selected primitives are incompatible with Koalas EntitySets: time_since_previous, avg_time_between, trend, avg_time_between, trend

I am using featuretools 0.20.0 and koalas 1.3.0. Create feature matrix for all customers: feature_matrix_cust, feature_defs = ft.dfs( entityset=es4, target_entity="customers_ks", agg_primitives=["count", "avg_time_between",…
0
votes
1 answer

featuretools generated null columns while there should be values

Background: While using the automated feature engineering library featuretools, I first built two datasets, member and order, in the entityset. I built a relationship between them using ft.Relationship(es['member']['memberId'],…
0
votes
1 answer

unable to process data using featuretools

this is my data set while trying to use featuretools data Unit Price Customer Name Product Category Region Profit Quantity ordered new Sales Order ID 0 2.88 Janice Fletcher Office Supplies Central 1.320000 2 5.90 …
0
votes
1 answer

Restricting feature generation to a particular entity in FeatureTools

I'm trying to understand how to specify primitive_options in FeatureTools (version 0.16) to include only a certain entity. Based on the docs I should be using include_entities: List of entities to be included when creating features for the…
numentar
0
votes
1 answer

featuretools dfs vs categorical_encoding

When I want to add categorical_encoding, I can do it in two different ways: With dfs, by setting the categorical feature as a relationship and getting mean/std/skew statistics. In this case the categorical feature and value(s) are in the same dataframe. With…
NiMa
0
votes
1 answer

(Featuretools) How are aggregate feature primitives calculated?

I have no idea how aggregation feature primitives are calculated, even after testing with very simple data. I looked through the featuretools code as well but couldn't find where the aggregation operation happens. Here is some sample code: from…
user3595632
0
votes
1 answer

list of options to be used in trans_primitives and agg_primitives

I am looking for the complete list of options to be used in trans_primitives and agg_primitives. For instance, my data is not time series and for new features, I would like to try mathematical functions (add, multiply, divide,...) to create new…
0
votes
1 answer

featuretools: accumulate the unique_value groupby user with timestamp

I have the dataset like this, user_id event_name event_timestamp origin 0 1001790 deals 2020-01-01 12:07:05.089002 1 1001818 purchase 2019-10-30 09:15:38.810000 ICN 2 1001969 deals 2019-12-16…
정진아
0
votes
1 answer

Why does Featuretools slow down when I increase the number of Dask workers?

I'm using an Amazon SageMaker Notebook that has 72 cores and 144 GB RAM, and I carried out 2 tests with a sample of the whole data to check if the Dask cluster was working. The sample has 4500 rows and 735 columns from 5 different "assets" (I mean…
0
votes
0 answers

python, featuretools dfs Stream closed error issue

Running featuretools dfs with n_jobs > 1 sometimes gives me (and sometimes runs without error) tornado.iostream.StreamClosedError: Stream is closed. I want to run featuretools dfs with n_jobs >= 2 without errors on CentOS Linux. Full stack trace below: Traceback…
H Sung
0
votes
1 answer

How is time_since_previous computed in featuretools?

I am trying to reproduce the featuretools tutorial (see link below). I am using the mock data provided in the package. It includes a customers table and a sessions table. Every customer has many sessions. Every session has a session_start…
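
As a rough illustration of the idea behind time_since_previous (a plain-Python sketch, not the library's actual implementation): each value is the number of seconds between a timestamp and the one before it, and the first entry has no predecessor. The helper name and the session_start values below are hypothetical, and whether featuretools applies the primitive over the whole column or separately per customer depends on the version and on how dfs is configured, so treat that as an open assumption.

```python
from datetime import datetime


def time_since_previous(timestamps):
    """Seconds elapsed since the previous timestamp; None for the first entry.

    Hypothetical helper for illustration only, not the featuretools code.
    """
    result = []
    previous = None
    for ts in timestamps:
        # The first timestamp has nothing to compare against.
        result.append(None if previous is None else (ts - previous).total_seconds())
        previous = ts
    return result


# Made-up session_start values, already sorted in time order.
session_starts = [
    datetime(2014, 1, 1, 0, 0, 0),
    datetime(2014, 1, 1, 0, 17, 20),
    datetime(2014, 1, 1, 0, 28, 10),
]

gaps = time_since_previous(session_starts)
print(gaps)  # [None, 1040.0, 650.0]
```

Comparing these hand-computed gaps against a feature matrix column is one way to check which ordering and grouping the library used.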