Feature crossing is a common technique for capturing nonlinear relationships in a dataset. How can I use Featuretools to generate new features by crossing the features in a table?
It is possible to cross every pair of numeric features automatically in Featuretools using the Multiply primitive. As a code example, suppose we have the following fictional dataframe:
       index  price  shares_bought        date
index
1          1   1.00              3  2017-12-29
2          2   0.75              4  2017-12-30
3          3   0.60              5  2017-12-31
4          4   0.50             18  2018-01-01
5          5   1.00              1  2018-01-02
and we want to multiply price by shares_bought. We would run
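import featuretools as ft
from featuretools.primitives import Multiply  # newer Featuretools releases provide MultiplyNumeric instead

# Featuretools 0.x API: wrap the dataframe in an EntitySet as an entity named 'log'
es = ft.EntitySet('Transactions')
es.entity_from_dataframe(dataframe=df, entity_id='log', index='index', time_index='date')

# Run DFS with Multiply as the only transform primitive
fm, features = ft.dfs(entityset=es,
                      target_entity='log',
                      trans_primitives=[Multiply])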
to turn the dataframe into an EntitySet and then run Deep Feature Synthesis (DFS), which applies Multiply wherever it can. In this case, since there are only two numeric features, we get a feature matrix fm with a single crossed feature:
       price  shares_bought  price * shares_bought
index
1       1.00              3                    3.0
2       0.75              4                    3.0
3       0.60              5                    3.0
4       0.50             18                    9.0
5       1.00              1                    1.0
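Conceptually, crossing "every pair of numeric features" means taking each unordered pair of numeric columns and multiplying them row by row. The sketch below illustrates that idea in plain Python with the values from the table above; it is not how Featuretools works internally, and the helper name `cross_all_pairs` is hypothetical.

```python
from itertools import combinations


def cross_all_pairs(columns):
    """Multiply every unordered pair of numeric columns element-wise.

    `columns` maps a column name to a list of numbers (all the same length).
    Result keys follow the "a * b" naming seen in the feature matrix above.
    """
    crossed = {}
    for left, right in combinations(columns, 2):
        crossed[f"{left} * {right}"] = [
            a * b for a, b in zip(columns[left], columns[right])
        ]
    return crossed


# Numeric columns from the example dataframe
data = {
    "price": [1.00, 0.75, 0.60, 0.50, 1.00],
    "shares_bought": [3, 4, 5, 18, 1],
}
print(cross_all_pairs(data))
# {'price * shares_bought': [3.0, 3.0, 3.0, 9.0, 1.0]}
```

With n numeric columns this yields n(n-1)/2 crossed features, so a hypothetical third numeric column would already produce three, which is worth keeping in mind before crossing everything in a wide table.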
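If we want to apply a primitive to a particular pair of features by hand, we can do so with seed features. The code would then be
# Build the crossed feature explicitly from two variables of the 'log' entity
n12_cross = Multiply(es['log']['price'], es['log']['shares_bought'])
# Pass it as a seed feature; trans_primitives=[] keeps DFS from also applying its default transform primitives
fm, features = ft.dfs(entityset=es,
                      target_entity='log',
                      seed_features=[n12_cross],
                      trans_primitives=[])
to get the same feature matrix as above.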
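The difference between the two approaches is which pairs get crossed: DFS with a transform primitive enumerates every eligible pair, while a seed feature names exactly one. A plain-Python analogue of the by-hand version takes an explicit list of (left, right) column pairs. The helper `cross_selected_pairs` and the extra `fee` column are hypothetical, used only for illustration.

```python
def cross_selected_pairs(columns, pairs):
    """Multiply only the requested (left, right) column pairs element-wise.

    `columns` maps a column name to a list of numbers; `pairs` lists the
    name pairs to cross, so any other pairs are left alone.
    """
    crossed = {}
    for left, right in pairs:
        crossed[f"{left} * {right}"] = [
            a * b for a, b in zip(columns[left], columns[right])
        ]
    return crossed


data = {
    "price": [1.00, 0.75, 0.60, 0.50, 1.00],
    "shares_bought": [3, 4, 5, 18, 1],
    "fee": [0.1, 0.1, 0.2, 0.2, 0.1],  # hypothetical extra numeric column
}
# Three numeric pairs exist, but only one cross is requested
print(cross_selected_pairs(data, [("price", "shares_bought")]))
# {'price * shares_bought': [3.0, 3.0, 3.0, 9.0, 1.0]}
```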
EDIT: To make the dataframe above, I used
import pandas as pd
import featuretools as ft
df = pd.DataFrame({'index': [1, 2, 3, 4, 5],
'shares_bought': [3, 4, 5, 18, 1],
'price': [1.00, 0.75, 0.60, 0.50, 1.00]})
df['date'] = pd.date_range('12/29/2017', periods=5, freq='D')

Seth Rothschild