I'm trying to use featuretools to generate a feature matrix so that I can train on past data and predict future data. This is my setup:

import featuretools as ft
import pandas as pd

df_hotel = pd.DataFrame({
    'hotel_id': [1, 2],
})

df_bookings = pd.DataFrame({
    'bookings_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'time': [1, 2, 3, 4, 1, 2, 3, 4],
    'hotel_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'bookings': [1, 2, 3, 4, 5, 6, 7, 8]
})

es = ft.EntitySet()

es = es.entity_from_dataframe(
    entity_id='bookings',  # must match the name used in the relationship and in target_entity
    dataframe=df_bookings,
    index='bookings_id',
    time_index='time'
)

es = es.entity_from_dataframe(
    entity_id='hotels',
    dataframe=df_hotel,
    index='hotel_id'
)

es = es.add_relationship(
    ft.Relationship(
        es['hotels']['hotel_id'],
        es['bookings']['hotel_id'],
    )
)

And I generate a feature matrix as follows:

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity='bookings',
    cutoff_time=3,
    agg_primitives=["mean"]
)
feature_matrix

However, this gives me two rows (those where time is 4, after the cutoff) in which all values are NaN. The desired behaviour is to fill in the values for these rows as well, computing the aggregations only from past data. Is this possible with featuretools?
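To make the desired output concrete, here is a minimal plain-Python sketch of what I would like to see for the mean of bookings per hotel. It does not use featuretools at all; the helper name past_mean_per_row is made up for illustration, and I am assuming that "past data" for a row means every booking of the same hotel at or before that row's own time.

```python
# Same data as df_bookings above, as (bookings_id, time, hotel_id, bookings) tuples.
bookings = [
    (1, 1, 1, 1), (2, 2, 1, 2), (3, 3, 1, 3), (4, 4, 1, 4),
    (5, 1, 2, 5), (6, 2, 2, 6), (7, 3, 2, 7), (8, 4, 2, 8),
]


def past_mean_per_row(rows):
    """Hypothetical helper: for each booking, the mean of `bookings` over
    rows of the same hotel whose time is at or before that booking's time.

    Assumption: "past" includes the row's own timestamp.
    """
    result = {}
    for bookings_id, time, hotel_id, _ in rows:
        past = [b for (_, t, h, b) in rows if h == hotel_id and t <= time]
        result[bookings_id] = sum(past) / len(past)
    return result


expected = past_mean_per_row(bookings)
# The rows at time 4 (bookings_id 4 and 8) get a value instead of NaN:
# expected[4] == 2.5 and expected[8] == 6.5
```

So every row, including the ones after the cutoff, would carry an aggregate computed only from data available up to that row's time.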

gsmafra