
I am using the featuretools documentation to learn EntitySets, and I am currently getting the error KeyError: 'Variable: device not found in entity' for the following code:

import featuretools as ft
data = ft.demo.load_mock_customer()
customers_df = data["customers"]
customers_df
sessions_df = data["sessions"]
sessions_df.sample(5)
transactions_df = data["transactions"]
transactions_df.sample(10)
products_df = data["products"]
products_df
### Creating an entity set 
es = ft.EntitySet(id="transactions")
### Adding entities
es = es.entity_from_dataframe(entity_id="transactions",
                              dataframe=transactions_df,
                              index="transaction_id",
                              time_index="transaction_time",
                              variable_types={"product_id": ft.variable_types.Categorical})
es
es["transactions"].variables
es = es.entity_from_dataframe(entity_id="products", dataframe=products_df, index="product_id")
es
### Adding new relationship

new_relationship = ft.Relationship(es["products"]["product_id"],
                                   es["transactions"]["product_id"]) 
es = es.add_relationship(new_relationship)
es

### Creating entity from existing table
es = es.normalize_entity(base_entity_id="transactions",
                         new_entity_id="sessions",
                         index="session_id",
                         additional_variables=["device", "customer_id", "zip_code"])

This follows the documentation page at https://docs.featuretools.com/loading_data/using_entitysets.html

From the API docs for es.normalize_entity, it appears that the function should create a new entity "sessions" with "session_id" as its index, plus the other three variables. However, I get this error:

C:\Users\s_belvi\AppData\Local\Continuum\Anaconda2\lib\site-packages\featuretools\entityset\entity.pyc in _get_variable(self, variable_id)
    250             return v
    251 
--> 252         raise KeyError("Variable: %s not found in entity" % (variable_id))
    253 
    254     @property

KeyError: 'Variable: device not found in entity'

Do we need to create the "sessions" entity separately before calling es.normalize_entity? It looks like something has gone wrong in the flow, probably a minor mistake.

S Belvi

1 Answer


The error arises because device is not a column in your transactions_df. The "transactions" table used on that documentation page has more columns than the "transactions" DataFrame in the dictionary returned by demo.load_mock_customer. You can get the table with all of those columns by passing return_single_table=True. Here's a full working example of normalize_entity, only slightly modified from the code you tried:

import featuretools as ft
data = ft.demo.load_mock_customer(return_single_table=True)

es = ft.EntitySet(id="Mock Customer")
es = es.entity_from_dataframe(entity_id="transactions", 
                              dataframe=data, 
                              index="transaction_id", 
                              time_index="transaction_time", 
                              variable_types={"product_id": ft.variable_types.Categorical})

es = es.normalize_entity(base_entity_id="transactions",
                         new_entity_id="sessions",
                         index="session_id",
                         additional_variables=["device", "customer_id", "zip_code"])
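To catch this kind of mismatch before calling normalize_entity, it can help to check that the new index and every additional variable are actually columns of the base DataFrame. Below is a minimal pandas sketch of such a check. The helper missing_columns is hypothetical (not part of featuretools), and the toy DataFrame only stands in for the dictionary-form transactions_df; its columns and values are illustrative assumptions, not the real mock data.

```python
import pandas as pd


def missing_columns(df, index, additional_variables):
    """Hypothetical helper: return the requested columns that df does not have."""
    wanted = [index] + list(additional_variables)
    return [col for col in wanted if col not in df.columns]


# Toy stand-in for the dictionary-form transactions_df (illustrative columns only).
transactions_df = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "session_id": [10, 10, 11],
    "transaction_time": pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 00:05", "2014-01-01 01:00"]),
    "product_id": [5, 2, 5],
    "amount": [12.5, 40.0, 7.25],
})

missing = missing_columns(transactions_df, "session_id",
                          ["device", "customer_id", "zip_code"])
print(missing)  # ['device', 'customer_id', 'zip_code']
```

A non-empty result means normalize_entity would hit the same KeyError, so you need a base table that contains those columns (for example the single table returned with return_single_table=True).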

This will return an EntitySet with two Entities and one Relationship:

Entityset: Mock Customer
  Entities:
    transactions [Rows: 500, Columns: 8]
    sessions [Rows: 35, Columns: 5]
  Relationships:
    transactions.session_id -> sessions.session_id
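Roughly speaking, this is why sessions ends up with far fewer rows than transactions: the new entity gets one row per unique session_id, the additional variables move over to it, and session_id stays behind in transactions as the link. The pandas sketch below illustrates that idea only. It is not featuretools' implementation: normalize_sketch is a hypothetical helper, it ignores time indexes and variable types, it keeps the first value seen for each session, and the data is made up.

```python
import pandas as pd


def normalize_sketch(base, index, additional_variables):
    """Hypothetical helper: split base into (new_base, new_table) by a new index."""
    cols = [index] + list(additional_variables)
    # One row per unique value of the new index; keep the first occurrence.
    new_table = base[cols].drop_duplicates(subset=index).reset_index(drop=True)
    # The base table keeps the index column as the link and drops the moved columns.
    new_base = base.drop(columns=list(additional_variables))
    return new_base, new_table


# Made-up flat table in the spirit of return_single_table=True.
flat = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "session_id": [10, 10, 11, 12],
    "device": ["desktop", "desktop", "mobile", "tablet"],
    "customer_id": [1, 1, 2, 1],
    "zip_code": ["60091", "60091", "13244", "60091"],
    "amount": [12.5, 40.0, 7.25, 3.0],
})

transactions, sessions = normalize_sketch(
    flat, "session_id", ["device", "customer_id", "zip_code"])
print(transactions.columns.tolist())  # ['transaction_id', 'session_id', 'amount']
print(len(sessions))  # 3
```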
Max Kanter