You do not need a unique identifier in your dataset to use Featuretools. You can tell Featuretools to create an index column for you.
Set make_index to True in your call to add_dataframe to create a new index on that data. make_index gives each row a unique index value based on the row's position relative to the other rows. The name of the new index column is set by the index parameter.
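import pandas as pd
import featuretools as ft

# "product" contains a repeated value (4), so it cannot serve as the index.
product_df = pd.DataFrame({"product": [1, 2, 3, 4, 4],
                           "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df

es = ft.EntitySet(id="product_data")
# make_index=True adds a new unique column whose name is given by index.
es = es.add_dataframe(dataframe_name="products",
                      dataframe=product_df,
                      make_index=True,
                      index="id")
es["products"]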
If you look at the products table in the EntitySet, you will see the newly created index column.
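To make the idea of a position-based index concrete, here is a minimal pandas-only sketch. It is an illustration under stated assumptions, not Featuretools' actual implementation: it assumes the generated values are simply the row positions starting at 0, and the name indexed_df is hypothetical.

```python
import pandas as pd

product_df = pd.DataFrame({"product": [1, 2, 3, 4, 4],
                           "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})

# "product" repeats the value 4, so it is not a valid unique index.
print(product_df["product"].is_unique)  # False

# Assumed stand-in for make_index=True, index="id":
# number the rows by position and store the result in a new "id" column.
indexed_df = product_df.copy()
indexed_df.insert(0, "id", range(len(indexed_df)))

print(indexed_df["id"].is_unique)  # True
print(indexed_df)
```

Because the generated values depend only on row order, two rows with identical data (or a repeated product value, as here) still receive distinct identifiers.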