Example:

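import pandas as pd
import featuretools as ft

# Approach 1: load a single denormalized table, then split out "items" with normalize_entity()
buy_log_df = pd.DataFrame(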
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=['date', 'sale_id', 'customer_id', "item_id", "quantity", "price"]
)

es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
    "sales",
    dataframe=buy_log_df,
    index="sale_id",
    time_index='date'
)
es = es.normalize_entity(
    new_entity_id="items",
    base_entity_id="sales",
    index="item_id",
    additional_variables=["price"],
)

# Approach 2: load "sales" and "items" as separate tables and add the relationship manually
buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2],
        ["2020-01-02", 1, 1, 1, 3],
        ["2020-01-02", 2, 2, 1, 1],
        ["2020-01-03", 3, 3, 3, 1],
    ],
    columns=['date', 'sale_id', 'customer_id', "item_id", "quantity"]
)
item_df = pd.DataFrame(
    [
        [1, 100],
        [2, 200],
        [3, 300],
    ],
    columns=['item_id', 'price']
)

es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
    "sales",
    dataframe=buy_log_df,
    index="sale_id",
    time_index='date'
)
es = es.entity_from_dataframe(
    "items",
    dataframe=item_df,
    index="item_id",
)
# Link the parent entity (items) to the child entity (sales) on item_id
from featuretools import Relationship
es = es.add_relationships(
    [Relationship(es['items']['item_id'], es['sales']['item_id'])],
)

It looks like the two approaches above produce the same EntitySet.

Is there any specific case where ONLY normalize_entity() can be used?

user3595632

1 Answer

Thanks for the question. That's correct: the two entity sets are the same. There are no cases where only normalize_entity() can be used. The changes made by this method, such as creating the new entity and adding the relationship, can also be done manually.
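To make the "can also be done manually" point concrete, here is a minimal pandas-only sketch of the table split that normalize_entity() performs on the question's data. It is an illustration under assumptions, not featuretools' actual implementation; the names `sales`, `items`, and `sales_normalized` are made up for this example.

```python
import pandas as pd

# Denormalized purchase log from the question (price repeated per sale).
sales = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity", "price"],
)

# Parent table: one row per item_id, carrying the column moved out of sales
# (the role played by additional_variables=["price"] in the question).
items = (
    sales[["item_id", "price"]]
    .drop_duplicates()
    .sort_values("item_id")
    .reset_index(drop=True)
)

# Child table: the original log without the moved column; item_id stays
# behind as the foreign key into items.
sales_normalized = sales.drop(columns=["price"])

print(items)
print(sales_normalized)
```

These two frames correspond to `item_df` and the second `buy_log_df` in the question, which is why loading them separately and adding the relationship gives the same structure. One detail that may be worth checking in your featuretools version: normalize_entity() has a make_time_index option, and when the base entity has a time index the new entity may get a time index as well, which the manual route does not create unless you add it yourself.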

Jeff Hernandez