The following error arises when trying to add a relationship between two entities in Featuretools:

Unable to add relationship because ID in metadata is Pandas `dtype category` and ID in transactions is Pandas `dtype category`

Note that the two Series do not necessarily have the same `cat.codes`.

Max Kanter
Georg Heiler

1 Answer

This error arises because the two categorical variables you are trying to relate have different sets of categories. In the code example below, all 3 Series are categoricals, but only s and s2 have the same dtype, because s3 has an extra category ("c").

import pandas as pd
from pandas.api.types import is_dtype_equal

s = pd.Series(["a","b","a"], dtype="category")
s2 = pd.Series(["b","b","a"], dtype="category")
s3 = pd.Series(["a","b","c"], dtype="category")

is_dtype_equal(s.dtype, s2.dtype) # this is True
is_dtype_equal(s.dtype, s3.dtype) # this is False
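
To see which categories actually differ, one option is to compare the category indexes directly. This is only a minimal sketch; the names missing_from_s and missing_from_s3 are made up for illustration.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["a", "b", "c"], dtype="category")

# Categories present in s3 but absent from s
missing_from_s = s3.cat.categories.difference(s.cat.categories)
# Categories present in s but absent from s3
missing_from_s3 = s.cat.categories.difference(s3.cat.categories)

print(list(missing_from_s))   # ['c']
print(list(missing_from_s3))  # []
```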

To fix this, you need to update your dataframe before loading it into Featuretools so that both pandas Categoricals have the same categories. Here's how you do that:

If s is missing categories from s3, cast s to the dtype of s3:

new_s = s.astype(s3.dtype)
is_dtype_equal(new_s.dtype, s3.dtype) # this is True
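
Note that the direction of the cast matters. A small sketch of what happens when casting the other way, to a dtype that lacks one of the values (narrowed is just an illustrative name):

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["a", "b", "c"], dtype="category")

# "c" is not among s's categories, so it becomes NaN after the cast
narrowed = s3.astype(s.dtype)
print(narrowed.isna().tolist())  # [False, False, True]
```

So only cast to a dtype whose categories cover every value in the Series.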

If each Series is missing categories from the other, we must take the union of the categories and cast both Series to it:

s4 = pd.Series(["b","c"], dtype="category")

# make union of categories (use Index.union, not +)
categories = s.dtype.categories.union(s4.dtype.categories)
shared_dtype = pd.CategoricalDtype(categories=categories)

new_s = s.astype(shared_dtype)
new_s4 = s4.astype(shared_dtype)

is_dtype_equal(new_s.dtype, new_s4.dtype) # this is True
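
Applied to the dataframes from the question, this could look roughly like the sketch below. The helper align_categories and the toy data are assumptions for illustration; only the names metadata, transactions and ID come from the error message.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def align_categories(left, right):
    """Return both Series cast to one shared categorical dtype (hypothetical helper)."""
    shared = pd.CategoricalDtype(left.cat.categories.union(right.cat.categories))
    return left.astype(shared), right.astype(shared)


# Toy stand-ins for the real dataframes
metadata = pd.DataFrame({"ID": pd.Series(["a", "b"], dtype="category")})
transactions = pd.DataFrame({"ID": pd.Series(["b", "c", "b"], dtype="category")})

# Align the ID columns before loading the dataframes into Featuretools
metadata["ID"], transactions["ID"] = align_categories(metadata["ID"], transactions["ID"])

print(is_dtype_equal(metadata["ID"].dtype, transactions["ID"].dtype))  # True
```

Because the shared dtype contains every category from both columns, no value is turned into NaN by the cast.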
Max Kanter