
A transform primitive works fine with additional arguments. Here is an example:

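import featuretools as ft
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Categorical, Numeric


def string_count(column, string=None):
    '''
    .. note:: this is a naive implementation used for clarity
    '''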
    assert string is not None, "string to count needs to be defined"
    counts = [str(element).lower().count(string) for element in column]
    return counts


def string_count_generate_name(self):
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')


StringCount = make_trans_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")
fm, fd = ft.dfs(
    entityset=es,
    target_entity='transactions',
    max_depth=1,
    features_only=False,
    seed_features=[count_the_feat])

Output:

                product_id  STRING_COUNT(product_id, "5")
transaction_id                                           
1                        5                              1
2                        4                              0
3                        3                              0
4                        3                              0
5                        4                              0

However, if I turn it into an aggregation primitive like so:

from featuretools.primitives import make_agg_primitive


def string_count(column, string=None):
    '''
    .. note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count needs to be defined"
    counts = [str(element).lower().count(string) for element in column]
    return sum(counts)


def string_count_generate_name(self):
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')


StringCount = make_agg_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")

I get the following error:

TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'

Are custom aggregation primitives with additional arguments supported in featuretools?

Jeff Hernandez

1 Answer

The issue here is a missing argument to your seed feature. For an aggregation primitive, you need to specify the entity on which to aggregate. In this case, changing the construction of your aggregation seed feature to

count_the_feat = StringCount(es['transactions']['product_id'], es['sessions'], string="5")

will create the feature

sessions.STRING_COUNT(product_id, "5")

as expected. For each session id, the feature gives the number of times the string “5” appears among that session's product ids.
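To make the "aggregate per session" step concrete, here is a minimal pure-Python sketch of the computation this feature stands for: group the transaction rows by session, then apply the aggregation function once per group. It does not use featuretools and is not a description of its internals; the `(session_id, product_id)` rows are made-up illustrative data, not values from the demo dataset.

```python
from collections import defaultdict


def string_count(column, string=None):
    """Aggregation function from the question: total occurrences of `string`."""
    assert string is not None, "string to count needs to be defined"
    return sum(str(element).lower().count(string) for element in column)


# Hypothetical (session_id, product_id) rows; not the demo dataset's values.
transactions = [
    (1, "5"), (1, "4"), (1, "5"),
    (2, "3"), (2, "15"),
    (3, "2"),
]

# Group the child rows (transactions) by their parent key (session_id).
product_ids_by_session = defaultdict(list)
for session_id, product_id in transactions:
    product_ids_by_session[session_id].append(product_id)

# Apply the aggregation once per session, yielding one value per session id.
counts_by_session = {
    session_id: string_count(product_ids, string="5")
    for session_id, product_ids in product_ids_by_session.items()
}
print(counts_by_session)
```

The second positional argument in the answer (`es['sessions']`) plays the role of the grouping key here: without it there is no way to know which rows belong together.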

Seth Rothschild
  • What about if I wanted to also input a different feature into the agg primitive? Would it look something like this: `count_the_feat = StringCount(es['transactions']['product_id'], es['sessions'], es['transactions']['amount'], es['sessions'], string="5")` – Jeff Hernandez Jun 04 '18 at 14:41
  • I get the following error: `ValueError: Unable to parse timedelta: ` – Jeff Hernandez Jun 04 '18 at 14:54
  • I can reproduce that error trying to make an agg primitive with multiple inputs. In the meantime, you can specify each aggregation separately and then put them together afterwards (or, combine columns first and then aggregate). An example would be something like `double_agg = StringCount(es['transactions']['product_id'], es['sessions'], string="5") * Mean(es['transactions']['amount'], es['sessions'])` – Seth Rothschild Jun 04 '18 at 15:39
  • How do I approach an aggregation that depends on multiple inputs which can't be aggregated individually? – Jeff Hernandez Jun 04 '18 at 15:44
  • @Jeff I think that's a really interesting question, but I'm having trouble coming up with an example that can't be handled by composing agg and trans primitives. The syntax `StringCount(es['transactions']['product_id'], es['transactions']['amount'], es['sessions'], string="5")` implies to me that you would "do `f(product_id, amount)`, then aggregate by `sessions`". The goal would then be to find an `f` that can't be handled by a transform primitive. – Seth Rothschild Jun 04 '18 at 16:36
  • Here is an example of the problem statement. How would you compose agg and/or trans primitives to find the correlation (e.g. `np.correlate`) between the transaction amounts on product ids 1 and 3? Thank you for your help @Seth – Jeff Hernandez Jun 04 '18 at 17:02
  • Ah, backing up a step: it seems like the error is being caused by not passing in the input variables as a list. This is a slight difference between `make_agg_primitive` and `make_trans_primitive`. You should be good after adding the extra bracket -- `StringCount([es['transactions']['product_id'], es['transactions']['amount']], es['sessions'], string="5")` – Seth Rothschild Jun 05 '18 at 14:25