As of version v0.12.0, Featuretools allows you to assign custom names to multi-output primitives: https://github.com/alteryx/featuretools/pull/794. By default, when you define a custom multi-output primitive, the names of the generated feature columns are suffixed with [0], [1], [2], etc. For example, say I have the following code to define a multi-output primitive:
import numpy as np
import featuretools.variable_types as vtypes
from featuretools.primitives import make_trans_primitive

def sine_and_cosine_datestamp(column):
    """
    Returns the sine and cosine of the hour of the datestamp
    """
    sine_hour = np.sin(column.dt.hour)
    cosine_hour = np.cos(column.dt.hour)
    ret = [sine_hour, cosine_hour]
    return ret

Sine_Cosine_Datestamp = make_trans_primitive(function=sine_and_cosine_datestamp,
                                             input_types=[vtypes.Datetime],
                                             return_type=vtypes.Numeric,
                                             number_output_features=2)
In the dataframe generated by DFS, the two generated columns are named SINE_AND_COSINE_DATESTAMP(datestamp)[0] and SINE_AND_COSINE_DATESTAMP(datestamp)[1]. Instead, I would like the column names to reflect the operation applied to the column, for example SINE_AND_COSINE_DATESTAMP(datestamp)[sine] and SINE_AND_COSINE_DATESTAMP(datestamp)[cosine]. Apparently you have to use the generate_names method to do this, but I could not find any examples online showing how to use it, and I kept running into errors. For example, I tried the following code:
def sine_and_cosine_datestamp(column, string=['sine, cosine']):
    """
    Returns the sine and cosine of the hour of the datestamp
    """
    sine_hour = np.sin(column.dt.hour)
    cosine_hour = np.cos(column.dt.hour)
    ret = [sine_hour, cosine_hour]
    return ret

def sine_and_cosine_generate_names(self, base_feature_names):
    return u'STRING_COUNT(%s, "%s")' % (base_feature_names[0], self.kwargs['string'])

Sine_Cosine_Datestamp = make_trans_primitive(function=sine_and_cosine_datestamp,
                                             input_types=[vtypes.Datetime],
                                             return_type=vtypes.Numeric,
                                             number_output_features=2,
                                             description="For each value in the base feature, "
                                                         "outputs the sine and cosine of the hour, day, and month.",
                                             cls_attributes={'generate_names': sine_and_cosine_generate_names})
This raised an AssertionError. What's even more perplexing is that when I looked at the transform_primitive_base.py file in the featuretools/primitives/base folder, I saw that the generate_names method looks like this:
def generate_names(self, base_feature_names):
    n = self.number_output_features
    base_name = self.generate_name(base_feature_names)
    return [base_name + "[%s]" % i for i in range(n)]
Judging from this function, there seems to be no way to generate custom names, since it builds them only from base_feature_names and the number of output features. Any help would be appreciated.
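That default generate_names may only be a fallback: it is an ordinary method, so a class that defines its own generate_names (which cls_attributes={'generate_names': ...} appears intended to do) would replace it. The sketch below is a plain-Python mock with no Featuretools import; MockTransformPrimitive, its simplified generate_name, and MockSineCosine are hypothetical stand-ins, not Featuretools classes. It also assumes, based on the list comprehension in the default implementation, that a custom generate_names should return a list with one name per output feature, in the same order as the list returned by sine_and_cosine_datestamp. The version of sine_and_cosine_generate_names above returns a single string instead, which might be related to the AssertionError, though that is unconfirmed.

```python
# Hypothetical mock, not Featuretools code: it copies the default naming
# logic from transform_primitive_base.py so the override can run standalone.
class MockTransformPrimitive:
    name = "sine_and_cosine_datestamp"
    number_output_features = 2

    def generate_name(self, base_feature_names):
        # Simplified stand-in for the real generate_name: NAME(arg1, ...)
        return "%s(%s)" % (self.name.upper(), ", ".join(base_feature_names))

    def generate_names(self, base_feature_names):
        # Default behaviour: index suffixes [0], [1], ...
        n = self.number_output_features
        base_name = self.generate_name(base_feature_names)
        return [base_name + "[%s]" % i for i in range(n)]


def sine_and_cosine_generate_names(self, base_feature_names):
    # One name per output, ordered like [sine_hour, cosine_hour].
    base_name = self.generate_name(base_feature_names)
    return [base_name + "[%s]" % label for label in ("sine", "cosine")]


# Attach the function as a class attribute, mimicking what
# cls_attributes={'generate_names': ...} is assumed to do.
MockSineCosine = type("MockSineCosine", (MockTransformPrimitive,),
                      {"generate_names": sine_and_cosine_generate_names})

print(MockTransformPrimitive().generate_names(["datestamp"]))
# ['SINE_AND_COSINE_DATESTAMP(datestamp)[0]', 'SINE_AND_COSINE_DATESTAMP(datestamp)[1]']
print(MockSineCosine().generate_names(["datestamp"]))
# ['SINE_AND_COSINE_DATESTAMP(datestamp)[sine]', 'SINE_AND_COSINE_DATESTAMP(datestamp)[cosine]']
```

Using self.generate_name keeps the base part of each name identical to the default one, so only the suffix changes. If the assumption about the expected return shape holds, the same sine_and_cosine_generate_names could be passed through cls_attributes in the make_trans_primitive call.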