Why are the new features created by transform primitives such as WEEKDAY, DayOfMonth, YEAR, and MonthOfYear stored as integers, i.e., as continuous features? Aren't they supposed to be categorical features? In other words, when these features are created, shouldn't the dtype of these columns be 'object' rather than 'int'?
Categorical and ordinal features are best stored as integer values, because it is more efficient to represent data as integers than as strings. For example, [1, 4, 3, 1] requires much less memory than ["January", "April", "March", "January"]. Storing a feature as an integer does not mean Featuretools treats it as continuous: you can check the type Featuretools assigned to each feature using the list of feature definitions returned by `ft.dfs`.
```python
import featuretools as ft

# Uses the pre-1.0 Featuretools API (entities, target_entity, variable_type).
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=[],
                                      trans_primitives=["month"])
feature_defs
```
`feature_defs` is a list of feature definitions:

```
[<Feature: zip_code>, <Feature: MONTH(join_date)>]
```
We can get the variable type of a feature like this:

```python
feature_defs[1].variable_type
```

This returns:

```
featuretools.variable_types.variable.Ordinal
```
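To make the memory comparison at the start of this answer concrete, here is a rough, stdlib-only sketch (not Featuretools code). It assumes CPython, packs the month numbers one byte each as an int8 column would, and counts each string as its own `str` object, roughly as an 'object' dtype column holds them. Exact byte counts vary by Python version, so only the relative sizes matter.

```python
import sys
from array import array

# Lookup list for decoding; index 0 is unused so codes 1-12 line up with months.
MONTH_NAMES = [None, "January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

names = ["January", "April", "March", "January"]

# Encode each name as its month number and pack the codes one byte each,
# similar to an int8 column.
codes = array("b", [MONTH_NAMES.index(n) for n in names])

# Rough size of the integer column: one byte per value.
int_bytes = codes.itemsize * len(codes)

# Rough size of the string column: each value counted as a separate CPython
# str object (ignores the pointer array and any sharing of equal strings).
str_bytes = sum(sys.getsizeof(n) for n in names)

# Decoding with the lookup list recovers the original labels.
decoded = [MONTH_NAMES[c] for c in codes]

print(list(codes), int_bytes, str_bytes)
```

Because decoding recovers the original names, the integer form loses no information. The categorical meaning is carried by the feature's type (here `Ordinal`), not by the storage dtype.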
To encode discrete features as numeric features for machine learning, see the documentation for `ft.encode_features`.
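As a rough illustration of what such an encoding produces, here is a hand-rolled one-hot sketch in plain Python. The `one_hot` helper and the `month_is_<n>` column names are hypothetical; this is not the implementation or output format of `ft.encode_features`.

```python
def one_hot(values, categories):
    """Return one 0/1 indicator column per category (hypothetical helper)."""
    return {
        f"month_is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


month_codes = [1, 4, 3, 1]  # e.g. MONTH(join_date) for four customers

# Encode only the categories that actually occur, in sorted order.
encoded = one_hot(month_codes, sorted(set(month_codes)))
for name, column in encoded.items():
    print(name, column)
# month_is_1 [1, 0, 0, 1]
# month_is_3 [0, 0, 1, 0]
# month_is_4 [0, 1, 0, 0]
```

One-hot columns let a model that treats integers as continuous see each month as a separate category rather than as a magnitude.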

Max Kanter