Why are the features created by transform primitives such as WEEKDAY, DayOfMonth, YEAR, and MonthOfYear stored as integers, i.e., as continuous features? Aren't they supposed to be categorical? That is, shouldn't the dtype of these columns be 'object' rather than 'int'?

Harish Rajula

1 Answer

Categorical or ordinal features are best stored as integer values, because it is more efficient to represent data as an integer than as a string. For example, [1, 4, 3, 1] requires far less memory than ["January", "April", "March", "January"]. The integer dtype is only the storage format; it does not mean the feature is treated as continuous. You can determine the type of a feature from the list of feature definitions returned by ft.dfs:

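import featuretools as ft

# Load the demo entity set that ships with featuretools
es = ft.demo.load_mock_customer(return_entityset=True)

# Build features for "customers" using only the "month" transform primitive
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=[],
                                      trans_primitives=["month"])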

feature_defs is a list of feature definitions:

[<Feature: zip_code>, <Feature: MONTH(join_date)>]

We can get the variable type of a feature like this:

feature_defs[1].variable_type

This returns:

featuretools.variable_types.variable.Ordinal
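To make the memory comparison from the start of this answer concrete, here is a rough standard-library sketch. It does not use featuretools or pandas; the names int_bytes and str_bytes are made up for illustration, and the 8-byte integer storage is an assumption meant to mimic a 64-bit integer column. Exact string sizes depend on the Python build, so no figures are quoted.

```python
import sys
from array import array

months_as_int = [1, 4, 3, 1]
months_as_str = ["January", "April", "March", "January"]

# Pack the integers into fixed-width storage.
# Typecode "q" is a signed 64-bit integer (assumed stand-in for an int64 column).
packed = array("q", months_as_int)
int_bytes = packed.itemsize * len(packed)

# Each string is a separate Python object with its own overhead.
str_bytes = sum(sys.getsizeof(s) for s in months_as_str)

print("integers:", int_bytes, "bytes")
print("strings: ", str_bytes, "bytes")
```

On CPython the string total should come out several times larger than the integer total, which is the point being made above.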

To encode discrete features as numeric features for machine learning, see the documentation for ft.encode_features.
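As a rough illustration of what encoding a discrete feature can mean, the sketch below one-hot encodes the month values in plain Python. This is not the featuretools implementation and makes no claim about how ft.encode_features works internally; the names categories and one_hot are hypothetical.

```python
months = [1, 4, 3, 1]

# Distinct values, sorted so the column order is stable
categories = sorted(set(months))

# One row per input value, one 0/1 column per category
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in months
]

for value, row in zip(months, one_hot):
    print(value, row)
```

Each month becomes its own 0/1 column, so a model no longer reads month 4 as being "greater than" month 3.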

Max Kanter