
I'm trying to train a learning model on real estate sale data that includes dates. I've looked into 1-to-K binary encoding, per the advice in this thread, however my initial assessment is that it may have the weakness of not being able to train well on data that is not predictably cyclic. While real estate value crashes are recurring, I'm concerned (maybe wrongfully so, you tell me) that doing 1-to-K encoding will inadvertently overtrain on potentially irrelevant features if the recurrence is not explainable by a combination of year-month-day.

That said, I think there is potentially value in that method. I think that there is also merit to the argument of converting time series data to ordinal, as also recommended in the same thread. Which brings me to the real question: is it bad practice to duplicate the same initial feature (the date data) in two different forms in the same training data? I'm concerned if I use methods that rely on the assumption of feature independence I may be violating this by doing so.

If so, what are suggestions for how to best get the maximal information from this date data?

Edit: Please leave a comment how I can improve this question instead of down-voting.

Brendan

1 Answer


Is it bad practice?

No. Sometimes a transformation makes a feature more accessible to your algorithm, so by that reasoning converting features is completely fine.
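To make the idea concrete, here is a minimal sketch of encoding one sale date in both forms from the question: an ordinal day count plus a 1-of-K (one-hot) month. The function name `encode_date` and the choice to one-hot only the month are assumptions for illustration, not something prescribed by the linked thread.

```python
from datetime import date
from typing import List


def encode_date(d: date) -> List[float]:
    """Return one feature row: ordinal day count, then a 12-slot one-hot month."""
    # Ordinal form: days since 0001-01-01, which preserves ordering and spacing.
    ordinal = float(d.toordinal())
    # 1-of-K form: exactly one of the 12 month slots is set to 1.0.
    month_one_hot = [1.0 if m == d.month else 0.0 for m in range(1, 13)]
    return [ordinal] + month_one_hot


# Hypothetical sale date, used only to show the shape of the output.
row = encode_date(date(2015, 3, 14))
```

Both representations come from the same raw date, so they share information; whether that matters depends on the model you feed them to.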

Does it skew your algorithm?

In terms of runtime, it may be better not to have to transform your data every time. Depending on your algorithm and the type of transformation, you may also lose interpretability (if that matters to you). And if you want to restrict the number or set of features your algorithm uses, adding transformed features may introduce redundant information.

So what should you do?

Transform your data and features as much and as often as you want. That does no harm; it helps by enlarging the feature space. Once you have done so, run PCA or something similar to find redundancies among your features and reduce the feature space again.
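As a rough illustration of spotting redundancy, the sketch below uses plain pairwise Pearson correlation rather than PCA; it is a simpler stand-in, and a real pipeline would more likely use a library PCA implementation. The sale dates are made up for the example, and the helper `pearson` is hypothetical.

```python
from datetime import date
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up sale dates, for illustration only.
sale_dates = [
    date(2010, 1, 15), date(2011, 6, 15), date(2012, 11, 15), date(2013, 4, 15),
    date(2014, 9, 15), date(2015, 2, 15), date(2016, 7, 15), date(2017, 12, 15),
]

# Three features derived from the same raw date.
ordinal = [float(d.toordinal()) for d in sale_dates]
year = [float(d.year) for d in sale_dates]
month = [float(d.month) for d in sale_dates]

r_year = pearson(ordinal, year)    # near 1: year adds little beyond the ordinal
r_month = pearson(ordinal, month)  # noticeably lower: month carries separate signal
```

A feature that correlates almost perfectly with another is a candidate for dropping, while a weakly correlated one (here, the month) may be worth keeping alongside the ordinal.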

Note:

I tried to be general; obviously this depends heavily on the kind of algorithm you're using.

mrk