Suggestions for feature engineering

Question

I am having a problem during feature engineering and am looking for some suggestions. Problem statement: I have usage data for multiple customers over 3 days. Some customers have just 1 day of usage, some have 2, and some have 3. The data covers things like the number of emails sent and contacts added on each day.

I am converting this time-series data to a column-wise layout, i.e., the number of emails sent by a customer on day 1 is one feature, the number sent on day 2 is another feature, and so on. The problem is that usage can be increasing or decreasing across days for different customers.

Example 1: customer 'A' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 0

Example 2: customer 'B' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 100

Example 3: customer 'C' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 0

Example 4: customer 'D' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 100

In the first two cases, my new feature (the day-2 count minus the day-1 count) will have "-100" and "100" as values, which I guess is good for differentiating them. But the problem arises for the 3rd and 4th examples, where the new feature value will be "0" in both scenarios. Can anyone suggest a way to handle this?
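To make the collision concrete, here is a minimal sketch using the four example customers. It assumes the new feature is the day-2 count minus the day-1 count, which is what the values -100 and 100 imply; the names `usage` and `day_over_day_delta` are made up for illustration.

```python
# Emails sent on (day 1, day 2) for the four example customers.
usage = {
    "A": (100, 0),
    "B": (0, 100),
    "C": (0, 0),
    "D": (100, 100),
}


def day_over_day_delta(day1, day2):
    """Assumed definition of the new feature: day-2 count minus day-1 count."""
    return day2 - day1


deltas = {customer: day_over_day_delta(*days) for customer, days in usage.items()}

for customer, delta in deltas.items():
    print(customer, delta)
```

Customers C and D end up with the same delta even though their usage levels are very different, which is the ambiguity in question.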

Instead of printing `0`, print "No Change" or something similar when that's the case. — martineau, Apr 11 '19 at 00:40
I thought of it, but I am confused about one thing. If I do that, I will have to make the new feature categorical, which is not ideal because the other values will be continuous. Instead, I could store absolute values in the new feature and indicate the trend separately: "+1" for increasing, "-1" for decreasing, "no change" for no change, and "0" if both values are "0". Would that be a good approach, though? — SSuram, Apr 11 '19 at 00:51
It's hard to say, because you haven't precisely defined the criteria / constraints for judging whether a given way of handling the situation is a "good" one or not. — martineau, Apr 11 '19 at 01:00
I would want to capture the usage trend for 3 days of each of these customers for all the useful features. And based on the trend I have to classify customers into different classes. Does that answer? — SSuram, Apr 11 '19 at 01:05
You can take the sin(#emails_in_a_day/#max_number_of_emails). Or, you can take a mean of all days and update each day to the #of_days_more_or_less_than_mean. — rhn89, Apr 12 '19 at 20:39

score 1 · Answer 1 · answered Nov 12 '19 at 21:22

You can extract the following features:

Simple moving averages for day 2 and day 3 respectively. This means you now have two extra columns.
Percentage change from the previous day
Percentage change from day 1 to day 3
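As a rough sketch of how these features could be computed, the plain-Python example below takes one customer's daily counts and returns the extra columns. The window size of 2 for the moving average, the use of `None` for missing days or an undefined percentage change (previous value of 0), and all function and column names are assumptions made for illustration, not something the answer specifies.

```python
def simple_moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def pct_change(old, new):
    """Percentage change from old to new; None when old is 0 (undefined)."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


def trend_features(usage):
    """usage: daily counts for one customer, day 1 first (1 to 3 entries)."""
    sma = simple_moving_average(usage, window=2)  # window size is an assumption
    features = {}
    for i in (1, 2):  # list positions of day 2 and day 3
        present = i < len(usage)
        features[f"sma_day{i + 1}"] = sma[i] if present else None
        features[f"pct_change_day{i}_to_day{i + 1}"] = (
            pct_change(usage[i - 1], usage[i]) if present else None
        )
    features["pct_change_day1_to_day3"] = (
        pct_change(usage[0], usage[2]) if len(usage) == 3 else None
    )
    return features


print(trend_features([100, 0, 50]))
print(trend_features([0, 0]))      # like customer 'C'
print(trend_features([100, 100]))  # like customer 'D'
```

With these columns, customers 'C' and 'D' from the question no longer collide: their day-2 moving averages are 0.0 and 100.0 respectively, even though the raw difference is 0 for both.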

Pascal Zoleko
