I have two tables: one of customer information and one of transaction information.

The customer table includes each person's quality of health (from 0 to 100).

For example, if I extract just the Name and HealthQuality columns:

John: 70
Mary: 20
Paul: 40

and so on.

After applying featuretools, I noticed a new DIFF(HealthQuality) variable.

According to the docs, this is what DIFF does:

"Compute the difference between the value in a list and the previous value in that list."

Is featuretools calculating the difference between Mary's and John's health quality in this instance?
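To make the question concrete, here is a minimal sketch in plain Python (not featuretools itself) of what a row-order difference would give for the three customers above. It assumes DIFF simply subtracts the previous row's value from the current one; `row_diff` is a hypothetical helper written for illustration.

```python
# Hypothetical helper: difference between each value and the one before it,
# taken in row order. The first row has no predecessor, so it gets NaN.
def row_diff(values):
    if not values:
        return []
    out = [float("nan")]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out

names = ["John", "Mary", "Paul"]
health_quality = [70, 20, 40]

diffs = row_diff(health_quality)
for name, d in zip(names, diffs):
    print(name, d)
```

Under that assumption, Mary's value would be her score minus John's, which depends only on the order of the rows and not on anything about Mary herself.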

I don't think this kind of feature synthesis really works for customer records, e.g. CUM_SUM(emails_sent) for John. John's record is a single row, and he has a single value for the number of emails we sent him, so there is nothing to accumulate within his record.

For now I am using the ignore_variables=[all_customer_info] option to exclude all of the customer data, keeping only the transactions table.

This also leads me to another question.

Using data from the transactions table, John now has a DIFF(MEAN(transactions.amount)). What is DIFF measuring in this instance?

   id  MEAN(transactions.amount)  DIFF(MEAN(transactions.amount))
0   1                  21.950000                              NaN
1   2                  20.000000                        -1.950000
2   3                  35.604581                        15.604581
3   4                        NaN                              NaN
4   5                  22.782682                              NaN
5   6                  35.616306                        12.833624
6   7                  24.560536                       -11.055771
7   8                 331.316552                       306.756016
8   9                  60.565852                      -270.750700
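As a sanity check on the output above, this sketch in plain Python recomputes a row-order difference from the MEAN column and compares it with the reported DIFF column. The values are copied from the table; `row_diff` and `same` are hypothetical helpers, and a match would only show that the numbers are consistent with a previous-row difference, not how featuretools computes them internally.

```python
import math

nan = float("nan")

# Values copied from the table above, in row order (ids 1 through 9).
mean_amount = [21.95, 20.0, 35.604581, nan, 22.782682,
               35.616306, 24.560536, 331.316552, 60.565852]
reported_diff = [nan, -1.95, 15.604581, nan, nan,
                 12.833624, -11.055771, 306.756016, -270.7507]

def row_diff(values):
    # Current value minus the previous row's value; first row has no predecessor.
    if not values:
        return []
    out = [nan]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)  # any NaN operand yields NaN
    return out

def same(a, b):
    # Treat NaN as equal to NaN; allow for rounding in the printed table.
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return math.isclose(a, b, abs_tol=1e-5)

computed = row_diff(mean_amount)
matches = all(same(c, r) for c, r in zip(computed, reported_diff))
print(matches)
```

If this check passes, each customer's DIFF is just their mean minus the mean of the customer in the row above, which would also explain why id 5 is NaN: the row before it (id 4) has no mean.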
SCool