How to organize data for Multilevel modeling - Decision Tree, Classification, or Regression

Question

I have three tables - Sales Manager, Customer, and Order. Each sales manager has multiple customers, and each customer can have multiple orders.

I am interested in determining whether certain attributes of the sales manager and of the customer lead to sales of a particular product (say, Product A: yes/no).

Suppose I have 3 sales managers, 10 customers, and 20 orders.

Should I structure the data set to have 3 rows, 10 rows, or 20 rows? Please advise.

Also, will a decision tree or other classification algorithm automatically understand the hierarchical relationships among manager, customer, and order?

Thanks.

A number of platform-specific concerns arise here. Could you specify which statistical application you are using for the analysis? - Brandon Bertelsen, Apr 17 '11 at 22:32

score 0 · Answer 1 · answered Apr 17 '11 at 22:30

I think you should combine everything into one big feature matrix. Suppose you have the tables

Sales Manager (id attr_1 ... attr_m)
Customer (id attr_1 ... attr_n sales_manager_id)
Order (id product_id_1 ... product_id_l customer_id)

Then it is probably reasonable to build the matrix in the following form:

Matrix:
product_id order_attr_1 ... order_attr_l customer_attr_1 ... customer_attr_n ... manager_attr_1 ... manager_attr_m

Now you have a matrix with 20*l rows, each containing all the attributes associated with a given order.
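A minimal sketch of that flattening step in plain Python, in case it helps to see the join spelled out. The toy tables and the attribute names (`manager_region`, `customer_segment`) are made up for illustration, and the products of an order are stored as a list rather than as `product_id_1 ... product_id_l` columns, so the row count here is the total number of ordered products.

```python
# Hypothetical toy tables; attribute names are assumptions, not from the question.
managers = {
    1: {"manager_region": "north"},
    2: {"manager_region": "south"},
}
customers = {
    10: {"customer_segment": "retail", "sales_manager_id": 1},
    11: {"customer_segment": "wholesale", "sales_manager_id": 2},
}
orders = [
    {"order_id": 100, "customer_id": 10, "product_ids": ["A", "B"]},
    {"order_id": 101, "customer_id": 11, "product_ids": ["C"]},
]


def flatten(orders, customers, managers):
    """Return one row per (order, product) with customer and manager attributes attached."""
    rows = []
    for order in orders:
        # Follow the foreign keys: order -> customer -> sales manager.
        customer = customers[order["customer_id"]]
        manager = managers[customer["sales_manager_id"]]
        for product_id in order["product_ids"]:
            row = {"order_id": order["order_id"], "product_id": product_id}
            # Copy customer attributes, dropping the foreign key itself.
            row.update({k: v for k, v in customer.items() if k != "sales_manager_id"})
            row.update(manager)
            rows.append(row)
    return rows


rows = flatten(orders, customers, managers)
for row in rows:
    print(row)
```

Each manager and customer attribute is simply repeated on every row that belongs to it, which is what makes the table usable by a standard classifier.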

In the simplest form you can use this matrix directly for classification. If there are too many attributes, it may be reasonable to apply PCA first. You could try Weka and see what comes out.
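Before a classifier can use such a matrix, the Product A yes/no target has to be derived and categorical attributes turned into numbers. The following is only a rough sketch of one way to do that (one-hot encoding, which is my assumption and not something the question specifies); the rows and attribute names are hypothetical.

```python
# Hypothetical flattened rows (one per ordered product); attribute names are made up.
rows = [
    {"product_id": "A", "customer_segment": "retail", "manager_region": "north"},
    {"product_id": "B", "customer_segment": "retail", "manager_region": "north"},
    {"product_id": "C", "customer_segment": "wholesale", "manager_region": "south"},
]


def encode(rows, target_product, feature_names):
    """One-hot encode categorical features and build a 0/1 target per row."""
    # Every (feature, value) pair seen in the data becomes one indicator column.
    columns = sorted({(name, row[name]) for row in rows for name in feature_names})
    X = [[1 if row[name] == value else 0 for name, value in columns] for row in rows]
    # Target: 1 if the row is a sale of the product of interest, else 0.
    y = [1 if row["product_id"] == target_product else 0 for row in rows]
    return columns, X, y


columns, X, y = encode(rows, "A", ["customer_segment", "manager_region"])
print(columns)
print(X)
print(y)
```

The resulting `X` and `y` can then be handed to whatever tool you use, for example exported as a file that Weka can read.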

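As for your question about the hierarchical relationships: classification algorithms will not understand them explicitly. They only see the flattened rows, so the hierarchy is present only implicitly, through the customer and manager attribute values repeated across rows.
I would recommend the book Introduction to Data Mining, as it answers most of your questions.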

