I have a dataset that tracks the path from lead generation (5 categories) to sales (binary). Between these two steps there is an app involved, for which some analytics are available: number of sessions, session duration, pages visited, and buttons pressed (binary variables). My goal is to determine which combination of variables leads to a successful sale. Here is the list of variables and their types:
# | Column | Dtype |
---|---|---|
0 | id | int64 |
1 | sessions | int64 |
2 | duration | int64 |
3 | pages | int64 |
4 | Button 1 | object |
5 | Button 2 | object |
6 | Button 3 | object |
7 | Button 4 | object |
8 | Button 5 | object |
9 | Button 6 | object |
10 | Button 7 | object |
11 | otp | object |
12 | object | |
13 | meeting | object |
14 | channel | object |
15 | sales | object |
Would a Sankey diagram be the best way to represent this? If so, how would I build it? If not, what should I try instead?
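To make the question concrete: as far as I understand, a Sankey diagram needs a table of (source, target, value) links between consecutive stages. Below is a minimal sketch of how I think those counts could be built. The stage order `channel`, `meeting`, `sales`, the toy rows, and the helper name `sankey_links` are all assumptions for illustration, not my real data.

```python
from collections import Counter

# Hypothetical toy rows; the real values of these columns are not shown above.
rows = [
    {"channel": "A", "meeting": "yes", "sales": "yes"},
    {"channel": "A", "meeting": "yes", "sales": "no"},
    {"channel": "B", "meeting": "no", "sales": "no"},
    {"channel": "B", "meeting": "yes", "sales": "yes"},
    {"channel": "A", "meeting": "yes", "sales": "yes"},
]

# Assumed funnel order; other columns (e.g. otp or the buttons) could be added.
stages = ["channel", "meeting", "sales"]


def sankey_links(rows, stages):
    """Count how many rows flow between each pair of consecutive stages."""
    links = Counter()
    for row in rows:
        for src, dst in zip(stages, stages[1:]):
            # Prefix each node with its stage name so that "yes" in meeting
            # and "yes" in sales stay separate nodes.
            links[(f"{src}={row[src]}", f"{dst}={row[dst]}")] += 1
    return links


links = sankey_links(rows, stages)
for (source, target), value in sorted(links.items()):
    print(source, "->", target, value)
```

Each printed line would be one band in the diagram. This only covers the categorical columns; the numeric ones (`sessions`, `duration`, `pages`) would have to be binned first, which is part of why I am unsure a Sankey is the right tool.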