sorry for the late reply, maybe this will be helpful for someone else.
The first thing you'll need to do is get your data into shape, specifically a long shape. I've built a sample below for the first 3 Customer IDs in your output table:
|Customer ID|Channel |Conversion |
|----------|------------|--------------|
|1 |TV |Conversion |
|1 |TV |Conversion |
|1 |Call Centre |Conversion |
|2 |TV |Conversion |
|2 |TV |Conversion |
|3 |Search |Conversion |
|3 |Search |Non-conversion|
|3 |Call Centre |Non-conversion|
Notice that if look at the most popular channel for each Customer ID, that it will correspond to the 'Channel Atribution by curtomer' field in your output able?
You can do this by:
- Grouping the Customer IDs and the Conversion (Conversion-Customer ID relationship should be 1 to 1)
- Count the occasions the channel occurs for each Customer, this will give you
Customer ID |
Channel |
Conversion |
Channel Count |
1 |
TV |
Conversion |
2 |
1 |
TV |
Conversion |
2 |
1 |
Call Centre |
Conversion |
1 |
2 |
TV |
Conversion |
2 |
2 |
TV |
Conversion |
2 |
3 |
Search |
Non-conversion |
2 |
3 |
Search |
Non-conversion |
2 |
3 |
Call Centre |
Non-conversion |
1 |
There is some duplication on the conversion and count fields, ignore for now.
- With the same grouping as above, filter on the max Channel Count column. This will give you:
Customer ID |
Channel |
Conversion |
Channel Count |
1 |
TV |
Conversion |
2 |
1 |
TV |
Conversion |
2 |
2 |
TV |
Conversion |
2 |
2 |
TV |
Conversion |
2 |
3 |
Search |
Non-conversion |
2 |
3 |
Search |
Non-conversion |
2 |
- Run a distinct across the dataset, this will give you:
Customer ID |
Channel |
Conversion |
Channel Count |
1 |
TV |
Conversion |
2 |
2 |
TV |
Conversion |
2 |
3 |
Search |
Non-conversion |
2 |
Which corresponds to your imagined output.
Tie breakers
It will happen that a Customer ID will have two or more equal number of channels. For instance two TV and two Search. How do we manage this? If you really must have one row per customer, then depending on what you're planning on doing, you'll need to either:
Build some priority ranking logic where rules dictate which channel is counted as the attribution.
Build some logic that randomly attributes the channel in the case of tie breakers.
I hope that helps, I've kept the answer code free but had R/Python in the back of my mind. It could possible be implemented in Excel, but someone far smarter than I would need to contribute that answer.