1

in the last days I have been working with markov chain for a multi touch (data driven) attribution model, I have found too much important information at the macro level, for example, the ChannelAttribution package gives me the attribution of each one of the channels of the process to achieve a conversion (either TV, search or call-center) but this attribute is done taking into account all customer journeys, and also the elimination effects for each channel. My question is the following, at a micro level of the analysis, can I obtain at the customer level, which was the channel that most attributed to their purchase decision? That is, which is the channel that had the greatest impact for each one of them customers to make their purchase? it does not matter if a conversion was not made or not.

For example, I imagine an output like the following:

Curtomer ID Channel Atribution by curtomer Conversion
1 TV Conversion
2 TV Conversion
3 Search Non-Conversion
4 Call-center Conversion
5 TV Non-Conversion
6 Call-center Conversion

I would be grateful, also sorry for my English I hope to be clear.

  • Michael your question is clear, could you provide an example of the input data? Then we can help you get to the output data. – Michael Plazzer Nov 22 '21 at 23:43
  • Sure @Michael Plazzer, the imput data is basically by user with two columns; first, column path have all journey of the user, is a string column, example **TV > TV > Call-Center > Search > Social**, Second column is Conversion, it has ones If the user made the conversion, zeros if not. – Michael Geinner Melgarejo Flor Dec 01 '21 at 17:10

1 Answers1

1

sorry for the late reply, maybe this will be helpful for someone else.

The first thing you'll need to do is get your data into shape, specifically a long shape. I've built a sample below for the first 3 Customer IDs in your output table:

|Customer ID|Channel |Conversion | |----------|------------|--------------| |1 |TV |Conversion | |1 |TV |Conversion | |1 |Call Centre |Conversion | |2 |TV |Conversion | |2 |TV |Conversion | |3 |Search |Conversion | |3 |Search |Non-conversion| |3 |Call Centre |Non-conversion| Notice that if look at the most popular channel for each Customer ID, that it will correspond to the 'Channel Atribution by curtomer' field in your output able?

You can do this by:

  1. Grouping the Customer IDs and the Conversion (Conversion-Customer ID relationship should be 1 to 1)
  2. Count the occasions the channel occurs for each Customer, this will give you
Customer ID Channel Conversion Channel Count
1 TV Conversion 2
1 TV Conversion 2
1 Call Centre Conversion 1
2 TV Conversion 2
2 TV Conversion 2
3 Search Non-conversion 2
3 Search Non-conversion 2
3 Call Centre Non-conversion 1

There is some duplication on the conversion and count fields, ignore for now.

  1. With the same grouping as above, filter on the max Channel Count column. This will give you:
Customer ID Channel Conversion Channel Count
1 TV Conversion 2
1 TV Conversion 2
2 TV Conversion 2
2 TV Conversion 2
3 Search Non-conversion 2
3 Search Non-conversion 2
  1. Run a distinct across the dataset, this will give you:
Customer ID Channel Conversion Channel Count
1 TV Conversion 2
2 TV Conversion 2
3 Search Non-conversion 2

Which corresponds to your imagined output.

Tie breakers

It will happen that a Customer ID will have two or more equal number of channels. For instance two TV and two Search. How do we manage this? If you really must have one row per customer, then depending on what you're planning on doing, you'll need to either:

  1. Build some priority ranking logic where rules dictate which channel is counted as the attribution.

  2. Build some logic that randomly attributes the channel in the case of tie breakers.

I hope that helps, I've kept the answer code free but had R/Python in the back of my mind. It could possible be implemented in Excel, but someone far smarter than I would need to contribute that answer.

Michael Plazzer
  • 447
  • 1
  • 6
  • 18