I'm using a hash of IP + User Agent as a unique identifier for every user who visits a website. This simple scheme has an obvious pitfall: identifier collisions. Different people can browse the internet with the same IP + user agent combination (for example, behind a shared office or NAT address), and distinct users who share a hash are counted as a single user. I want to know how often this identifier error occurs.
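For concreteness, here is a minimal sketch of such an identifier. It assumes SHA-256 over the IP and user-agent string joined by a `|` separator; the post doesn't say which hash function or input format is used, so `visitor_id` and those choices are hypothetical.

```python
import hashlib


def visitor_id(ip: str, user_agent: str) -> str:
    """Hypothetical identifier: SHA-256 hex digest of "ip|user_agent"."""
    # The separator keeps ("1.2.3.4", "5x") and ("1.2.3.45", "x") from
    # hashing the same input string.
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Two different people behind the same public IP with the same browser build
# get the same ID: this is the collision the question is trying to measure.
alice = visitor_id("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
bob = visitor_id("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
print(alice == bob)  # True
```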
To estimate that frequency, I've created a two-step funnel that should, in theory, convert at zero percent: `publish.click > signup.complete`. (Users have to sign up before they can publish.) Running this funnel over 1 day gives me a conversion rate of 0.37%. That figure, I reasoned, is my identifier collision probability for that funnel. Looking at the raw data (a table about 10,000 rows long), I confirmed this hypothesis: 37 signups were completed by new users identified by the same hash as existing users who had fired `publish.click` during the funnel period (1 day). (I know this because the hashes matched across the funnel steps, while the UIDs, which are assigned at signup, did not.)
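The funnel logic described above can be sketched as follows. This assumes the funnel counts a hash as converted when a `signup.complete` follows that hash's first `publish.click` inside the window, and that events are available as (hash, event, timestamp) tuples. Both the schema and the `funnel_conversion` helper are hypothetical; the analytics tool's exact funnel semantics may differ.

```python
from datetime import datetime, timedelta


def funnel_conversion(events, start, window):
    """Share of hashes that fire publish.click and *later* signup.complete
    within [start, start + window). `events` is an iterable of
    (visitor_hash, event_name, timestamp) tuples (hypothetical schema)."""
    end = start + window
    first_publish = {}  # hash -> earliest publish.click in the window
    signups = {}        # hash -> signup.complete times in the window
    for h, name, ts in events:
        if not (start <= ts < end):
            continue
        if name == "publish.click":
            if h not in first_publish or ts < first_publish[h]:
                first_publish[h] = ts
        elif name == "signup.complete":
            signups.setdefault(h, []).append(ts)
    converted = sum(
        1 for h, t0 in first_publish.items()
        if any(t > t0 for t in signups.get(h, []))
    )
    return converted / len(first_publish) if first_publish else 0.0


start = datetime(2024, 1, 1)
hour = timedelta(hours=1)
events = [
    ("a", "publish.click", start),
    ("b", "signup.complete", start + 0.5 * hour),  # b signs up, then publishes: normal
    ("b", "publish.click", start + hour),
    ("c", "publish.click", start + 2 * hour),
    ("d", "publish.click", start + 3 * hour),
    ("a", "signup.complete", start + 5 * hour),    # a new person sharing hash "a"
]
print(funnel_conversion(events, start, timedelta(days=1)))  # 0.25
```

Only hash `a` has a signup after its publish, so one of the four hashes entering the funnel converts.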
I thought I had it all figured out...
But then I ran the funnel over 1 week, and the conversion rate increased to 0.78%. Over 5 months, it jumped to 1.71%.
What could be going on here? Why does my conversion (collision) rate increase as the experiment period widens?
I think it may have something to do with the fact that a unique user typically fires `signup.complete` only once, while they may fire `publish.click` many times over the course of a period. However, I'm struggling to put this hypothesis into words.
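One way to make that hypothesis concrete is a toy simulation under loudly hypothetical assumptions: every hash belongs to an existing user who fires `publish.click` on a given day with some fixed probability (a repeatable event), and, independently, on any given day a different person sharing that hash completes `signup.complete` with a small fixed probability (a one-off event). The parameter values and the `simulate_funnel_rate` helper are made up and are not fitted to the 0.37% / 0.78% / 1.71% figures; only the direction of the trend is meant to carry over.

```python
import random


def simulate_funnel_rate(days, n_hashes=5_000, publish_rate=0.3,
                         collision_rate=0.004, seed=0):
    """Toy model of the publish.click > signup.complete funnel.

    All parameters are hypothetical. Each hash's existing user fires
    publish.click on any given day with probability `publish_rate`.
    Independently, on any given day a *different* person sharing the hash
    completes signup.complete with probability `collision_rate`.
    Returns converted hashes / hashes that entered the funnel.
    """
    rng = random.Random(seed)
    entered = converted = 0
    for _ in range(n_hashes):
        first_publish = None  # time (in days) of this hash's first publish.click
        collided = False      # did a signup land after that first publish?
        for day in range(days):
            if rng.random() < publish_rate:
                pub_t = day + rng.random()  # random time within the day
                if first_publish is None:
                    first_publish = pub_t
            if rng.random() < collision_rate:
                sig_t = day + rng.random()
                # Counts as a funnel conversion only if it follows a publish.
                if first_publish is not None and sig_t > first_publish:
                    collided = True
        if first_publish is not None:
            entered += 1
            converted += collided
    return converted / entered if entered else 0.0


for d in (1, 7, 150):
    print(f"{d:>3}-day window: {simulate_funnel_rate(d):.2%}")
```

In this model the per-day collision probability never changes, yet the measured rate climbs with the window: repeated publishing lets most hashes enter the funnel early, and every extra day afterwards is another chance for a one-off colliding signup to land behind that first `publish.click`. Under these assumptions the funnel measures "a collision happened at some point in the window" rather than a fixed per-user collision probability.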
Any help would be appreciated.