I have large network traffic files that I am analyzing in order to extract statistical features of users.
One of the features I would like to extract is link clicks on specific sites (for example, clicks on pop-ups).

My first idea was to inspect the packets' contents for hrefs and links, store them in a data structure along with their timestamps, and then iterate over the packets again, looking for requests made shortly after each link appeared.

Something like the following pseudocode (the packets are grouped by flow, where a flow is IP1 <=> IP2):

for each packet in each flow:
      search for "href" or "http://" or "https://"
      save the links with their timestamp
for each packet in each flow:
      if it's an HTTP request and its URL matches any URL in the list and the 
         time is close enough, record it
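The two passes above could be sketched in Python roughly as follows. This is only an illustration under several assumptions: the `Packet` record, the `request_url` and `find_clicks` helpers, and the 30-second `window` are hypothetical names and values, payloads are assumed to be already reassembled and decoded as text, and only plain HTTP is visible (HTTPS payloads would be encrypted).

```python
import re
from collections import namedtuple

# Hypothetical packet record: timestamp (seconds), flow key, decoded payload.
Packet = namedtuple("Packet", "ts flow payload")

# Absolute URLs that appear literally in a payload (href values, scripts, ...).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def request_url(payload):
    """Return the absolute URL of an HTTP request payload, or None."""
    lines = payload.split("\r\n\r\n", 1)[0].split("\r\n")
    parts = lines[0].split(" ")
    # A request line looks like "GET /path HTTP/1.1".
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return "http://" + value.strip() + parts[1]
    return None


def find_clicks(packets, window=30.0):
    """Pair each request with a link seen at most `window` seconds earlier."""
    seen = {}  # URL -> timestamps at which it appeared in page content
    for p in packets:
        if request_url(p.payload) is None:  # pass 1: skip requests themselves
            for url in URL_RE.findall(p.payload):
                seen.setdefault(url, []).append(p.ts)
    clicks = []
    for p in packets:  # pass 2: match requests against the recorded links
        url = request_url(p.payload)
        if url and any(0 <= p.ts - t <= window for t in seen.get(url, [])):
            clicks.append((p.ts, url))
    return clicks
```

Calling `find_clicks` on the packets of one flow returns `(timestamp, url)` pairs for requests that followed a matching link closely enough; the window size would need tuning against real traffic.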

The problem with this approach is that some links are generated dynamically while the page is loading (by JavaScript, for example), so they never appear in the packet contents and cannot be found with the above method.

I have also tried checking the Referer field in the HTTP header and looking for requests that were referred by the relevant sites. This method generates a lot of false positives, because requests for iframes and embedded objects carry the same Referer as a real click.
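A rough sketch of that Referer check, and of why it over-counts, might look like this. The `parse_headers` and `referred_by` helpers and the `news.example.com` host are hypothetical, and the payloads are assumed to be plain-text HTTP requests.

```python
from urllib.parse import urlsplit


def parse_headers(payload):
    """Return the request headers as a dict with lower-cased names."""
    head = payload.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def referred_by(payload, site):
    """True if the request's Referer points at `site` or one of its subdomains."""
    referer = parse_headers(payload).get("referer")
    if referer is None:
        return False
    host = urlsplit(referer).hostname
    return host is not None and (host == site or host.endswith("." + site))


# A real click and an embedded image look identical to this check.
click = ("GET /article HTTP/1.1\r\nHost: other.example.org\r\n"
         "Referer: http://news.example.com/front\r\n\r\n")
image = ("GET /banner.png HTTP/1.1\r\nHost: cdn.example.net\r\n"
         "Referer: http://news.example.com/front\r\n\r\n")
print(referred_by(click, "news.example.com"), referred_by(image, "news.example.com"))
```

Both calls return `True`, which is exactly the false-positive problem: the Referer alone does not say whether the user or the page itself triggered the request.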

It is important to mention that this is not my server; my intention is to build a tool for statistical analysis of user behavior (so I can't add some kind of click tracker to the site).

Does anyone have an idea of how I can determine from the network traffic whether users clicked on links?
Any help will be appreciated!
Thank you

kobibo
