
I have successfully managed to filter out hits from certain spammy sites from Google Analytics. It's an ongoing battle, as new sites are popping up all the time and polluting my acquisition/referral results.

At present, the following match is used by the GA filter to stop all the sites below showing up in the data:

.*(best\-seo\-solution|semalt|buttons\-for\-website|social\-buttons|best\-seo\-offer|Get\-Free\-Traffic\-Now|buttons\-for\-your\-website|free\-share\-buttons)\.com.*

I've added most of these myself and it works. However, I now need a pattern that also covers URLs that don't follow the plain something.com form, e.g.:

site4.free-share-buttons.com
site5.free-share-buttons.com

So in these cases the end is always the same but the start can be variable.

buy-cheap-online.info

In this case it ends with .info

www.event-tracking.com

This one uses www. whereas others do not

http://webmaster-traffic.com

This one has the http:// as well.

On top of all that, a filter pattern can be at most 255 characters long (though I can have more than one filter pattern), so I need to split it up.

How can I create a regex filter pattern that would target all above URLs?

Francesca

1 Answer


Google Analytics lets you write a regex without escaping every special character when the expression is simple, so you can drop the backslashes (\) and the leading and trailing .* from your pattern. You can even remove the .com and the parentheses, since these names are already very specific:

best-seo-solution|semalt|buttons-for-website|social-buttons|best-seo-offer|Get-Free-Traffic-Now|buttons-for-your-website|free-share-buttons|event-tracking|buy-cheap-online.info

If a spam domain has a common name that could also match legitimate referrers, add its full name for that specific case, e.g. |commonname.net.
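To check a pattern like this before saving the filter, it can be tried against the sample referrers from the question. The following is only a rough sketch in Python: it assumes the filter matches anywhere in the field and ignores case, which `re.search` with `re.IGNORECASE` approximates, and the helper name `is_spam` is made up for illustration. The .info entry is written out in full as buy-cheap-online.info.

```python
import re

# Simplified pattern in the style of the answer; the .info entry is spelled out in full.
PATTERN = (
    "best-seo-solution|semalt|buttons-for-website|social-buttons|"
    "best-seo-offer|Get-Free-Traffic-Now|buttons-for-your-website|"
    "free-share-buttons|event-tracking|buy-cheap-online.info"
)

# Assumption: partial, case-insensitive matching, similar to a GA exclude filter.
spam_re = re.compile(PATTERN, re.IGNORECASE)


def is_spam(referrer: str) -> bool:
    """Return True if any alternative of the pattern occurs anywhere in the referrer."""
    return spam_re.search(referrer) is not None


samples = [
    "site4.free-share-buttons.com",   # variable subdomain
    "buy-cheap-online.info",          # non-.com TLD
    "www.event-tracking.com",         # www. prefix
    "http://webmaster-traffic.com",   # not in the pattern yet
    "example.com",                    # hypothetical legitimate referrer
]

for referrer in samples:
    print(f"{referrer}: {'blocked' if is_spam(referrer) else 'allowed'}")
```

Because the match is partial, subdomains, www. and http:// prefixes need no special handling. webmaster-traffic is not in the list above, so it would still have to be added as another alternative.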

You can keep adding names until you reach 255 characters; after that, just add a second filter. This will work, but it has three downsides:

  • First, one or two new spammers appear every week.

  • Second, by the time you add a spammer to the filter, it has already recorded some hits.

  • Third, and this is new behavior, some spam is now hitting with direct visits along with the referral, and this filter won't stop those.
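As for the 255-character limit on each filter pattern, splitting a growing list of names into several patterns can be automated. This is a minimal sketch, not part of Google Analytics itself: the function name `split_into_patterns` and the greedy strategy are assumptions chosen for illustration.

```python
def split_into_patterns(names, limit=255):
    """Greedily join names with '|' so that no pattern exceeds `limit` characters."""
    patterns = []
    current = ""
    for name in names:
        if len(name) > limit:
            raise ValueError(f"{name!r} alone is longer than {limit} characters")
        candidate = name if not current else current + "|" + name
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = name
    if current:
        patterns.append(current)
    return patterns


spam_names = [
    "best-seo-solution", "semalt", "buttons-for-website", "social-buttons",
    "best-seo-offer", "Get-Free-Traffic-Now", "buttons-for-your-website",
    "free-share-buttons", "event-tracking", "buy-cheap-online.info",
]

# One pattern per filter; a small limit is used here only to force a split.
for number, pattern in enumerate(split_into_patterns(spam_names, limit=80), start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each resulting string would go into its own exclude filter.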

To avoid these downsides, I recommend using a valid hostname filter instead. This filter only allows hits that carry one of your own hostnames, so all ghost spam is excluded, since it either uses a fake hostname or has no hostname set.
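The idea behind a valid hostname filter can be sketched as follows. This only simulates the logic in Python and is not how Google Analytics is configured; example.com, its subdomains, and the hit dictionaries are hypothetical placeholders for your own hostnames and data.

```python
import re

# Hypothetical list of hostnames you actually serve; replace with your own.
VALID_HOSTNAMES = re.compile(r"^(www\.|shop\.)?example\.com$", re.IGNORECASE)


def keep_hit(hit: dict) -> bool:
    """Keep a hit only if its hostname is one of the valid hostnames."""
    hostname = hit.get("hostname") or ""
    return VALID_HOSTNAMES.search(hostname) is not None


hits = [
    {"hostname": "www.example.com", "referrer": "google.com"},
    {"hostname": "(not set)", "referrer": "semalt.com"},              # ghost spam, no hostname
    {"hostname": "free-share-buttons.com", "referrer": "(direct)"},   # fake hostname
    {"hostname": "shop.example.com", "referrer": "newsletter"},
]

for hit in hits:
    print(hit["hostname"], "->", "kept" if keep_hit(hit) else "excluded")
```

Unlike the exclude list, this include-style check does not depend on knowing the spammer's name in advance, which is why it also covers new spammers and fake direct visits.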

You can find more information about referrer spam and the valid hostname filter here:

https://stackoverflow.com/a/28354319/3197362

http://www.ohow.co/things-you-must-know-about-spam-in-google-analytics/

Carlos Escalera Alonso
  • Thanks for this. I have already added .htaccess rules to block some referrer spam, but I've heard that nowadays spambots often don't even hit your site, so they will never be filtered out by the .htaccess. I was just looking for a solution to block as much as I can (both through .htaccess and through filters) – Francesca May 12 '15 at 14:30
  • Yes, you are right. In fact, most of the latest spam can't be blocked using the .htaccess. If you follow the second link, there is more info about this. – Carlos Escalera Alonso May 12 '15 at 14:50