New wave of referrer spam wrecking Google Analytics data
If you're seeing a lot of referral traffic in your Google Analytics data, you're not alone. Contributor Jonathan Hochman explains how referral spam happens, lays out methods to combat it and suggests potential solutions for Google's consideration.
A new surge of referrer spam is damaging Google Analytics data sets. These attacks have rendered the Traffic Referrals report useless for many Google Analytics properties. For small business sites, the problem can be severe enough to seriously distort session and page view counts.
In the following example, lines 1, 2, 5, 6, 7, 8 and 9 are all spam.
Why would attackers generate Google Analytics spam? Webmasters who review Google Analytics often visit sites that show up in their data. Referrer spam can thus be used to generate traffic and sales leads, spread malware or conduct phishing attacks.
If you see a suspicious site in your referrer data, don’t visit it. Some attacks are just for the “lulz,” like this one:
How does referral spam happen? Some attackers run bots. Some use hijacked computers in botnets.
Google Analytics is an old product created when security was not a high priority. Tracking is done with a unique number for each property; a property can be a website, an app or some other digital artifact. Unfortunately, the tracking numbers are sequential, which makes them very easy to guess.
Google Analytics allows up to 50 properties in each account. Each property has a serial number that looks like UA-12345-1. The UA stands for “Urchin Analytics,” which was the product’s name until Google acquired Urchin in 2005. The middle digits are the account number. All properties in the same account share the same middle number.
The number after the second dash is the property number. These range from 1 to 50. The referrer spam attacks appear to mainly target property 1, and sometimes 2 and 3.
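To make the ID format concrete, here is a minimal Python sketch that splits a tracking ID of the UA-12345-1 form into its account and property numbers. The function names and the "likely target" set {1, 2, 3} are illustrative assumptions based on the attack pattern described above, not anything Google publishes.

```python
import re
from typing import NamedTuple

# Classic tracking IDs look like UA-12345-1: "UA", the account number, the property number.
UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")

# Assumption taken from the observed attacks: spam mainly hits properties 1, 2 and 3.
LIKELY_TARGETED_PROPERTIES = {1, 2, 3}


class TrackingId(NamedTuple):
    account: int
    property_number: int


def parse_tracking_id(tracking_id: str) -> TrackingId:
    """Split a UA-style tracking ID into its account and property numbers."""
    match = UA_PATTERN.match(tracking_id.strip())
    if match is None:
        raise ValueError(f"not a UA-style tracking ID: {tracking_id!r}")
    account, prop = (int(group) for group in match.groups())
    # Property numbers run from 1 to 50 within an account.
    if not 1 <= prop <= 50:
        raise ValueError(f"property number out of range 1-50: {prop}")
    return TrackingId(account, prop)


def likely_spam_target(tracking_id: str) -> bool:
    """True if the property number is one the observed attacks mainly target."""
    return parse_tracking_id(tracking_id).property_number in LIKELY_TARGETED_PROPERTIES


if __name__ == "__main__":
    for tid in ("UA-12345-1", "UA-98765-11"):
        print(tid, parse_tracking_id(tid), likely_spam_target(tid))
```

Because the account number and property number are plain integers with a small, known range for the property, every ID is easy to predict once you have seen one.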
If your website has a high property number (e.g., UA-98765-11), referrer spam probably hasn’t affected it yet. Why not just create a new property with a higher number?
The problem is twofold. First, a new property will not have the historical data of the original property, making data analysis harder. Second, if enough people used this tactic, the spammers would probably start targeting the higher numbers.
Google Analytics provides a filtering option. Like the anti-spam filters we once used with email, these filters require constant updating as spammers evolve new tactics.
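The idea behind such a filter can be sketched in Python: an exclusion pattern is matched against each hit's referrer hostname, and matching hits are dropped. This is a hypothetical model, not the Google Analytics filter engine; the domain names are made-up placeholders under the reserved .invalid top-level domain, and the blocklist is exactly the part that needs constant updating.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist; real spam domains change constantly, so this list needs upkeep.
SPAM_DOMAINS = ["free-traffic.invalid", "seo-offers.invalid", "buttons.invalid"]

# Match a listed domain or any subdomain of it, anchored at the end of the hostname.
SPAM_PATTERN = re.compile(
    r"(^|\.)(" + "|".join(re.escape(d) for d in SPAM_DOMAINS) + r")$"
)


def referrer_host(referrer_url: str) -> str:
    """Return the lowercase hostname of a referrer URL ('' if there is none)."""
    return (urlsplit(referrer_url).hostname or "").lower()


def is_spam_referrer(referrer_url: str) -> bool:
    """True if the referrer's hostname is a blocklisted domain or one of its subdomains."""
    return SPAM_PATTERN.search(referrer_host(referrer_url)) is not None


def filter_hits(referrers: list[str]) -> list[str]:
    """Keep only hits whose referrer is not on the blocklist."""
    return [r for r in referrers if not is_spam_referrer(r)]


if __name__ == "__main__":
    hits = [
        "https://www.google.com/",
        "http://free-traffic.invalid/offer",
        "https://news.example.com/story",
        "http://www.buttons.invalid/",
    ]
    print(filter_hits(hits))
```

Anchoring the pattern at the end of the hostname, and requiring a dot or the start of the string before it, keeps a legitimate host such as notbuttons.invalid from being caught by the buttons.invalid entry.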
The Definitive Guide to Removing All Google Analytics Spam by Mike Sullivan provides an excellent recipe for stopping referrer spam, but the solution is complex. Sullivan offers to manage the solution for $75 per site per year. For a consultant or corporate marketing department responsible for hundreds of sites, that’s a non-trivial cost. For the entire Google Analytics user base, it’s a lot of money.
There is also risk, because complex filters and .htaccess rules inevitably have bugs and require thorough testing. One wrong filter can wipe out a large swath of traffic data, and there’s no way to recover it afterward. An erroneous .htaccess setup could block legitimate visitors or break the site entirely.
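Because a bad filter silently discards data, it helps to test a candidate pattern offline before adding it to a view. The Python sketch below, using hypothetical sample hostnames, checks a pattern against referrers known to be legitimate and referrers known to be spam, and reports any misclassification. It shows how an unanchored pattern such as `button` also catches a legitimate hostname that merely contains that word.

```python
import re


def check_filter(pattern: str, legit_hosts: list[str],
                 spam_hosts: list[str]) -> dict[str, list[str]]:
    """Report legitimate hosts a pattern would wrongly exclude and spam hosts it would miss."""
    regex = re.compile(pattern)
    return {
        "false_positives": [h for h in legit_hosts if regex.search(h)],
        "missed_spam": [h for h in spam_hosts if not regex.search(h)],
    }


# Hypothetical sample data for illustration only.
LEGIT = ["www.google.com", "news.example.com", "buttonwood-bank.example"]
SPAM = ["free-buttons.invalid", "share-buttons.invalid"]

if __name__ == "__main__":
    # Too broad: "button" also matches a legitimate referrer.
    print(check_filter(r"button", LEGIT, SPAM))
    # Tighter: only the listed spam hostnames (or their subdomains).
    print(check_filter(r"(^|\.)(free-buttons|share-buttons)\.invalid$", LEGIT, SPAM))
```

Keeping a list of known-good referrers from your own reports and rerunning a check like this whenever the pattern changes is a cheap guard against wiping out real traffic.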
Filters only eliminate spam going forward; they do not remove past spam. To get a clean set of historical data, it’s necessary to create a custom segment that eliminates spam. Like filtering, this process adds complexity and needs ongoing updates.
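The difference between a filter and a segment can be modeled with a toy data set: a filter acts as hits are processed, so only hits recorded after it is created are cleaned, while a segment is applied when a report is run and therefore covers stored history as well. This Python sketch is a simplified model under those assumptions; the `Hit` type, the dates and the hostnames are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Hit:
    day: date
    referrer_host: str


def is_spam(host: str) -> bool:
    # Hypothetical spam test; in practice it would mirror the filter's pattern.
    return host.endswith(".invalid")


def store_with_filter(hits: list[Hit], filter_created: date,
                      spam: Callable[[str], bool]) -> list[Hit]:
    """Filters act at processing time: hits recorded before filter_created stay in storage."""
    return [h for h in hits if h.day < filter_created or not spam(h.referrer_host)]


def apply_segment(stored: list[Hit], spam: Callable[[str], bool]) -> list[Hit]:
    """Segments act at report time, so they also hide spam in historical data."""
    return [h for h in stored if not spam(h.referrer_host)]


HITS = [
    Hit(date(2016, 10, 1), "www.google.com"),
    Hit(date(2016, 10, 2), "free-traffic.invalid"),  # old spam, recorded before the filter
    Hit(date(2016, 12, 1), "news.example.com"),
    Hit(date(2016, 12, 2), "free-traffic.invalid"),  # new spam, recorded after the filter
]

if __name__ == "__main__":
    stored = store_with_filter(HITS, date(2016, 11, 15), is_spam)
    print(len(stored))                          # the old spam hit survives the filter
    print(len(apply_segment(stored, is_spam)))  # the segment removes it from the report
```

In this model the segment has to repeat the filter's spam test, which is why keeping both up to date doubles the maintenance work.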
Current solutions are far from ideal because of complexity, cost and risk. It is terribly wasteful for each of hundreds of thousands of webmasters to implement the same filters and custom segments to solve the same spam problem.
Instead, Google should offer a predefined set of filters to eliminate the vast majority of fake analytics data.
Google’s John Mueller has said that the company is working on general solutions.
@Jehochman So no URLs with non-ascii unicode characters? I’m sure the awesome @googleanalytics folks are working on more general solutions.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) November 30, 2016
In the meantime, there are several things Google should consider:
- Provide a simple way to apply filters to past data, eliminating the need for custom segments to remove spam.
- Provide a simple way to download and upload a set of filters. For those who manage multiple Google Analytics accounts, this would save considerable time versus having to re-enter each filter definition in each view.
- Offer new UA tracking codes that don’t have predictable digits. Currently, UA account numbers are sequential, making it easy for spammers to brute force attack one account after another. Property numbers predictably run from 1 to 50. In contrast, credit card numbers are not sequential, which helps deter brute force attacks.
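The brute-force point can be illustrated with a short Python sketch. With sequential account numbers and property numbers confined to 1 through 50, an attacker can simply list candidate IDs in order; an identifier drawn from a large random space, here 64 bits from Python's `secrets` module, cannot be enumerated that way. The random-ID format and the traffic figures are purely hypothetical and do not describe anything Google uses.

```python
import secrets
from itertools import islice
from typing import Iterator


def sequential_ids(start_account: int, max_property: int = 3) -> Iterator[str]:
    """Enumerate UA-style IDs the way a spammer could: every account, low properties first."""
    account = start_account
    while True:
        for prop in range(1, max_property + 1):
            yield f"UA-{account}-{prop}"
        account += 1


def random_id(bits: int = 64) -> str:
    """A hypothetical non-sequential tracking ID drawn uniformly from a 2**bits space."""
    return f"UA-{secrets.randbits(bits):0{bits // 4}x}"


def hit_probability(live_ids: int, guesses: int, bits: int = 64) -> float:
    """Union-bound upper limit on the chance that blind guesses hit at least one live ID."""
    return min(1.0, guesses * live_ids / 2 ** bits)


if __name__ == "__main__":
    print(list(islice(sequential_ids(12345), 6)))
    print(random_id())
    # Hypothetical figures: ten million live IDs, a billion guesses; well under 0.1%.
    print(hit_probability(live_ids=10_000_000, guesses=1_000_000_000))
```

Against sequential IDs every guess lands on a real property; against random 64-bit IDs almost every guess is wasted, which is the same reasoning that keeps credit card numbers from being walked one after another.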
Referrer spam isn’t particularly lucrative compared to other forms of cyber-crime. But referrer spam is so easy that it has become a real nuisance.
Once Google implements a general solution, it should raise the attackers’ computing costs, and hopefully there will be a lot less Google Analytics spam.
Opinions expressed in this article are those of the guest author and not necessarily MarTech.