5 Ways To Evaluate The Quality Of Audience Data
The fundamental evolution in display advertising, ushered in by real-time bidding (RTB) technology, is the concept of buying audiences as opposed to just inventory. We used to buy “audiences” by guessing about the characteristics of the visitors to a particular site or type of site. Now, one can pick and choose individuals by their characteristics regardless of where they happen to be on the web.
Buying audiences requires that your ad platform have access to targeting data while making split-second decisions about which impressions to buy. Armed with audience data, you can target ad viewers at a demographic, psychographic, and behavioral level, resulting in highly targeted ad campaigns that are also more relevant to the target audience.
We previously looked at how to define premium inventory, which dealt with how advertisers and publishers value context. With the advent of RTB and the ability to target audiences directly, no matter the context, it’s important to look deeper into the fuel that makes it all possible: audience data.
And a critical point to understand: not all data are created equal.
My aim is to provide a simple framework for evaluating the quality of any particular source of audience data. To do so requires looking at the data from a variety of perspectives: Where did it come from? What mechanism does it rely on? How was it collected? How fresh is it? How much does it cost? And how does it perform?
To help illustrate this framework, I’ve included the diagram below, which can also serve as a rubric for any evaluation.
As you can see, there are five main criteria by which you can judge the quality of audience data. On one end of the spectrum, you have data which are opaque, fragile and of questionable value. On the other end, you have data which are transparent, robust and of the highest value to both marketers and publishers. By diving into the nuts and bolts, I hope you will see why this is the case.
1. Source: Where Did The Data Originate?
Knowing where audience data comes from is the first step in evaluating quality. In other words, who owns the data? Even though the answer is usually obvious, knowing this is important because it affects the level of visibility or transparency that you will have into the data itself. There are three ways of looking at data ownership:
- 1st Party — This means it’s your data. If you are an advertiser, this refers to the data you collect about visitors from your own Web properties and customers from your CRM systems. If you are a publisher, this refers to the data you collect first-hand from your website visitors or users. First-party data is the highest quality. Best of all, because it’s your data, it doesn’t cost you anything to use it. This means from a value perspective, it’s hard to beat.
- 2nd Party — The next best thing to first-party data is second-party data. This is data you use that belongs to someone else with whom you have a relationship. For example, if you are an advertiser and you strike a deal with a publisher, you would be leveraging second-party data. This is close to first-party data in terms of quality, but that all depends on the level of transparency you are given into its attributes.
- 3rd Party — This is data you are using that is usually from an unknown party or origin. In some cases, the origin is described, but it is rarely easy to verify. Third-party data generally comes from DMPs (data management platforms) that aggregate and normalize audience data from a multitude of data providers, publishers, and other sources. Due to the opacity of most third-party data, it’s prudent not to automatically assume that it is of high quality or from a trusted source. Keep in mind that the freshness (or data age) is often unknown, and most third-party data providers rely on browser cookies as the underlying mechanism.
2. Mechanism: How Is It Powered?
Another crucial dimension to consider when evaluating targeting data is how it’s powered, or what underlying technology it depends on. Does it rely on browser cookies or fingerprinting technology? Or is it attached to a user ID on a closed application or platform? This is important to understand because it directly affects the reliability of tracking and reaching the target audience. Let’s dive into each one now and learn more.
- Native Database — Native platform data is first-party data associated with user accounts on proprietary applications and platforms. Examples include Facebook, LinkedIn, Twitter, Google, Amazon, Microsoft and so on. This data often powers the ad platforms offered by these companies. Native data is the highest quality, being that it’s first-party, declared by the user (generally through profile or registration forms), and highly robust given that it doesn’t rely on cookies, but rather on internal databases.
- Fingerprinting — Fingerprinting is a relatively new technology that is being positioned as a replacement for cookies. Fingerprinting works by collecting all the unique attributes about a computer, such as: IP address, screen resolution, browser version, browser plugins, font library, time zone and much more. All of these details comprise a digital “fingerprint,” which, according to the Electronic Frontier Foundation (EFF), is said to be unique in about 94% of cases. It’s a very promising technology that appears to be far more robust than cookies, but the future of fingerprinting still remains to be seen. (To learn more, check out the free Panopticlick tool by the EFF.)
- Cookies — For the longest time, browser cookies have been the standard mechanism for powering online display advertising, particularly the behavioral targeting component, and especially in the DMP and RTB space. However, their effectiveness is slowly crumbling (using this pun is unavoidable), with a multitude of threats on the horizon: privacy advocates, government regulation, browser companies, software giants and more. As a result, many players in the ad tech ecosystem are actively preparing for a world without cookies. Hence, the rise of fingerprinting technologies. Also keep in mind that anywhere from 50% to 75% of cookies decay (or disappear) within 30 days, which is another reason why cookies are a fragile targeting mechanism.
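To make the fingerprinting idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how any particular vendor does it. It assumes the attributes have already been collected into a dictionary; the `fingerprint` function, the attribute names and the sample values are all hypothetical.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of device and browser attributes into one identifier."""
    # Sort the keys so the same attributes always serialize the same way,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
visitor = {
    "ip_address": "203.0.113.7",
    "screen_resolution": "1920x1080",
    "browser_version": "ExampleBrowser/1.0",
    "plugins": ["pdf-viewer", "media-player"],
    "fonts": ["Arial", "Georgia", "Verdana"],
    "time_zone": "UTC-05:00",
}

visitor_id = fingerprint(visitor)
print(visitor_id)
```

Note that an exact hash like this changes whenever any single attribute changes (after a browser update, for instance), so treat it as a picture of the idea rather than a production technique.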
3. Methodology: How Was It Gathered?
Another important criterion to consider when evaluating targeting data is the data collection methodology, or how it is gathered. This is important to know because it directly affects the accuracy of the data in question (or lack thereof). There are generally three ways of gathering data:
- Declared — Declared data is explicitly volunteered or disclosed by visitors, customers or users, whether from their profile information, registration data or survey answers. Such data can include age, gender, household income, interests, language, religion and much more. Declared data is of the highest quality because of its source. There is no guesswork involved. It comes straight from the horse’s mouth. (Though there’s no guarantee people are being truthful.)
- Inferred — Inferred data collection is when data is collected and labeled in a way that makes an educated guess about some characteristic or attribute of an audience. Oftentimes, it’s based on browsing behavior. For example, a publisher that has a website for new mothers could create a number of inferred data segments based on their website visitors: females, new parents, new mothers, age range and so on. Naturally, this isn’t the most accurate way to categorize all visitors, but it’s inferred based on the subject matter of the website. For this reason, it’s not as strong as declared data, but it’s usually pretty accurate. Another form of inferred data is used with retargeting. If you’ve visited my website before, I can infer that you are in the market for my product or service, and are therefore more valuable to me.
- Modeled — In order to increase the size and scale of particular audience segments, many companies have invented methods of modeling the characteristics of existing audiences from inferred or declared data. The attempt is to find more of the same type of people. This is often referred to as “look-alike” or “behave-alike” data, which is still a form of inference, just a further leap. And since modeling algorithms are often proprietary and work in non-transparent ways, I place them on the lower end of the quality spectrum.
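The three categorical criteria covered so far (source, mechanism and methodology) each rank their options from weakest to strongest, so they can be written down as a rough rubric. The Python sketch below is one possible encoding, not a standard scoring system: the class names, the 1-to-3 values and the equal weighting of the three criteria are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Source(IntEnum):
    THIRD_PARTY = 1
    SECOND_PARTY = 2
    FIRST_PARTY = 3


class Mechanism(IntEnum):
    COOKIES = 1
    FINGERPRINTING = 2
    NATIVE_DATABASE = 3


class Methodology(IntEnum):
    MODELED = 1
    INFERRED = 2
    DECLARED = 3


@dataclass(frozen=True)
class AudienceSegment:
    name: str
    source: Source
    mechanism: Mechanism
    methodology: Methodology

    def quality_score(self) -> int:
        # Equal weighting is an assumption; adjust to your own priorities.
        return int(self.source) + int(self.mechanism) + int(self.methodology)


# Hypothetical segments at the two ends of the spectrum.
platform_profiles = AudienceSegment(
    "platform profiles", Source.FIRST_PARTY,
    Mechanism.NATIVE_DATABASE, Methodology.DECLARED,
)
lookalikes = AudienceSegment(
    "purchased look-alikes", Source.THIRD_PARTY,
    Mechanism.COOKIES, Methodology.MODELED,
)

print(platform_profiles.quality_score(), lookalikes.quality_score())
```

A score like this only orders segments on paper; it says nothing about freshness, price or real-world performance.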
4. Freshness: How Old Is It?
One of the most overlooked properties of audience data is the age or freshness of the information. When looking at an audience, especially when cookies power it, it’s very important to understand how recently the information was acquired. Why? Because it drastically impacts the value of the data and informs the bidding strategy for any campaign in which it’s used.
A few years back, a case study conducted by Dapper (since acquired by Yahoo) found that a retargeting campaign, from a performance perspective, had the majority of conversions occur within the first 7 days. As a result, the study concluded it was ideal to bid significantly higher during this window of time.
This study demonstrated that the value of an audience, particularly in a retargeting context, is highest within the first 7 days. This makes a huge impact on both bidding strategy (bidding higher during that 7-day window and less afterwards), and more importantly, valuing the quality of the data. Once again, with first-party data (which typically powers retargeting campaigns), you have the most insight into the age of an audience, and therefore have the most insight into its value.
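The bidding strategy described above can be sketched in a few lines of Python. The 7-day window comes from the case study; the two multipliers and the `bid_for` function are hypothetical values chosen only to show the shape of the logic.

```python
from datetime import datetime, timedelta

FRESH_WINDOW = timedelta(days=7)  # window reported in the case study
FRESH_MULTIPLIER = 1.5            # assumed value, not from the study
STALE_MULTIPLIER = 0.75           # assumed value, not from the study


def bid_for(base_bid_cpm: float, last_seen: datetime, now: datetime) -> float:
    """Scale a base CPM bid up for fresh visitors and down for older ones."""
    age = now - last_seen
    multiplier = FRESH_MULTIPLIER if age <= FRESH_WINDOW else STALE_MULTIPLIER
    return round(base_bid_cpm * multiplier, 2)


now = datetime(2024, 1, 15)
print(bid_for(2.00, now - timedelta(days=3), now))   # visitor seen 3 days ago
print(bid_for(2.00, now - timedelta(days=10), now))  # visitor seen 10 days ago
```

This only works if you know when a visitor was last seen, which is exactly the insight that first-party data gives you and opaque third-party segments often do not.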
(It’s also important to note that freshness is less important in some circumstances, such as with first-party declared data powered by a native database — like Facebook. Your birth date and gender, among other things, don’t often change over time.)
5. Price: How Much Does It Cost?
From a practical perspective, one of the most important factors to consider when evaluating audience data is the price of the segment in question. Data prices are important because they have a direct impact on the actual performance of campaigns. These costs generally apply only to data that is not your own, namely second- and third-party data. With your own first-party data, costs are usually nil, offering you that much more value.
If there are specific performance goals, for example, the cost of data must be weighed against the incremental performance increases, if any. Data costs also vary wildly, from as little as $0.25 CPM to $5.00 CPM and beyond, depending on the source. In general, though, the price of data is pretty arbitrary. As a result, data costs can easily exceed the cost of media, so it’s important to factor them into the final equation and ensure that the purchase makes business sense.
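One way to factor data costs into the equation is to ask how much extra performance the data must deliver just to break even. If data raises the cost per thousand impressions from the media CPM to media CPM plus data CPM, then cost per result stays flat only when results per impression rise by the ratio of data CPM to media CPM. The Python sketch below works through that arithmetic; the $2.00 media CPM is an assumed figure, and the function names are hypothetical.

```python
def effective_cpm(media_cpm: float, data_cpm: float) -> float:
    """Total cost per thousand impressions once data fees are added."""
    return media_cpm + data_cpm


def required_lift(media_cpm: float, data_cpm: float) -> float:
    """Fractional increase in results per impression needed to break even.

    Break-even means the cost per result is the same with and without data:
    (media_cpm + data_cpm) / (1 + lift) == media_cpm
    """
    if media_cpm <= 0:
        raise ValueError("media_cpm must be positive")
    return data_cpm / media_cpm


MEDIA_CPM = 2.00  # assumed media price, for illustration only

for data_cpm in (0.25, 5.00):  # the low and high data prices cited above
    lift = required_lift(MEDIA_CPM, data_cpm)
    print(f"data ${data_cpm:.2f} CPM -> effective "
          f"${effective_cpm(MEDIA_CPM, data_cpm):.2f} CPM, "
          f"needs {lift:.1%} lift to break even")
```

Under these assumptions, cheap data needs only a modest lift to pay for itself, while data priced above the media has to more than double performance before it makes business sense.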
Measuring Actual Performance
The previous criteria are simply ways to evaluate one of your campaign inputs; but ultimately, it’s the outcome of the data usage that matters. In other words, the real-world performance of the audience data you use will ultimately tell you whether or not it was valuable to your campaign. Everyone’s mileage will vary with the same data, no matter how good it looks on paper beforehand, so keep that in mind.
Judging the performance of data is pretty standard: did it produce positive or negative results? Results can be measured in revenue, conversions, brand lift and various other metrics. It always helps to have a baseline for comparison. For example: compared to non-data-driven campaigns, was there any incremental performance? If so, did it outweigh the cost of the audience data?
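The baseline comparison can be sketched as follows, assuming revenue is the metric you care about and that the two campaigns are otherwise comparable. The `CampaignResult` class, the `data_paid_off` function and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    impressions: int
    revenue: float
    media_cost: float
    data_cost: float = 0.0

    def net_per_mille(self) -> float:
        """Revenue minus all costs, per thousand impressions."""
        net = self.revenue - self.media_cost - self.data_cost
        return net * 1000 / self.impressions


def data_paid_off(baseline: CampaignResult, with_data: CampaignResult) -> bool:
    """True if the data-driven campaign nets more per impression
    than the baseline after paying for the audience data."""
    return with_data.net_per_mille() > baseline.net_per_mille()


# Hypothetical results, for illustration only.
baseline = CampaignResult(impressions=100_000, revenue=500.0, media_cost=200.0)
with_data = CampaignResult(impressions=100_000, revenue=700.0,
                           media_cost=200.0, data_cost=100.0)

print(data_paid_off(baseline, with_data))
```

Normalizing per thousand impressions keeps the comparison fair when the two campaigns differ in size. The same structure works for conversions or brand lift if you swap in a different outcome metric.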
If you could determine the performance of audience data before purchasing it, you wouldn’t need to worry about critical evaluations. Unfortunately, the only way of truly evaluating audience data is by actually applying it and measuring the results first-hand.
Audience buying has revolutionized display advertising. It has brought a new degree of power, similar to the kind one finds in search, to display advertising. And this revolution is powered by audience data; so, it’s crucial that we understand how to evaluate this vital element. The programmatic space is rapidly evolving and the landscape is shifting under our feet. Expect much to change in the coming years, especially with regard to the mechanisms, pricing and sources of audience data.