Synthetic data: More than just make-believe
Synthetic data may not be real data, but it may have some important real-world use cases in digital marketing.
Digital marketers work with real data all the time. What the online shopper does tells you a lot about what they want. But as we know, you need to be very careful about personally identifiable information (PII).
You can anonymize online shoppers by removing their names from their records before analyzing the data. Or you can use an algorithm to generate data that mimics observed online behavior and use that “synthetic data” for your analysis.
That may seem like overkill. Why go through this effort when you have real data at your fingertips? Synthetic data will not be a replacement for real data, but it does have some specific use cases that a digital marketer may find useful.
Synthetic privacy in the real world
Reality is messy — in a good way
Still, there are advantages to working with the real thing. With real-world data, an analyst can “tease out the nuances and hidden patterns not revealed by other techniques,” said Steven Ramirez, CEO of Beyond the Arc, a San Francisco Bay Area firm specializing in CX, strategic communications and data science. Using an algorithm to synthesize the same data, however, “can introduce a fatal flaw” in identifying those patterns of activity, he said.
Predictive modeling relies on multiple data sources, as well as groups of models, Ramirez said. “There is an opportunity to use synthetic data to extend data sets and provide more data where it is sparse.” It is up to the analyst to understand the integrity of each data source.
“[S]ynthetic data will never be as accurate as real data,” Pondel said. “Even if generated based on real patterns, synthetic data always misses the essential ‘reality factor’, which only makes it useful in a limited number of business cases.”
“You magnify problems getting further away from source data,” Dilmegani said. Most algorithms will replicate the distribution in the source data. “Mistakes are replicated in the synthetic data as well.”
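To make Dilmegani's point concrete, here is a minimal Python sketch. It assumes a hypothetical set of 1,000 order values in which a data-entry mistake recorded about 5% of orders in cents rather than dollars, and it uses a deliberately simple generator (resampling the source values with a little noise) as a stand-in for a real synthetic data tool. The names and numbers are illustrative only.

```python
import random

# Hypothetical source data: order values in dollars. A data-entry mistake
# recorded roughly 5% of them in cents, so those values are 100x too large.
def make_source(rng: random.Random, n: int = 1000, error_rate: float = 0.05) -> list[float]:
    values = []
    for _ in range(n):
        value = round(rng.uniform(20.0, 80.0), 2)
        if rng.random() < error_rate:
            value *= 100  # the mistake: dollars entered as cents
        values.append(value)
    return values

# A deliberately simple generator (an assumption, not any vendor's method):
# sample from the empirical distribution of the source and add +/-2% noise
# so the synthetic records are not verbatim copies.
def synthesize(source: list[float], n: int, rng: random.Random) -> list[float]:
    return [rng.choice(source) * rng.uniform(0.98, 1.02) for _ in range(n)]

def share_suspicious(values: list[float], threshold: float = 1000.0) -> float:
    """Fraction of values above a plausibility threshold."""
    return sum(v > threshold for v in values) / len(values)

rng = random.Random(42)
source = make_source(rng)
synthetic = synthesize(source, 5000, rng)
print(f"suspicious in source:    {share_suspicious(source):.1%}")
print(f"suspicious in synthetic: {share_suspicious(synthetic):.1%}")
```

Because the generator replicates the source distribution, the bad values show up in the synthetic records at roughly the same rate as in the source. A more sophisticated generator that faithfully learns the source distribution would, in general, learn the mistake along with everything else, which is why cleaning the source data comes first.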
Mind the synthetic gap
Machine learning is very data hungry, Dilmegani pointed out. Some data marketers may need to purchase synthetic data in order to have enough data to train an AI application. “This will drive the demand for synthetic data,” Dilmegani said.
For example, one application for synthetic data might be to train the AI that will operate a self-driving car. Dilmegani noted that synthetic data has also been used in the deep-learning applications needed for image processing, a practice that has been around for almost a decade.
“I am skeptical about the uses of synthetic data,” Ramirez countered. “If you are building a machine learning/artificial intelligence model, it is not a good fit.” This goes to the heart of machine learning as it relates to artificial intelligence. About 60 to 80% of the work of building an AI model is spent acquiring and preparing the data, Ramirez explained. Indeed, this process “is the work.”
“The approach is to apply an algorithm or process to be able to create new data points,” Ramirez continued. “Synthetic data is produced by a process that is also subject to bias. Usually, we think of data as the ultimate source of truth… Often, we talk about letting the data speak.” If the data is manufactured, then what is it saying?
“The smart application of synthetic data in training AI models can also exclude any bias that could be generated from AI models trained on real data,” Pondel said. “Regarding accuracy, in my opinion, synthetic data can be comparable to real data in a few cases.”
Applying synthetic data to digital marketing is going to be an evolution, not a revolution. Applications will be narrow and need-driven. It will become another tool in the toolbox. “At the moment, I recognize simulations and model testing/verification as the most promising area of synthetic data applications,” Pondel said.
Machine learning is data intensive, so the demand for data may drive the use of synthetic data, added Dilmegani. Like many things in machine learning and AI, synthetic data will evolve, Ramirez said. As use cases narrow, digital marketers will get a better sense of when synthetic data is a good fit, and when it is not, he said.
Opinions expressed in this article are those of the guest author and not necessarily MarTech.