Sharing AI wins and fails will save marketers from repeating mistakes
Understanding AI’s wins, pitfalls and best practices starts with sharing what actually works — and what doesn’t.
Marketers rarely share A/B test results. With AI, that’s becoming a bigger problem — and opportunity. AI can deliver significant wins, but it can also stumble in ways that hurt campaigns and brands.
Most of what we learn stays hidden inside individual campaigns, leading to duplicated effort, repeated mistakes and slower progress. It’s time to learn from AI’s boosts and blunders and share the results.
When shiny new tech meets reality
I’ve always been a bit of a geek. I love trying out the latest technology, especially when it saves time or helps me create something better. I’ve also learned that shiny new tools don’t always live up to the hype.
Some take longer to master than the time they’re supposed to save. Others are unreliable or rough around the edges until years of updates smooth them out. And sometimes, they never deliver on their promise at all.
AI is the latest shiny tool marketers are reaching for. It promises huge benefits — from faster copywriting to smarter lead scoring — and it’s already clear that AI can help us do more in less time. But while it delivers quantity, can it truly deliver quality?
Before you stop reading and imagine I’m some anti-AI Luddite determined to keep marketing an artisan, handcrafted profession, you should know: I’m a massive believer in AI. It has demonstrably improved the work we do for our agency and our clients.
Of course, it’s also let us down more than once. But first, let’s talk about where AI has delivered.
Where AI delivers
AI has delivered some impressive marketing wins. Heinz used it to generate ketchup bottle images, and Nike simulated Serena Williams’ tennis matches. The digital marketing discovery group DigitalDefynd even tracks these and other standout AI campaigns.
But most of these successes come from extensive, expensive efforts that produce vast volumes of content. That’s not the everyday reality for most marketers. What matters to them is knowing when AI starts delivering incremental improvements — and when it begins to stumble and become a liability.
Right now, that’s hard to quantify without significant financial resources. To paraphrase John Wanamaker: “Half the money I spend on AI is wasted; the trouble is, I don’t know which half.”
Dig deeper: Your AI strategy is stuck in the past — here’s how to fix it
Where AI stumbles
You probably already use AI in many martech tools, like Google Ads and its responsive search ad format. The idea is simple: you create headlines and descriptions, and the AI system quickly works out the most effective combinations. Google will even draft those pesky headlines and descriptions for you.
But without human checks for brand standards and legal requirements, letting AI off the leash can backfire.
It’s also unrealistic to A/B test responsive search ads against every possible combination to confirm the best results. Most people assume the format works well, until they write a killer headline or description that’s shorter than the others, only to see it served rarely and quickly discarded.
In those cases, you can’t convince me there’s enough data for the comparison to be statistically significant. Either AI is guessing, or it’s operating under a rule that says you must use as many characters as possible, or else.
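To put rough numbers on that intuition: Google lets a responsive search ad hold up to 15 headlines and four descriptions, and a served ad shows up to three headlines and two. Here’s a back-of-the-envelope sketch in Python (the 1,000-impressions-per-combination threshold is my illustrative assumption, not a Google figure):

```python
import math

# Google's published limits for a responsive search ad.
headlines, descriptions = 15, 4

# A served ad shows up to 3 headlines and 2 descriptions; position matters,
# so count ordered selections (permutations).
combinations = math.perm(headlines, 3) * math.perm(descriptions, 2)
print(f"Possible combinations: {combinations:,}")  # 32,760

# Illustrative assumption: even a rough read on one combination's
# performance needs on the order of 1,000 impressions.
impressions_per_combo = 1_000
print(f"Impressions for a naive full test: {combinations * impressions_per_combo:,}")
# ~32.8 million impressions, far beyond most campaigns' budgets.
```

Google’s system surely needs fewer impressions than a naive exhaustive test, but the gap is wide enough that some combinations are inevitably judged, and discarded, on thin data.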
AI also stumbles on personalization. We’ve all seen cringeworthy AI-generated emails that scrape a company’s website and somehow make 2 + 2 = 27. These emails:
- Are formulaic.
- Are written in a style that’s clearly machine-generated.
- Often deliver confident but completely false statements.
Many senders don’t have time to review the tens or hundreds of thousands of emails an LLM produces for them, but they should at least sample enough to know whether those messages are quietly damaging their brand.
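You don’t need to read every message to catch systemic problems. A standard sampling calculation shows how small that review pass can be (the 1% and 5% error rates and the 95% confidence level are illustrative assumptions):

```python
import math

def sample_size(error_rate: float, confidence: float = 0.95) -> int:
    """Emails to sample so that, if at least `error_rate` of all emails
    are bad, the sample contains at least one with `confidence`."""
    # P(no bad email in n samples) = (1 - error_rate)^n <= 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))

# Illustrative: catching a 1% failure rate with 95% confidence...
print(sample_size(0.01))  # 299 emails
# ...while a 5% failure rate needs only:
print(sample_size(0.05))  # 59 emails
```

In other words, reviewing a few hundred messages out of 100,000 is enough to know whether a meaningful share of them is going out wrong.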
Dig deeper: How to use generative AI in copywriting for an A/B testing program
Why AI errors can hurt
Nobody’s perfect, including AI. We all accept a certain level of hallucination (or, more honestly, errors) in AI output. It’s impossible to avoid entirely, but with careful input, AI gets it right most of the time.
We recently ran a simple test: We asked AI to list the top three markets for each company we were emailing. We wanted only four words for each email (one of them always being “and”).
The results? About half were great. A handful were utterly wrong. The rest were just OK. A human would have produced lists that were clearer and more insightful.
Balancing AI’s boosts and blunders
Overall, the boost from using AI for personalization was significant. But even though the number of screwups was relatively small, the potential damage was high.
In this case, we were marketing to a very small, very well-defined audience. If the AI’s errors hadn’t been checked and corrected, we could have seriously harmed our brand with a financially important market segment.
If we call the boost in ROI from using AI “B” and the percentage of screwups “S,” then the math suggests that everything looks great as long as B is greater than S. And AI can usually clear that bar.
But this analysis ignores something critical: the long-term impact of brand damage. Mistakes are cumulative.
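One way to see why B being greater than S isn’t the whole story is to treat each screwup as eroding a trust multiplier that carries over into future campaigns. Here’s a toy simulation (every number is an illustrative assumption, not a measurement from our tests):

```python
# Toy model: per-campaign boost B vs. screwup rate S, where each screwup
# also erodes a brand-trust multiplier that persists across campaigns.
def cumulative_value(boost: float, screwup_rate: float,
                     damage_per_screwup: float, campaigns: int) -> float:
    trust, total = 1.0, 0.0
    for _ in range(campaigns):
        total += trust * (boost - screwup_rate)           # net gain this campaign
        trust *= 1 - screwup_rate * damage_per_screwup    # lasting brand damage
    return total

# B > S, so each campaign looks positive in isolation...
print(round(cumulative_value(0.20, 0.05, 0.0, 20), 2))  # 3.0 with no lasting damage
# ...but with lasting damage, the same B and S compound into far less value.
print(round(cumulative_value(0.20, 0.05, 2.0, 20), 2))  # ~1.32
```

The exact figures don’t matter; the shape does. A persistent error rate quietly taxes every future campaign, which is why minimizing S matters more than the single-campaign math implies.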
Right now, people are excited about the immediate improvements AI brings. But we should also focus on minimizing the mistakes.
The easiest way is to avoid using AI when it’s most likely to hallucinate. With some basic training, marketers can learn to spot those risks before they become problems.
Dig deeper: AI’s big bang effect means marketing must evolve or die
Let’s open-source the world’s biggest A/B test
As a rule, marketers don’t share the results of their A/B tests. Some martech tools try to aggregate results, but if “blue” turns out to be the winning color for one campaign, that doesn’t mean every campaign should suddenly turn blue. That’s the kind of overgeneralization AI is prone to make.
AI is different, though. There are many ways we all use it in similar contexts. For example, when writing Google Ads headlines, AI is great at filling an empty box with options. About 95% of the time, its suggestions are solid.
But there’s a big caveat: AI frequently struggles when working with deep tech clients. Generating text that conveys highly technical information — or compressing features into benefits within 30 characters — isn’t one of AI’s strengths.
The real opportunity is pooling what we learn. If marketers contribute their AI results, we can build the world’s biggest open-source A/B test.