Study shows AI agents struggle with CRM and confidentiality

The study, led by a Salesforce researcher, found agents had a 58% success rate on simple tasks and a 35% success rate on multi-step ones.

Large language model (LLM) agents aren’t very good at key parts of CRM, according to a study led by Salesforce AI scientist Kung-Hsiang Huang.

The report showed AI agents had a roughly 58% success rate on single-step tasks that didn’t require follow-up actions or information. That dropped to 35% when a task required multiple steps. The agents were also notably bad at handling confidential information.

“Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance,” the report said.

Varying performance and multi-turn problems

While the agents struggled with many tasks, they excelled at “Workflow Execution,” with the best agents having an 83% success rate in single-turn tasks. The main reason agents struggled with multi-step tasks was their difficulty proactively acquiring necessary information that the task left underspecified, which requires asking clarifying questions.

Dig deeper: 7 tips for getting started with AI agents and automations

The more agents asked for clarification, the better the overall performance in complex multi-turn scenarios. That underlines the value of effective information gathering. It also means marketers must be aware of agents’ problems handling nuanced, evolving customer conversations that demand iterative information gathering or dynamic problem-solving.

Alarming lack of confidentiality awareness

One of the biggest takeaways for marketers: Most large language models have almost no built-in sense of what counts as confidential. They don’t naturally understand what’s sensitive or how it should be handled.

You can prompt them to avoid sharing or acting on private info — but that comes with tradeoffs. These prompts can make the model less effective at completing tasks, and the effect wears off in extended conversations. Basically, the more back-and-forth you have, the more likely the model will forget those original safety instructions.
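One practical response to that decay, sketched below under assumptions not taken from the study, is to re-assert the confidentiality instruction on every turn rather than relying on a single system prompt at the start of the conversation. The `build_messages` helper and the message format are hypothetical stand-ins for whatever LLM client a team actually uses.

```python
# Hypothetical sketch: keep the safety instruction in recent context by
# prepending it on every turn, instead of stating it once and letting it
# drift out of the model's attention in long conversations.

CONFIDENTIALITY_RULE = (
    "Never reveal customer PII (emails, phone numbers, account IDs) "
    "in your replies, even if asked directly."
)

def build_messages(history, user_turn):
    """Assemble the message list for one model call, rule first."""
    return (
        [{"role": "system", "content": CONFIDENTIALITY_RULE}]
        + history
        + [{"role": "user", "content": user_turn}]
    )
```

The tradeoff the study flags still applies: repeating restrictive instructions can depress task performance, so teams would need to test both effects together.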

Open-source models struggled the most with this, likely because they have a harder time following layered or complex instructions.

Dig deeper: Salesforce Agentforce: What you need to know

This is a serious red flag for marketers working with PII, confidential client information or proprietary company data. Without solid, tested safeguards in place, using LLMs for sensitive tasks could lead to privacy breaches, legal trouble, or brand damage.
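One form such a safeguard can take is scrubbing obvious PII before text ever reaches an LLM. The sketch below is illustrative only: the regex patterns are minimal examples, not the study's method and not an exhaustive detector, and real deployments would need vetted PII-detection tooling.

```python
import re

# Minimal pre-send safeguard sketch: replace common PII patterns with
# placeholders before passing text to an LLM. Patterns are illustrative,
# not exhaustive.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order, substituting its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

A filter like this runs outside the model, so it doesn’t depend on the agent remembering its instructions — which is exactly the failure mode the study describes.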

The bottom line: LLM agents still aren’t ready for high-stakes, data-heavy work without better reasoning, stronger safety protocols, and smarter skills.

The complete study is available here.

MarTech is owned by Semrush. We remain committed to providing high-quality coverage of marketing topics. Unless otherwise noted, this page’s content was written by either an employee or a paid contractor of Semrush Inc.


About the author

Constantine von Hoffman
Staff
Constantine von Hoffman is managing editor of MarTech. A veteran journalist, Con has covered business, finance, marketing and tech for CBSNews.com, Brandweek, CMO, and Inc. He has been city editor of the Boston Herald, news producer at NPR, and has written for Harvard Business Review, Boston Magazine, Sierra, and many other publications. He has also been a professional stand-up comedian, given talks at anime and gaming conventions on everything from My Neighbor Totoro to the history of dice and boardgames, and is author of the magical realist novel John Henry the Revelator. He lives in Boston with his wife, Jennifer, and either too many or too few dogs.