The Radical ROI of Generative AI: Where the Money Actually Lands
How Snowflake’s global study maps enterprise ROI, data readiness and the six-quarter path from pilots to agentic workflows.
Most enterprises aren’t “experimenting” with generative AI any more; they’re booking returns.
In a survey of 1,900 early adopters, 92% say projects have already paid back, with an average ROI of 41%. That’s not a rounding error. It’s a signal that value has moved from PowerPoint presentations to the P&L. The surprise isn’t the headline number. It’s what drives it: unglamorous data work, multi-model plumbing, and disciplined use-case selection.
This article draws on the findings of The Radical ROI of Gen AI, a report commissioned by Snowflake, a cloud-based data platform that helps organisations store, connect and analyse large volumes of data across different cloud providers. Snowflake features prominently in many of the efforts the report describes, providing the infrastructure that makes these AI projects possible.
A quick caveat before we dive in. This is a study of early adopters, organisations already in market, not those still forming a steering committee. The laggards aren’t represented, so treat the figures as a directional ceiling, not a universal average. Still, the pattern is persuasive and (crucially) repeatable.
What the numbers actually say
The topline is clear: most firms are in the black on gen-AI deployments, and the gains aren’t confined to one department. IT operations, cybersecurity, software development and customer service lead on measurable improvements: faster incident response, lower toil, higher uptime, cleaner code, happier customers. Commercial teams, namely sales and marketing, are catching up, with early lift in forecasting, content personalisation and service deflection. Engineering moved first. The front office is following.
A small but telling detail: the split between employee-facing and customer-facing deployments is almost even. Big companies tend to go outward sooner. They have the scale (and governance) to do it.
Net: the value is real, broad-based, and increasingly visible in run-rate metrics rather than pilot dashboards.
The rub: unstructured data still slows everything down
Ask teams why projects take longer than planned and the same answer returns: content isn’t ready. Most organisations admit that only a minority of their unstructured data (documents, emails, PDFs, creative assets, transcripts) is AI-ready. The blockers are predictable but stubborn: labelling takes time; quality is patchy; permissions are unclear; duplication is everywhere.
If you’re a CMO or publisher, this is the rate-limiter. Not model choice. Not prompt magic. Content hygiene.
Human translation: the money is in the plumbing (labelling, lineage, sensitivity flags, freshness SLAs). It’s dull. It also determines your speed.
Costs and why they blew out (and why budgets are still rising)
Most adopters saw overruns somewhere: compute, supporting software, data prep. None of this is shocking. Models are hungry. Pipelines need building. Guardrails aren’t optional. Yet almost everyone is increasing budget next year because the benefits outweigh the noise. The CFO hasn’t been hypnotised by a chatbot; they’ve seen the before/after in tickets closed, minutes shaved, customers served.
Takeaway: expect to spend more on the boring bits (governance, observability, prompt/response logging, evaluation harnesses) because those are the bits that turn shadow AI into a supported capability.
Models: plural by default
The centre of gravity is moving towards multi-model. Most organisations combine one or two commercial LLMs with selective open-source models. Many plan to run three or more, coordinated through an LLM gateway that handles routing, safety, cost and telemetry. Retrieval-augmented generation (RAG) is now table stakes; fine-tuning with proprietary data is common when tone, domain accuracy or compliance matter.
If you find yourself debating “which single model should we commit to?”, you’re asking last year’s question. Commit to a gateway and evaluation regime; keep optionality on models.
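To make that concrete, here’s a minimal sketch of a gateway’s routing layer in Python: each model sits behind the same call signature, the gateway picks the cheapest model approved for a given task, and every prompt/response pair is logged for later evaluation. The model names, costs and task labels are illustrative assumptions, not recommendations from the report.

```python
# Minimal gateway routing sketch. Providers are hidden behind one call
# signature; routing is by approved task and cost; every call is logged.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ModelRoute:
    name: str                     # e.g. a commercial LLM or an open-source model
    call: Callable[[str], str]    # provider-specific client behind one signature
    cost_per_1k_tokens: float     # used for cost-aware routing and reporting
    allowed_tasks: List[str]      # which use cases this model is approved for

@dataclass
class Gateway:
    routes: List[ModelRoute]
    audit_log: List[Dict] = field(default_factory=list)

    def complete(self, task: str, prompt: str) -> str:
        # Route to the cheapest model approved for this task; fail loudly otherwise.
        candidates = [r for r in self.routes if task in r.allowed_tasks]
        if not candidates:
            raise ValueError(f"No approved model for task '{task}'")
        route = min(candidates, key=lambda r: r.cost_per_1k_tokens)
        response = route.call(prompt)
        # Telemetry: every prompt/response pair is kept for evaluation later.
        self.audit_log.append({"task": task, "model": route.name,
                               "prompt": prompt, "response": response})
        return response

# Usage: wrap each provider's SDK call in a plain function and register it.
gateway = Gateway(routes=[
    ModelRoute("large-commercial-model", lambda p: f"[stub reply to: {p[:40]}]",
               0.0100, ["service_chat", "summarise"]),
    ModelRoute("small-open-model", lambda p: f"[stub reply to: {p[:40]}]",
               0.0004, ["summarise", "classify"]),
])
print(gateway.complete("summarise", "Summarise this ticket thread: ..."))
```

In practice the lambdas would wrap real provider SDK calls and the audit log would feed an evaluation harness rather than sit in memory; the shape of the interface is the point.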
What good looks like in the real world (two quick mini-scenarios)
Support deflection without the drama. A national telco points RAG at its help-centre, enforces strict sensitivity tagging and builds a straightforward escalation ladder. Result: faster first-response times, fewer handoffs, measurable CSAT lift. Not magic, just content quality and guardrails.
Engineering toil, then uptime. A SaaS firm adopts a code copilot, plus incident-response summarisation in Ops. Bug MTTR (mean time to repair) dips. On-call rotations feel saner. A month later, reliability KPIs tick up. Again, not flashy; quietly transformative.
These aren’t outliers. They’re the pattern: start where decisions repeat at scale; attach a hard KPI; enforce human-in-the-loop where risk lives.
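For the curious, here’s a rough sketch of the telco pattern under stated assumptions: retrieval restricted by sensitivity tags, a minimum-relevance threshold, and a hard escalation path when nothing clears it. The tags, the threshold and the naive keyword scoring are placeholders; a real deployment would use a vector index and pass the retrieved article to an LLM as grounding context.

```python
# Sketch of sensitivity-gated retrieval with an escalation fallback.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HelpArticle:
    title: str
    body: str
    sensitivity: str  # e.g. "public", "internal", "restricted"

def retrieve(query: str, articles: List[HelpArticle],
             allowed: frozenset = frozenset({"public"})) -> Optional[HelpArticle]:
    # Only retrieve from content cleared for customer-facing use.
    candidates = [a for a in articles if a.sensitivity in allowed]
    scored = [(len(set(query.lower().split()) & set(a.body.lower().split())), a)
              for a in candidates]
    if not scored:
        return None
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score >= 2 else None  # weak match: better to say nothing

def answer(query: str, articles: List[HelpArticle]) -> str:
    article = retrieve(query, articles)
    if article is None:
        return "ESCALATE_TO_AGENT"  # first rung of the escalation ladder
    # In a real system the article would ground an LLM reply; here we surface it.
    return f"Based on '{article.title}': {article.body}"
```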
Industry notes that actually matter for marketers and media
Financial services lean into support and security. They prioritise ROI and get it because processes are well defined and data is controlled.
Healthcare & life sciences report above-average ROI, helped by structured workflows and an urgent need to cut delays.
Manufacturing stays pragmatic: inventory, quality, maintenance. Fewer headlines, more savings.
Technology firms are furthest along but feel the strain of too many opportunities and finite budgets.
Marketing/advertising/media? Behind the average on ROI, and more likely to say projects ran long. Accuracy gets the blame; governance debt is the cause. The fix is unglamorous: standardise content pipelines, pick three use cases (not thirty), and wire them to metrics the CFO already trusts.
The UK angle: focused, practical, occasionally under-resourced
British organisations look unusually targeted. They deploy where operational efficiency and customer-service gains are obvious: software development, service, security. They’re also more likely to train or augment models with their own data and to run workloads mostly in the cloud. The weak flank: unstructured-data capability still lags, creating friction in marketing and knowledge management. In short: strong where process is crisp; slower where content is messy. Feels familiar.
The six-quarter roadmap (and why this order matters)
Quarters 1–2: Data and guardrails. Inventory high-value unstructured sources. Label them. Add sensitivity flags. Set freshness SLAs. Stand up an LLM gateway. Define prompt/response logging and evaluation. Ship two small but visible use cases: one employee-facing (Ops or Engineering), one customer-facing (Service). Score them weekly. (A minimal sketch of the asset metadata follows the roadmap.)
Quarters 3–4: Scale and specialise. Bring RAG to content-heavy tasks (knowledge bases, catalogues, editorial archives). Fine-tune where tone and domain precision matter (brand, service). Expand into forecasting-driven recommendations in sales and merchandising. Harden service chat with deflection and CSAT as north stars.
Quarters 5–6: Agentic workflows, tightly scoped. Automate slices of repetitive, rules-bound processes—QA triage, content updates, pricing checks—under supervision. Constrain agent scopes. Attach evaluation harnesses to every step. This isn’t science fiction; it’s careful automation.
If you remember one thing about the roadmap, make it this: don’t skip the data work in Q1–2. Everything else goes faster afterwards.
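To make that Q1–2 data work concrete, here’s a minimal sketch of the metadata each unstructured asset needs before it counts as AI-ready: labels, an owner, a sensitivity flag and a freshness SLA. The field names and the 90-day default are illustrative assumptions, not a standard.

```python
# Sketch of per-asset metadata plus the "AI-ready share" metric to trend.
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

@dataclass
class ContentAsset:
    uri: str                               # where the document, PDF or transcript lives
    owner: str                             # one named owner, not a committee
    labels: List[str] = field(default_factory=list)   # topic / product / audience tags
    sensitivity: str = "internal"          # e.g. public | internal | restricted
    last_reviewed: date = field(default_factory=date.today)
    freshness_sla_days: int = 90           # how stale the asset may get before review

    def is_ai_ready(self) -> bool:
        # Ready = labelled, classified and inside its freshness window.
        fresh = date.today() - self.last_reviewed <= timedelta(days=self.freshness_sla_days)
        return bool(self.labels) and bool(self.sensitivity) and fresh

def ai_ready_share(assets: List[ContentAsset]) -> float:
    # The number that should climb quarter by quarter.
    return sum(a.is_ai_ready() for a in assets) / len(assets) if assets else 0.0
```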
How to choose use cases (and say “no” nicely)
Frequency × risk × measurability. High-frequency tasks with moderate risk and clear, accepted KPIs go first (a toy scorer follows this list).
User journeys before features. Build around a journey (e.g., “resolve a billing query”), not a feature (“answer FAQs”).
One owner per use case. Not a committee. Someone who loses sleep if the KPI doesn’t move.
Timebox experiments. Thirty days to learn or kill. Move on.
A small operational note that saves arguments: write your own “definition of done” before you prototype anything. If you can’t write it, don’t build it yet.
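Here’s a toy version of the frequency × risk × measurability filter, assuming 1–5 scales and a deliberately crude formula. The candidate names and scores are made up; the value is in forcing a ranked list rather than a debate.

```python
# Toy use-case scorer: favour frequency and measurability, penalise risk.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    frequency: int      # 1-5: how often the decision or task repeats
    risk: int           # 1-5: harm if the model gets it wrong
    measurability: int  # 1-5: is there a KPI the CFO already trusts?

    def score(self) -> float:
        return self.frequency * self.measurability / self.risk

candidates = [
    UseCase("Billing-query deflection", frequency=5, risk=2, measurability=5),
    UseCase("Fully automated campaign copy", frequency=3, risk=4, measurability=2),
    UseCase("Incident summarisation", frequency=4, risk=2, measurability=4),
]
for uc in sorted(candidates, key=lambda u: u.score(), reverse=True):
    print(f"{uc.score():>5.1f}  {uc.name}")
```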
Platform choices: the pragmatic checklist
A data platform that spans structured and unstructured sources, with governance baked in.
An LLM gateway to route, monitor and control cost/safety across multiple models.
Observability at the application layer (prompts, responses, outcomes) and the data layer (lineage, quality). A minimal logging sketch follows this checklist.
A content pipeline that treats assets like products: versioned, labelled, owned.
None of this is thrilling. All of it is decisive.
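As a sketch of what the observability item can mean in practice (under assumptions, with illustrative field names and checks): treat every interaction as a structured record, then run an evaluation harness, which is just a set of named checks, over those records.

```python
# Sketch of application-layer observability plus a tiny evaluation harness.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List

@dataclass
class Interaction:
    timestamp: str
    use_case: str        # e.g. "billing-query deflection"
    model: str
    prompt: str
    response: str
    outcome: str         # e.g. "resolved", "escalated", "user_corrected"

def record(use_case: str, model: str, prompt: str, response: str, outcome: str) -> Interaction:
    return Interaction(datetime.now(timezone.utc).isoformat(),
                       use_case, model, prompt, response, outcome)

# Named checks run over the same records; both are illustrative.
CHECKS: Dict[str, Callable[[Interaction], bool]] = {
    "non_empty_response": lambda i: bool(i.response.strip()),
    "no_long_digit_runs": lambda i: not any(tok.isdigit() and len(tok) >= 8
                                            for tok in i.response.split()),
}

def evaluate(log: List[Interaction]) -> Dict[str, float]:
    # Pass rate per check; trend these alongside the business KPI.
    if not log:
        return {}
    return {name: sum(check(i) for i in log) / len(log)
            for name, check in CHECKS.items()}
```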
What to watch for in the next 12 months
Agentic systems doing real work in narrow lanes: multi-step, policy-bound, supervised (a minimal sketch follows this list).
Model churn as performance/cost curves shift. Gateways make this survivable.
Internal search getting quietly excellent once content is labelled.
Procurement catching up to the reality that model optionality is a feature, not a cost.
And yes, more governance. Not to slow things down—but to keep the lights on while you speed up.
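What “policy-bound, supervised” can look like, as a minimal sketch with placeholder action names: an agent step that may only call actions on an allow-list and must hand anything sensitive to a human before acting.

```python
# Sketch of a scoped agent step: allow-listed actions, human approval gate.
from typing import Callable, Dict

ALLOWED_ACTIONS: Dict[str, Callable[[dict], str]] = {
    "update_content_metadata": lambda args: f"updated {args['asset']}",
    "flag_price_mismatch": lambda args: f"flagged {args['sku']}",
}

def run_agent_step(action: str, args: dict,
                   needs_approval: Callable[[str, dict], bool]) -> str:
    if action not in ALLOWED_ACTIONS:
        return f"BLOCKED: '{action}' is outside this agent's scope"
    if needs_approval(action, args):
        return f"PENDING_HUMAN_REVIEW: {action}({args})"
    return ALLOWED_ACTIONS[action](args)

# Example policy: anything touching pricing goes to a human first.
print(run_agent_step("flag_price_mismatch", {"sku": "A-102"},
                     needs_approval=lambda a, _: a.startswith("flag_price")))
```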
Signs you’re on the right path
The share of AI-ready unstructured content in your priority domains climbs quarter by quarter.
You run two or three models in production with confidence and can swap others in and out behind the gateway.
Operational metrics (time-to-first-response, MTTR, cost-to-serve) improve inside a quarter.
Commercial metrics (engagement, CTR, conversion, revenue per customer) follow within two.
If these aren’t moving, go back to your content inventory and your “definition of done”. One of them is off.
Bottom line
The winners will look boring up close: clean content; a gateway; three well-chosen models; KPIs that move. Everything else—novel interfaces, breathless demos—can wait. Generative AI has already cleared the credibility hurdle. From here, execution decides who compounds value and who merely experiments.