Mind the Gap: Why AI Struggles to Scale and What Leaders Need to Confront

AI has become remarkably easy to experiment with, while remaining stubbornly difficult to scale. That tension sits at the centre of Mind the Gap: Bridging From Sandbox to Scale, a CTO Craft survey conducted in partnership with Ten10 and The Scale Factory, and explored further during a recent CTO Craft Bytes session.

Teams can now stand up proofs of concept in days, sometimes hours, using tooling that barely existed a few years ago. Yet for most organisations, the journey from an impressive pilot to something that operates reliably and delivers sustained value remains uncertain at best. The gap between experimentation and production hasn’t closed. In many cases, it’s becoming more visible.

The survey results put clear numbers against that reality. Sixty-seven percent of organisations report that fewer than half of their AI proofs of concept ever reach production and deliver measurable business impact. For leaders who’ve spent time trying to operationalise complex systems inside real organisations, that figure feels less like a surprise and more like a reflection of lived experience.

From convincing prototypes to operational reality

One way to describe the current state of AI adoption is that experimentation has become comfortable, while delivery remains hard. Prototypes can be convincing, fast to build, and easy to demonstrate, often without forcing teams to confront the operational work that sits beneath the surface.

Over a third of organisations admit to building AI prototypes without serious consideration of how they would ever run in production, and only a small minority report consistent success in scaling them. A strong proof of concept can create momentum quickly, but it can also create assumptions. Once something appears to work, conversations tend to move towards timelines and rollout, even though the unglamorous work around integration, security, operating models, and support has barely started.

This is where many initiatives begin to stall. The remaining work starts to feel like friction rather than part of delivery, and expectations quietly move ahead of reality without ever being reset. What looked like progress becomes a source of tension, not because the technology failed, but because the organisation wasn't ready to carry it forward.

Alignment, foundations, and the work we keep deferring

While technical challenges are real, the survey suggests that organisational misalignment is often the more stubborn barrier. Stakeholder expectations diverge early, particularly once a prototype demonstrates potential value. Business leaders may see a working system and assume deployment is close, while engineering teams understand that the most complex and expensive work is still ahead.

That misalignment shows up clearly in collaboration patterns. Nearly 60 percent of organisations describe collaboration between data science, engineering, and business teams as neutral or ineffective. It’s difficult to build reliable production systems when ownership, success criteria, and responsibility remain unclear across those groups.

When AI projects do hit technical limits, they tend to surface long-standing issues rather than novel ones. Data quality and governance still shape what’s realistically possible. Integration exposes complexity once systems have to interact with existing platforms and workflows. Security and compliance often arrive late enough to force redesign rather than refinement. AI has a way of accelerating the moment when these weaknesses can no longer be worked around.

The same pattern appears in MLOps maturity. Nearly three quarters of organisations describe their approach as basic or manual, with limited automation around deployment, monitoring, or retraining. For many leaders, this feels familiar. The challenges mirror those that drove the rise of DevOps, where alignment, automation, and shared ownership mattered as much as tooling.

Rethinking what success actually looks like

One of the more telling shifts in the survey is how leaders define value. Accuracy still matters, but it’s no longer treated as the primary indicator of success. Reliability, adoption, and measurable business outcomes carry more weight, particularly when systems are expected to operate as part of everyday work.

A model that performs slightly worse but is trusted, supported, and consistently used will usually create more value than a technically impressive one that never leaves the sandbox. That perspective reflects a broader maturity in how organisations think about technology, focusing less on elegance and more on impact.

The organisations making progress aren’t doing anything radical. They’re investing earlier in operational foundations, tying AI initiatives to clear business outcomes, and treating delivery as a cross-functional responsibility rather than a specialist exercise. Most importantly, they’re more honest about where the hard work sits.

The gap between AI experimentation and AI in production remains wide, but it isn’t mysterious. It’s shaped by decisions organisations already know how to make, if they’re willing to confront them sooner rather than later.

If you’d like to explore the findings in more detail, you can download the full report here.
