Why Your Million-Dollar AI Infrastructure Sits Idle 40% of the Time

Something Corey Wanless said stopped me cold during our conversation about AI automation. We were talking about GPU utilization when he dropped this line: "Without automation, you are guaranteeing yourself to consume more GPU resources than you thought you were gonna consume."
That's when it hit me. We've been approaching AI infrastructure like it's just another IT project. But these aren't ordinary servers we're talking about.
Picture this: Your organization just invested millions in cutting-edge AI infrastructure. But three months in, Corey tells me, those expensive resources often sit idle 40% of the time. Even worse, developers are waiting days to get their workloads scheduled while critical models take forever to deploy.
Here's what I learned about why this keeps happening—and the three steps that can prevent it from happening to you.
- Download WWT's complete 2025 Automation Priorities Report
- Watch the original interview on the WWT Platform: AI at a Standstill? How Automation Unlocks Real ROI

When smart people make expensive mistakes
Corey walked me through something that changed how I think about AI infrastructure. Unlike traditional workloads, where you just add another server when you need more capacity, AI infrastructure operates on a completely different economic model.
"Those nodes are significantly more expensive," he explained, "somewhere on the power of 10x." Plus they require massive amounts of power and cooling that most organizations haven't planned for.
But here's the part that really got me: these aren't deterministic systems like we're used to. Corey gave me an example that illustrates this perfectly. Ask a traditional application what one plus one equals, and you'll always get two. Ask some of the original LLMs the same question? "Sometimes it might come back as three."
Think about what that means for testing, for resource planning, for everything.
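Here's a toy way to picture the testing problem (the `flaky_llm` below is just a random stand-in I wrote, not a real model call): with a deterministic function, one assertion settles the question forever; with a sampled system, one call proves nothing, so you have to measure a pass rate instead.
```python
import random

def add(a, b):
    # Deterministic: same inputs, same output, every single time.
    return a + b

def flaky_llm(prompt, temperature=0.8):
    # Toy stand-in for a sampled model: usually right, occasionally not.
    # (prompt and temperature are ignored here; a real call wouldn't ignore them.)
    return random.choices(["2", "3"], weights=[0.95, 0.05])[0]

# Traditional test: a single assertion settles it.
assert add(1, 1) == 2

# Probabilistic test: sample repeatedly and measure the pass rate.
trials = 200
passes = sum(flaky_llm("What is 1 + 1?") == "2" for _ in range(trials))
print(f"pass rate: {passes / trials:.1%}")  # gate on a threshold, not one run
```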
The moment Kubernetes stopped being scary
I'll be honest - when Corey first mentioned building a Kubernetes foundation, I felt that familiar tech anxiety creeping in. Kubernetes has this reputation for being incredibly complex, and frankly, it can feel overwhelming.
But then Corey said something that completely shifted my perspective: "What I generally recommend to customers is to really look at certain OEM partners that package up their own supported delivery mechanism for Kubernetes."
He used this perfect analogy about buying a new TV with Android in the background - you're not interacting with Android directly, you're using the vendor's simplified interface. That's when it clicked for me. We don't need to become Kubernetes experts; we need partners who've already made those complex decisions.
What struck me most was why Kubernetes emerged as the standard in the first place. It's not because some committee decided it should be - it's because the open source AI community naturally gravitated toward it. As Corey put it, "The talent's not there. It's sparse in different places. So being able to use open source technology to accelerate and bring people together to solve certain specific problems is needed within the AI space."

The GPU optimization revelation that changes everything
Here's where the conversation got really interesting. Corey explained that GPU optimization isn't just about having the right hardware - it's about making thousands of micro-decisions about how that hardware gets used.
Take GPU fractionalization, for example. I had no idea this was even possible until recently. Corey walked me through three different approaches:
There's MIG (Multi-Instance GPU), which is like partitioning a hard drive - very precise but incredibly static. If you want to make changes, you're rebooting systems and moving workloads around.
Then there's time slicing, where multiple containers share the same GPU without knowing about each other. Corey had a great way of describing the problem here: the "bad neighbor effect." One container gets overwhelmed and makes the others suffer.
Finally, there are software-level solutions like Run:ai, which Nvidia acquired specifically to solve this problem. What fascinated me was how it includes a custom scheduler that does something called "bin packing" - automatically packing small jobs onto one node so larger nodes stay free for massive workloads.
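Corey didn't get into Run:ai's scheduler internals, and I won't pretend to know them, but the bin-packing idea itself is simple enough to sketch. Here's a best-fit-decreasing toy in Python, with job and node names I made up:
```python
# Bin-packing sketch: place GPU jobs on the fewest nodes possible,
# so whole nodes stay free for the big training runs.
jobs = {"finetune-a": 1, "embed-batch": 2, "eval-suite": 1, "train-llm": 8}
nodes = {"node-1": 8, "node-2": 8}  # free GPUs per node

# Best-fit decreasing: biggest jobs claim space first, and each job goes to
# the fullest node that still fits it, so small jobs fill gaps instead of
# fragmenting fresh nodes.
placement = {}
for job, gpus in sorted(jobs.items(), key=lambda kv: -kv[1]):
    for node, free in sorted(nodes.items(), key=lambda kv: kv[1]):
        if free >= gpus:
            placement[job] = node
            nodes[node] -= gpus
            break
    else:
        placement[job] = "pending"  # wait instead of spilling onto a new node

print(placement)
```
Run it and the big training job claims a whole node while the small jobs pile onto another - which is exactly the behavior Corey described.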
The automation happening behind the scenes is incredible. As Corey explained it: "Our time is spent less writing automation to say build thing one, then build thing two, then build thing three. Our automation now is focused on build all three things, and let Kubernetes and our underlying configurations decide when to run it and how long to run it."
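To make that concrete, here's a minimal sketch of "build all three things" using the official Kubernetes Python client. The image, namespace, and job names are placeholders I invented, but `nvidia.com/gpu` is the standard resource exposed by NVIDIA's device plugin:
```python
from kubernetes import client, config

config.load_kube_config()  # assumes a configured kubeconfig
batch = client.BatchV1Api()

def gpu_job(name, command, gpus):
    # Describe a GPU job declaratively; the scheduler decides when/where it runs.
    container = client.V1Container(
        name=name,
        image="registry.example.com/ml-runner:latest",  # placeholder image
        command=command,
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": str(gpus)}
        ),
    )
    pod = client.V1PodTemplateSpec(
        spec=client.V1PodSpec(containers=[container], restart_policy="Never")
    )
    return client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=pod, backoff_limit=0),
    )

# Submit all three at once; pods simply stay pending until GPUs free up.
for name, cmd, gpus in [
    ("prep-data", ["python", "prep.py"], 1),
    ("train-model", ["python", "train.py"], 4),
    ("eval-model", ["python", "eval.py"], 1),
]:
    batch.create_namespaced_job(namespace="ml", body=gpu_job(name, cmd, gpus))
```
Notice there's no "then" anywhere: you declare all three jobs up front, and the cluster decides when each one actually gets hardware.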
Why testing AI is like wrestling with a moody teenager
The third piece - automating AI testing - revealed something that honestly surprised me. Traditional software testing assumes deterministic results. You input A, you get B, every single time.
But LLMs? They can have bad days. Corey actually said this: "You can hit it on a bad day where it just wants to give you an attitude."
He gave me a concrete example that made this real. Say you're using an LLM to generate SQL queries based on user questions. You test it, tune your prompts, everything works great. Then your IT team says they're upgrading to the latest model version.
Suddenly nothing works the same way. The new model might structure responses differently, handle edge cases differently, or just interpret your carefully crafted prompts in ways you never expected.
"One shot test doesn't always mean that you're good to go," Corey explained. "Whereas with more of an OS upgrade, the OS is gonna respond the same way every time that you make a system call."
That's when I realized why so many AI projects get stuck. Companies build something that works, but they haven't built the testing infrastructure to evolve with the rapidly changing AI landscape.
The real cost of getting this wrong
What keeps me thinking about this conversation is Corey's perspective on failure. He's seen countless projects where teams spend months setting up infrastructure, expectations build, and then they finally get to test their actual use case - only to discover it doesn't work.
But here's what he taught me: "Expecting failure is important. But how you respond after the failure is even of greater importance."
The companies that thrive aren't the ones that avoid failure - they're the ones that fail fast, learn something valuable, and apply those insights to the next attempt. Automation makes this possible by reducing the time between "let's try this" and "okay, that didn't work, but now we know why."
What this means for your AI strategy
Corey's final insight stuck with me: "While the initial cost of building in the automation to support your AI initiatives might slow you down initially, it's going to accelerate you beyond the slowdown from your initial need to invest in that area."
We're not just buying expensive hardware anymore. We're building the foundation for how our organizations will compete in an AI-driven world. The question isn't whether you can afford to invest in proper automation - it's whether you can afford not to.
The companies that figure this out first won't just save money on their GPU bills. They'll be the ones turning AI experiments into actual competitive advantages while their competitors are still trying to figure out why their million-dollar infrastructure sits idle half the time.
Links and resources
- Download WWT's complete 2025 Automation Priorities Report
- Access the IT Infrastructure and Operations Landscape Report: https://www.wwt.com/wwt-research/it-infrastructure-and-operations-landscape-report
- Request an Automation Briefing with WWT experts
Cheers!

Robb
Robb Boyd spent 20 years at Cisco turning complex tech into stories people actually wanted to hear. He created TechWiseTV (two Emmy nominations, no big deal) and now runs ExplaiNerds, where he helps technology companies stop boring their audiences to death. You'll find him regularly leading Cisco Live events in the US and Europe for Cisco TV, plus other industry events where he's probably overthinking his next opening line or wondering why every vendor thinks their solution is "game-changing."