Chipstrat

An Interview with Microsoft's Saurabh Dighe About Maia 200

The why behind Maia 200. SRAM–HBM balance, scale-up without scale-out, 30% lower TCO claims, and more.

Austin Lyons
Jan 28, 2026

Hello readers! This is another Chipstrat interview, where I speak with builders and operators about the decisions and trade-offs shaping their strategy.

Today’s interview is with Saurabh Dighe, CVP of Azure Systems and Architecture at Microsoft. Earlier this week, Microsoft announced Maia 200, its second-generation AI accelerator built specifically for inference economics.

In this conversation, we examine why Microsoft builds custom silicon and why Maia 200 is intentionally optimized for inference rather than training. From there, we unpack the technical consequences of that choice: Microsoft’s view of an “efficient frontier” for inference, the trade-offs between scale-up and scale-out architectures, and the decision to favor a large Ethernet-based scale-up domain with a custom transport layer.

Dighe walks through the shift from Maia 100 to Maia 200, including a significantly larger on-die SRAM and a memory hierarchy that balances SRAM, HBM, and system-level DRAM. We also cover long-context workloads and KV cache management, including when data remains hot on-die and when it moves into lower tiers of memory.

Equally important, we discuss the business logic behind these technical decisions. Maia 200 is not framed as a replacement for merchant GPUs, but as a complementary part of a heterogeneous fleet. Dighe also explains how capacity is allocated across internal and external customers, how Maia is exposed through Azure services rather than as a standalone product, and the importance of software investment ahead of silicon, from compilers to kernel libraries to pre-silicon tooling.

On to the interview.

The formatted, quickly readable transcript is available below for paid subscribers.

An Interview with Microsoft’s Saurabh Dighe About Maia 200

Hello listeners, we have a special guest today to discuss Microsoft’s Maia 200, the AI inference chip announced earlier this week. Welcome, Saurabh Dighe, CVP Azure Systems and Architecture. Thanks for coming on to chat with me.

SD: Oh, it’s a real pleasure, Austin. Thank you for inviting me. I’m happy to disclose Maia and happy to take any questions you may have.

I want to talk a lot about the why behind Maia, starting with business decisions like "Why make custom silicon?" and then, of course, the why behind the technical trade-offs. I think those are interesting and communicate something about what you're prioritizing and what you're not.

But first, how about a quick refresher for anyone who missed the Maia 200 announcement?

Why Custom Silicon? The Business Case

SD: Great. So I'll just start off—we have been on this multi-generational silicon journey at Microsoft. We know that our customers want leading AI infrastructure. And one of the things that we have to do when we develop leading AI infrastructure is have the freedom to innovate across every layer of the stack. Silicon is a very important part of that.

And so just yesterday we announced Maia 200, which is our second generation of AI accelerator, and we'll be happy to go into a lot more detail with you as the podcast progresses. But this is really targeted at performance leadership for inference: performance per dollar and performance per watt. We made very deliberate architectural choices.

We said, hey, we really need to make sure that as this wave of AI adoption accelerates and more and more customers start using AI, inference economics is going to be a major reason to invest in silicon and systems.

So that’s what Maia 200 is all about. It was really to drive the best AI infrastructure for us and meet the customer at the highest performance level, but also at the lowest cost.

Okay, yes, let’s dive in there. So you talked about designing for your customers and thinking about TCO and performance. When most industry observers talk about custom silicon, they frame it as a margin play—build your own chip, cut out the middleman’s markup. But you’re talking differently about Maia. I saw the blog posts from you and EVP Scott Guthrie, and you both talked about working backward from what your customers need. So let’s take it from that angle. What are customers asking for that today’s GPU fleet might not deliver?

SD: That's a great question. The question inside Microsoft was never, "Can we build a GPU?" The question really is, "What do we need to build the best AI infrastructure for this era of AI?" And if you take it from that lens, it becomes obvious that AI inference is really about an efficient frontier. You have to be able to deliver real-world capability and accuracy at different points along that frontier: different latency, cost, and energy points.
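
To make the "efficient frontier" framing concrete, here is a minimal, purely illustrative sketch (not from the interview, and not Microsoft's methodology): treat each inference deployment as a point in latency/cost space and keep only the configurations that are not dominated on both axes. The function name and all numbers below are hypothetical.

```python
# Toy "efficient frontier" over hypothetical inference deployments.
# Each point is (latency_ms, cost_per_1k_tokens); lower is better on both axes.
from typing import List, Tuple

def efficient_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the points that no other point beats on both latency and cost."""
    frontier = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# Hypothetical deployment configurations (made-up numbers for illustration only).
configs = [(40, 0.020), (60, 0.012), (90, 0.008), (70, 0.015), (120, 0.009)]
print(efficient_frontier(configs))
# -> [(40, 0.02), (60, 0.012), (90, 0.008)]
# (70, 0.015) and (120, 0.009) are dominated, so they fall off the frontier.
```

The point of the sketch is simply that "leadership" is not a single number: different customers sit at different latency, cost, and energy points, and the goal is to be on the frontier at each of them.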
