Sachin Katti - OCP Keynote
Thank you, Shri and Song. That’s a great initiative they just talked about. Great to be here and thank you for joining me today. This feels like our community. This is our village. So it’s great to be here amongst you all, amongst industry and all of our ecosystem partners.
Before I start, I first want to say a heartfelt thank you to OCP. It’s been more than a decade — a tremendous collaboration and a purposeful engagement that has led the industry towards open, modular, efficient, and scalable technology solutions.
A number of previous speakers have touched on the great impact that this community has had on the industry. I’ll pick one example: DC-MHS, which we have been heavily involved in with the whole community over the last three years, really paved the way for building open, modular compute platforms for the data center. It was a joint collaboration across chip vendors, OEMs, and cloud partners — in fact, the whole industry — and it really opened up the data center architecture.
We are in another transformative moment in the industry right now. Very exciting times — a big disruption, something that happens once every few decades, is coming our way with AI. And I think a similar challenge is upon us: how do we come together and help the industry build these kinds of open, modular AI platforms for the future, and deliver the scale that’s needed to monetize AI and transform both our day-to-day lives and the businesses that are out there?
I’m excited to touch on that topic today and how Intel plans to play a role in that transformation — what’s required as we step into the next phase of AI from silicon vendors like us, as well as how we build the whole ecosystem around us so that we can deliver to the whole industry.
Intel’s Strategy in AI
As we think about it, one question that I get — in light of all the recent announcements from Intel, we’ve done a lot of partnerships, a lot of strategic deals — is: what is Intel’s strategy in AI?
I think, put very simply: Intel is doubling down on AI. And the approach we are going to take is a very open approach. We are going to figure out how we embed every part of our technology stack into AI products that we’ll use across the entire spectrum.
It starts with the AI PC — bringing AI at scale to all of our personal computing devices that we use every day at work, and making sure that AI is accessible for our day-to-day productivity tasks.
It continues with our CPUs, especially through our collaboration with NVIDIA, where we want to make sure that our world-class CPUs are in every AI system that’s going to get deployed out there.
And today I’ll talk about how we are doubling down on our own AI systems. We’re not stopping at AI PCs and CPUs; we’re bringing our own GPUs and our own accelerators, but with an open systems approach, so that we provide customers choice and flexibility as they build out their AI infrastructure.
So think of it as one step at a time: our technology being introduced into every AI system out there, but with an open approach that makes it easy for customers to adopt our technologies into the AI systems they want to build and deploy.
Focusing on Workloads: Inference and Agents
So let’s dig in. What does that actually mean?
One of the changes at the new Intel, with Lip-Bu coming in, is a drive toward ruthless focus. Instead of trying to build for every workload out there, what is the workload we are going to focus on, and what is the workload where we will deliver the best experience for customers?
And that workload, our focus, is increasingly going to be on inference and agents.
Now if you step back, as we’ve seen the hundreds of billions of dollars of investment that have happened — and will happen — over the next few years, a question that’s prevalent in the industry today is: how will we monetize all this investment? How will society gain the benefits of all of this AI investment that’s happening?
We strongly believe that’s going to be through agentic AI. It’s going to be through agents that transform our daily lives. It’s going to be through agents that transform how enterprises operate. That’s how we are going to monetize and realize the benefits of all of the AI investments we have made — and will continue to make — over the next few years.
Token Economics and Scale
Now, this is not surprising. We’re already seeing this happen. Google just reported a few weeks ago that they are now at 1.44 quadrillion tokens per month. That’s an incredible number, right? That has grown by more than 100× in just over a year.
The amount of tokens being produced — the insatiable appetite for these tokens by everyone consuming these services — this kind of exponential growth means we really need to figure out token economics.
How do we deliver these tokens at scale? How do we deliver intelligence effectively at scale so that the whole world can benefit from AI?
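To put that figure in perspective, here is a quick back-of-the-envelope conversion of the monthly number quoted above into a per-second rate. The 30-day month is my own simplifying assumption, and the arithmetic is purely illustrative.

```python
# Rough sense of scale for the token figure quoted above, assuming a
# 30-day month. Purely illustrative arithmetic, not a reported metric.
tokens_per_month = 1.44e15            # 1.44 quadrillion tokens per month
seconds_per_month = 30 * 24 * 3600    # about 2.59 million seconds

tokens_per_second = tokens_per_month / seconds_per_month
print(f"{tokens_per_second:,.0f} tokens per second")  # roughly 556 million/s
```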
Why Vertically Integrated Systems Don’t Scale
If you step back and look at where we are today with inference and agentic AI — today, we all use chatbots. They’ve become the dominant interface for consuming AI.
It’s typically a single large language model that you’re talking to, deployed on a homogeneous, vertically integrated system today.
Increasingly over the last few months, we’ve all become familiar with reasoning, with chain-of-thought models — and those talk to themselves. They’re generating a lot of intermediate tokens, and that just means even more tokens need to be produced — at least an order of magnitude, if not two orders of magnitude, more.
Now, if I extrapolate this to agents, it’s even more. It’s not just one LLM reasoning with itself; it’s potentially multiple models. It’s tool calls. It’s data processing. It’s a variety of tasks. And the number of tokens produced will just continue to explode — probably two to three orders of magnitude more.
But the prevailing architecture today — homogeneous, vertically integrated systems stitched together with proprietary networking and proprietary software — won’t scale effectively.
You have to pick the highest-end and most expensive compute component. You’re locked in. There’s no flexibility to optimize performance per dollar.
A Workload-Driven Approach
So if you pull back the veil and look at what’s actually happening in these applications — this is also the new approach we’re focusing on at Intel: a workload-driven approach. Understand what’s happening in each piece of the workload, then figure out what hardware to build.
For agentic AI:
Context generation: building prompts and packing data.
Multiple reasoning models, or even within one model, two stages: prefill and decode.
Prefill requires a compute-optimized accelerator.
Decode requires a memory-bandwidth-optimized GPU.
Environments and sandboxes: coding agents spinning up VMs — CPU tasks.
Tool calls and APIs: again, CPU workloads.
Guardrails and security: managed via CPU or DPU.
Each has different compute, memory, and networking needs. A one-size-fits-all vertically integrated architecture doesn’t make sense.
We need open, flexible, heterogeneous infrastructure — systems you can stitch together with the right mix of compute, memory, bandwidth, and networking for each workload.
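To make that mapping concrete, here is a minimal, purely illustrative Python sketch of how an orchestrator might describe the stages listed above and the hardware class each one prefers. The stage names, fields, and placement rule are hypothetical and are not any specific Intel API.

```python
# Hypothetical sketch: describe an agentic pipeline as stages with different
# resource profiles, so a scheduler could place each stage on the hardware
# class that suits it. Stage names and placement hints are illustrative only.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    bottleneck: str      # dominant resource: "compute", "memory_bw", or "cpu"
    placement_hint: str  # hardware class an orchestrator might choose

AGENTIC_PIPELINE = [
    Stage("context_generation", "cpu",       "general-purpose CPU"),
    Stage("prefill",            "compute",   "compute-optimized accelerator"),
    Stage("decode",             "memory_bw", "memory-bandwidth-optimized GPU"),
    Stage("sandbox_execution",  "cpu",       "CPU VM / container host"),
    Stage("tool_calls",         "cpu",       "general-purpose CPU"),
    Stage("guardrails",         "cpu",       "CPU or DPU"),
]

def place(stage: Stage) -> str:
    """Toy placement rule: route each stage to its preferred hardware class."""
    return f"{stage.name} -> {stage.placement_hint} (bottleneck: {stage.bottleneck})"

if __name__ == "__main__":
    for s in AGENTIC_PIPELINE:
        print(place(s))
```

The point of the sketch is simply that each stage carries a different dominant bottleneck, which is what makes a heterogeneous, mix-and-match system attractive.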
Software as the Key Enabler
The challenge with heterogeneity and openness is software.
How do we hide the complexity from applications and developers? That’s the key — to unlock flexibility and modularity underneath, while keeping zero friction up top.
Software must abstract away the heterogeneity and automatically take advantage of flexible architectures — different compute, memory, and networking configurations.
That’s what we’re building: a unified software stack that enables open, heterogeneous systems tuned for agentic workloads.
Starting now, and on an annual cadence, Intel will deliver scalable, open, heterogeneous systems — giving customers choice — stitching CPUs and GPUs of different flavors together using open, Ethernet-based networking.
We’ll offer enterprise-ready and hyperscale-ready form factors, and a unified software stack that makes applications “just work” on this infrastructure.
Developer Contract
Spending 30 seconds on the software:
Our contract is simple — developers should not change anything.
If they’re building on PyTorch, Hugging Face, or LangChain — it should just work.
The compiler and orchestration infrastructure will tease apart the agentic workload, place the components on the right kind of hardware, and stitch it all together to deliver the end-to-end SLA.
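As a sketch of what “just work” means in practice, here is ordinary application code written against a standard framework (Hugging Face Transformers; the model ID is only an example). Under the contract described above, code like this would run unmodified, with placement across heterogeneous hardware handled by the orchestration layer rather than by the developer.

```python
# Ordinary application code against a standard framework (Hugging Face
# Transformers). The model ID is just an example. The contract described
# above is that code like this runs unmodified, with hardware placement
# handled underneath by the orchestration layer, not by the developer.
from transformers import pipeline

generate = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

result = generate(
    "Summarize the benefits of disaggregated prefill and decode.",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```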
We’ll release this in Q4 this year.
And this is not slideware — it’s already benchmarking.
We connected NVIDIA GPU systems with Intel accelerator systems, deployed the software, and ran a Llama model: prefill on NVIDIA, decode on Intel.
That simple orchestration — stitched over Ethernet — delivered a 1.7× performance-per-dollar benefit versus a homogeneous system.
This work is in collaboration with Gimlet Labs.
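For readers unfamiliar with the prefill/decode split used in that benchmark, here is a toy, framework-free sketch of the disaggregation pattern: prefill builds the KV cache on one pool of hardware, the cache is handed across the fabric, and decode continues token by token on another pool. All names and data here are made up for illustration; this is not the Intel or Gimlet Labs software.

```python
# Conceptual sketch of disaggregated serving: prefill runs on one pool of
# accelerators, produces the KV cache, and hands it to a decode worker on a
# different pool. Toy illustration only, not the actual benchmark setup.
from typing import Dict, List

def prefill(prompt_tokens: List[int]) -> Dict[str, list]:
    """Stand-in for compute-bound prefill: process the whole prompt at once
    and return a (fake) KV cache keyed by layer."""
    return {"layer_0": prompt_tokens[:], "layer_1": prompt_tokens[:]}

def transfer(kv_cache: Dict[str, list]) -> Dict[str, list]:
    """Stand-in for shipping the KV cache between pools over Ethernet."""
    return {k: v[:] for k, v in kv_cache.items()}

def decode(kv_cache: Dict[str, list], max_new_tokens: int) -> List[int]:
    """Stand-in for memory-bandwidth-bound decode: generate one token at a
    time, reading and appending to the KV cache at each step."""
    out = []
    for step in range(max_new_tokens):
        token = (len(kv_cache["layer_0"]) + step) % 32000  # fake "next token"
        out.append(token)
        for layer in kv_cache.values():
            layer.append(token)
    return out

if __name__ == "__main__":
    cache = prefill([101, 2009, 2003, 1037, 3231])   # runs on the prefill pool
    cache = transfer(cache)                           # cross-pool handoff
    print(decode(cache, max_new_tokens=8))            # runs on the decode pool
```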
Hardware Roadmap: Crescent Island GPU
As I said, software is step one — but we continue to innovate in hardware to expand choice.
I’m excited to announce Intel’s next-generation data center GPU, code-named Crescent Island, sampling in 2H 2026.
It’s inference-optimized, agentic-AI-optimized — designed for best token economics, the best performance per dollar.
Low power
Packed with LPDDR memory
Built on Intel’s Xe3P programmable GPU IP
Fully programmable with standard tools
Balanced compute and memory bandwidth — ideal for prefill workloads
We’ll share more specs in the coming months, but I’m super excited to announce this today.
The Role of the CPU
As I said earlier, CPUs remain critical for agentic workloads. Many components still run on CPUs, and x86 remains the dominant platform — virtually every enterprise app and database runs on x86.
We’ve made major progress through the x86 Ecosystem Advisory Group, collaborating with AMD and others to standardize behavior across the ecosystem.
Two highlights:
FRED (Flexible Return and Event Delivery): standardized interrupt handling.
AVX10: standardized vector instruction set.
We’ll continue expanding membership and addressing fragmentation and software compatibility pain points.
Building Open, Heterogeneous AI Systems
Putting it all together:
Open software layer
Choice at the accelerator layer
GPUs optimized for specific workload stages
Open x86 CPU ecosystem
Together, this builds a flexible, open AI system.
Networking will be the key piece — and that’s why OCP’s Open AI Systems effort is so critical: standardizing Ethernet-based fabrics, rack definitions, cooling, and interoperability.
We believe the future of AI infrastructure is open, flexible, and heterogeneous.
Closing
If I haven’t used the word “open” enough — let me use it once more.
How do we bring the whole community together? How do we do for AI compute what we did for data center compute?
As the workloads evolve toward inference and agents, the call to action is clear: build open, flexible systems that give customers choice and let them stitch together the compute they need for their workloads.
This is our commitment — across software, systems, and silicon — together with all of you, to deliver the future of AI.
Thank you again, and it’s great to be here.

