Chipstrat + The Circuit Podcast
AI Chips: Adoption Challenges, Startup Opportunities, Groq, Hyperscaler Custom Silicon, AWS' Strategy, Legacy Architecture Baggage, and more
I'm excited to share my recent chat on The Circuit podcast with Jay Goldberg and Ben Bajarin discussing AI accelerators. There are some real gems in there - give it a listen or check out the transcript below.
The Circuit: Discussing All Things AI Accelerators With Austin Lyons (Episode 59)
BB: Hello semiconductor fans. Welcome to The Circuit. I am Ben Bajarin.
JG: Greetings, internet. I am Jay Goldberg.
BB: We have a distinguished guest with us today, Austin Lyons. Austin, say hello and maybe just give a brief background and introduction on yourself for our listeners.
AL: Hello, everyone. My name is Austin Lyons, and by day I'm a product manager at Blue River Technology. You may have heard of Blue River, a Bay Area startup that was acquired by John Deere. Blue River applies computer vision and machine learning to agricultural problems.
You may have heard of John Deere's See and Spray technology, or at CES, you may have seen our autonomous tractor. I specifically am a product manager that works on autonomy.
And outside of work, I analyze the semiconductor industry at Chipstrat. This is a fun passion project of mine.
My background in semiconductors: I have a bachelor's in electrical engineering, and a master's in EE from the University of Illinois where I conducted experimental nanoelectronics research with a professor named Eric Pop. We were using graphene to make transistors, and I was studying the electrical conductivity and the thermal conductivity of graphene.
After grad school, I went to Intel and I worked for a bit there in Austin, Texas, before my entrepreneurial itch took over. Then I went and worked as a software engineer for some startups for a while.
And then about six years ago, I moved into product management. You know, when you work for a startup and you have a finite runway, you start to tune into, hey, is what we're building actually what customers want? Is this extending our runway? And, you know, I was the guy raising my hand a lot just saying, "do customers really want this?" And that's how I moved into product management.
And so, fast forward, I am with Blue River Technology. I recently finished up my MBA. So now I'm spending my nights and weekends using my MBA and my semiconductor training and background to do some writing.
JG: And you have a family, a family of four [kids], right?
AL: Yes, I have four kids. My wife took them all to the zoo so that you wouldn't hear them [in the background of the podcast].
BB: That's tremendous. Okay, so since you have all of this background and you worked at Intel, I'm gonna put you on the spot on something in the hopes that you take my side and not Jay's side. So I'm gonna try to get him outnumbered.
Are you bullish or bearish on Intel's future?
AL: Oh, I'm bullish. For sure.
BB: Oh Jay, you got two bro, it's two against one.
I did that just for Jay. Really, that was just to pick on Jay because he's my counter when we talk about Intel. Like whenever I get negative, he's like, no, you're supposed to be positive. You can't be positive. I have to be negative. Like I can't be the positive one. So that's awesome. Good, good, good. Okay, subject for another time.
All right, so. What I like about this is you're going to be able to tackle this from two ways. The broad analysis, where you look at everything that's going on like we do, but you also will actively be looking at implementation of things, and whether it makes sense for what John Deere is doing both in the cloud - training and inference - and at the edge.
Because obviously I think John Deere is perhaps secretly, maybe not secretly, one of the most innovative companies out there for AI and machine learning. I have been able to ride on an autonomous tractor down at the Blue River facility, not too far from my house. And I remember walking through all of the technology that's on this tractor and it was impressive. So there's that element of this as well, but maybe in sort of broad strokes, right?
People talk about this need for AI accelerators for training, inference, the edge - just high level, what's your read on this conversation around, not just GPUs, not just CPUs, but specific things that do acceleration.
Definition of AI Accelerators, Role of GPUs, Opportunities for Startups
Is there opportunity for AI acceleration? Let's, let's jump off there.
AL: I think it's an awesome question.
"AI accelerator" is a bit of a confusing term. Accelerating the AI workload compared to what?
Let's start with the base case. If you just had the simplest CPU - single core, single threaded, single instruction, single data - it can still run AI.
But it's slow, because AI workloads are generally "inherently parallel". They have parallelism. So anything you can do to increase parallelism accelerates your workload.
That could look like increasing CPU core count. We could add multi-threading. We could add vector processing.
And now you could argue we're accelerating your AI workload.
But I think we can all agree that this [high-core count CPU] is not an AI accelerator.
Then if you continue along the spectrum and say, okay, well, what if we had some sort of parallel processing machine and it had tons of simple cores? We could cut out all that extra stuff. The [simplified] cores would be slower than a CPU, but they're smaller so you can fit all these cores on the chip and you could really parallelize things.
Well, guess what? We have those parallel processing machines - they're GPUs. They are general-purpose parallel-processing units.
And right there, you can already see that a GPU would definitely accelerate AI workloads compared to a CPU, whether single-core or many-core.
Now, of course, being general purpose, GPUs still have trade-offs. When we look to implement AI inference in various places, whether it's the edge or the cloud, I think customers are looking to pull on different levers [with AI ASICs]. Maybe it's even lower latency than what a GPU has to offer, or even lower power consumption - for example, at the edge.
I think that's where the opportunity is for startups in the AI accelerator space - "how can we compete on a certain vector and be even better than CPUs or GPUs? Can we create custom ASICs that are more power efficient or faster?"
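To make the parallelism argument concrete, here's a quick toy sketch (mine, not from the episode): a scalar, single-threaded matrix multiply versus the same math dispatched through NumPy, which uses SIMD vector units and multiple cores under the hood. The sizes and timings are purely illustrative, not a benchmark of any particular chip.

```python
# Toy illustration: AI math is dominated by matrix multiplication, and every
# output element can be computed independently -- that's the "inherent
# parallelism" being discussed. Sizes are arbitrary.
import time
import numpy as np

N = 128
A = np.random.rand(N, N)
B = np.random.rand(N, N)

def naive_matmul(a, b):
    """Base case: what a simple single-core, scalar CPU does -- one multiply-add at a time."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter()
naive_matmul(A, B)
t_scalar = time.perf_counter() - t0

# "Accelerated" path: NumPy hands the same multiply to an optimized BLAS that
# exploits vector instructions and multiple cores -- the levers mentioned above.
t0 = time.perf_counter()
A @ B
t_parallel = time.perf_counter() - t0

print(f"scalar loop: {t_scalar:.3f}s  vectorized/parallel: {t_parallel:.5f}s")
```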
BB: That makes sense. I wonder - we see the overall trend toward companies, the likes of Nvidia - their accelerator might include a CPU like the Grace CPU - or a number of startups wanting to do acceleration. Sometimes they're doing FPGAs and embedded as just sort of a basic acceleration.
I think my general approach to this has sort of been several things that you said.
First, there's latency.
I saw a great slide, which I don't have in front of me, but it came from John Deere, where they basically showed the cost of inference in the cloud versus on the tractor when it came to visual recognition - all of the things it does to implement safety. And it made a really strong argument that you save money and you gain speed when you have these things on device.
Again, this is a giant computer on one of these tractors with a bunch of GPUs and 10+ cameras. But the economic benefits, the zero latency - I think it was one of those things where it was interesting to see that analysis done as the trade-off: it would actually cost a lot more to do this in the cloud.
And when I thought about that, I thought about automotive. We're talking about Waymo and Robo taxis. It still needs cloud connection, but there is a lot that's going to happen locally because it needs to be, and it's more cost effective.
AL: Definitely. When you're talking tractors or automotive - when you're talking autonomy - you want to do as much processing at the edge as you can, obviously for latency's sake, but also in the absence of an internet connection.
Some of these tractors might be in a field where they don't have an internet connection, but we still want to be able to take what our cameras are seeing and run object detection and segmentation on it and ask, "is there a person there?" Even if I'm not connected to the cloud. Some of the use cases also depend on your environment.
Challenges in Accelerator Adoption
BB: Do you think this is really more an inference thing, or are there areas for training? I say this because there are some dedicated startups who are actually doing dedicated training acceleration in the cloud. So I always sort of wonder - has Nvidia won training?
I think it is an interesting topic. Is there an opportunity [for startups]? Because people are trying, but is that a custom thing that needs to be made? A custom training wafer or a whole block? I guess the jury's still out, but I'm intrigued by that decision about where the best implementation of this is for these workloads.
AL: Super interesting. I'll give my take and then I'll let Jay give his take. We'll see where we land.
From a training versus inference perspective - I think a lot of startups go after inference because there's an argument that inference is a large market in and of itself, and will continue to be very large. You might do training maybe annually and then you're just inferencing all the time after that.
From first principles an ASIC could definitely outperform - in theory, on paper - NVIDIA GPUs, because you're getting rid of that general purpose cruft, or what I call general-purpose bloat, and making the best chip for training.
It's tricky because in training, usually your GPUs or your chips have to talk to all the other chips if you're scaling [your workload] across many chips.
So not only do you have to make a good chip, but you have to have networking. And of course, NVIDIA has that already [InfiniBand].
So I do think inference is an easier surface area to tackle, with a bigger market, for the startups to focus on first.
What do you think, Jay?
JG:Ā I think training is done. I think it's Nvidiaās market for the foreseeable future. AMD will carve out a little niche for itself. And absent some really spectacular things coming from Intel Gaudi, I think we're at the point now where training is going to be dominated by Nvidia. Maybe Google or some of the other hyperscalers come up with something. But even there, I think they're going to rely on Nvidia.
JG: I have a question about just accelerators in general, which is you make a good case why accelerators can be superior in performance. And yet, we haven't seen many take off, right?Ā
The market is still heavily, heavily Nvidia for inference and some of the internal stuff, the TPUs and whatnot from hyperscalers.Ā But we've been able to see ā by my count three rounds ā of heavy, heavy venture investing in the accelerator space and the industry doesn't have a lot to show for it.Ā
Why? What's failing there?
AL: Yeah, that's a good question.
At the end of the day, there's an architecture spectrum on which a startup places bets when they create an inference chip. One end of the spectrum is very flexible, even though it's an ASIC. Think Google's TPU or Groq's LPU; maybe Tenstorrent would play here.
In the middle of the spectrum, they're going to make different architectural decisions - like Etched, who are going after a transformer-specific chip.
On the far end of the spectrum is a model-specific chip. There's a startup that just came out of stealth in the last week or two, Taalas. They're going to make model-specific chips. So think of a chip that is basically hard coded to run Llama2 7b, for example.
And maybe the existing startups that came before - let's just assume that every startup is full of smart people and they're good at execution - placed their bet in the wrong place [on the spectrum].
You could also argue maybe that they were too early, before transformer LLMs took off and before we had these large models that are actually really compelling.
But my hypothesis is that the bets weren't placed in the right place.
Right now we're seeing startups that are placing bets across the spectrum, so maybe in the coming years we will see startups have enough success and have enough performance that they could chip away at Nvidia or whoever else a little bit.
JG: Yeah, I mean, I think that jibes with my thinking, which is software, right? The root of the problem we've seen in past cycles, these past waves of startups, has been software. The software ecosystem is moving faster than the silicon can.
And right now, I think that's what killed a bunch of them. I won't pick on anyone, but there's a bunch of names we could go through that had a good solution on paper. It looked fantastic. But the software either moved beyond them or it just wasn't enough of a boost on the workload to merit somebody moving and optimizing for these startup solutions.
But then that opens the question - are we done with software? Is it the transformer, and that's it? We're going to stick with transformers? Or is there going to be something else that comes down the road? And is it enough to have a chip and a compiler that's optimized for transformers, or do we even need to get more specific than that? Like you mentioned model-specific chips. I understand where the performance gains are gonna come from, but that seems really risky for a startup to bet on something like that.
BB: What would be interesting, because Austin mentioned this idea of GPU bloat - I think there's something interesting afoot here. Because, if we recall, looking back in history, one of the seminal moves Jensen made was to reposition the GPU from a specific-purpose gaming solution to a general-purpose GPU strategy. So he called it GPGPU compute. And that made a ton of sense, because he basically said, "I don't want it to be niche. I want it to be general purpose. And so CUDA is what's going to unlock that."
Now, when he starts talking about accelerated computing, he's narrowing that back down. He's starting to move away from saying that. Even in some of his talks - if you caught the talk he did at Stanford this week - he actually avoided framing the GPU as a general-purpose product, positioning it instead as specialized acceleration for AI.
So I think this is interesting because it makes this point - should we not think about GPUs as general purpose? Because if we shouldn't, then we're isolating them into a section of the market. Which would then rationalize that there are other pockets of the market for other companies to go after. I've been kind of back and forth on this weird bandwagon, because I'm not sure how much I really believe Jensen when he says it is specific-purpose. Because to him, the GPU is the best architecture for everything, because it's massively parallel, as you said.
So I'm kind of like, "Look, dude, I was in the room in 2016 when you outlined the strategy to me and it made a ton of sense. It's brilliant. That makes a ton of sense. And now you're like, nah, it's not general purpose anymore?" But it really is. I still think he wants it to be general purpose, but that negates the market.
What are thoughts on that? Because I do wrestle with this. Is it truly general purpose or should we think of their architecture as specialized?
AL: Yeah, yeah, that's a good question. And it opens up a lot of other interesting questions.
If you look at their chip - if you look at the H100 - I would argue that it's general purpose.
Yes, it does have tensor cores that are great for matrix multiplication, but there's other compute on that chip, taking up chip area, taking up power consumption, adding control logic, adding latency. There are things like texture units - still a remnant of the hardware for graphics-related things. You can definitely still run workloads that don't even use the tensor cores. So it is still general purpose.
The H100 is tuned to take what Nvidia has and be great at inference and training, but there's still bloat that you could cut out to create a more custom version of this.
This raises a question then - is there a world where Nvidia has general purpose GPUs and then they also upend themselves and create their own even more finely-tuned ASIC?
BB: Hmm. I think this is a relevant point because of GTC next week. I'm imagining more of this will come out in their news next week that sheds light on this philosophy.
JG: I just want to chime in. Everybody should go and subscribe to Austin's blog, Chipstrat. He had a really, really fascinating post looking at "GPU bloat" as he defines it - all the other stuff on a GPU that's not actually doing matrix math. And it was, what was it? It was like 96% of the chip is not doing matrix math. Did I get those numbers right?
AL: Yeah, you can do some napkin math. And so please correct me where I'm wrong, listeners. But you can do some napkin math to show that there's a surprising amount of chip area that is not dedicated to matrix multiplication.
Again, on paper, you would expect that if this chip is dedicated to AI inference, it would have the highest compute density possible. And it would have memory access patterns and memory hierarchies perfectly tuned for AI inference.
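For anyone who wants to follow along, the napkin math has roughly this shape. The numbers below are placeholder assumptions for illustration only - not the estimates from the Chipstrat post, and not measured H100 figures - so swap in your own die-area guesses.

```python
# Napkin math in the spirit of the "GPU bloat" estimate: how much of a GPU die
# is actually matrix-multiply hardware? Every input here is an assumption.
die_area_mm2 = 800.0              # assumed total die area
sm_count = 132                    # assumed number of streaming multiprocessors
sm_area_mm2 = 3.0                 # assumed area per SM (ALUs, registers, caches, ...)
tensor_fraction_of_sm = 0.25      # assumed share of each SM that is tensor cores

sm_area_total = sm_count * sm_area_mm2
tensor_area = sm_area_total * tensor_fraction_of_sm

print(f"All SMs:          {sm_area_total:.0f} mm^2 ({sm_area_total / die_area_mm2:.0%} of die)")
print(f"Tensor cores:     {tensor_area:.0f} mm^2 ({tensor_area / die_area_mm2:.0%} of die)")
print(f"Not doing matmul: {1 - tensor_area / die_area_mm2:.0%} of die")
```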
JG: And we're not seeing that in Nvidia today. As much as they're the leader, their chip is very much not that today. I guess that's the whole rationale for accelerators in the first place. But it's still pretty staggering that here we have the leader in GPUs and AI compute and only a small portion of their chip is actually doing AI math.
AL: Yes, totally. It was shocking to me. I got the idea listening to a podcast with the Etched CEO Gavin, and he was talking to Patrick O'Shaughnessy. He made some of these claims, and I was just like - I don't believe it, so I'm going to do the math myself. And the deeper I dug into it - again, I'm making some assumptions, as you can read about - I was really surprised as well.
That's where I thought - oh, I can see where startups are saying, "let's do this". And investors are saying, "I'll bet on you".
BB: Right. I think that's really the key element to me that gets interesting is, again - Jensen wants to talk about them being specialized. He used "accelerating", but really it's not, right?
The GPU is massively parallel, open to every task, but as you talked about, it does have a rich legacy in very specific things it's good at - generally visual and heavy computational loads thrown at multiple cores. I think that's why this opportunity exists now again.
The other thing I'll say is if you follow the arc of time, mature markets - this is a general observation, whether that's PCs, TVs, cars - mature markets start out very general purpose. Things kind of feel similar. And then they become more specialized over time, once the market really matures and moves into post-maturity. So you could argue we're in that moment now with semiconductors.
I'm not going to say it's an easy problem. Semis are extremely capital intensive. There's a lot of software work that's incumbent and entrenched. If you just look at why x86 has been so dominant and why CUDA has been so dominant in these areas - there's just a lot invested into it, so it's hard to take those risks. But, one hundred percent of the time you're going to be able to do an analysis on a specific-purpose chip to solve a problem - the chip has one job, and it's gonna do that job better. Now, lots of challenges in getting the adoption of that chip. But at a basic level, you could prove that's true. Doesn't mean it wins, but it makes a lot of sense.
I agree with you Austin, that's why I think we're seeing the startup opportunity. It makes a ton of sense on paper. Now beyond that, there's another part of this discussion in terms of "Can they succeed?" and "Where's the adoption?" But again, I think we all understand why this moment is happening.
AL: Yep, I fully agree with you.
Nvidia at the Center of the Universe
The mental model that I have for NVIDIA right now is that they are a massive body with a huge gravitational pull at the center of the accelerated computing space. And everyone's in their orbit. Everyone's orbiting them - or maybe some [companies] are a moon orbiting, say, AWS because they're renting H100s from AWS.
I think today there are a lot of reasons why everyone got sucked into Nvidia's gravitational pull, whether it's the CUDA software that made it easy, or because Nvidia was first to position themselves as GPGPU, or because they deliver on a full end-to-end experience - whether that's the DGX server or DGX cloud, or even Nvidia AI Enterprise. They keep you sucked in there. And Nvidia is just getting more and more massive and has more [gravitational] pull.
But I think at some point, and probably coming soon, people are realizing, "hey, wait a minute, this [Nvidia GPU] isn't tuned to have the lowest latency or to have the lowest power." For some of those companies in orbit - and maybe it's the small startups - something's going to be able to pop them out of Nvidia orbit and move them to orbit around a [Nvidia] competitor. Whether it's needing that lower latency or lower cost.
That's why right now it is so interesting, especially with GTC coming up, to ask "can any competitor overcome that gravitational pull?"
JG: The mental model I use for this is to look at Arm in the data center. I think it applies. It's also a market I know up close and personal. And it took Arm-based CPUs a decade to get into the data center, give or take. And the big hurdle was software. All that infrastructural software - databases, and BIOS, and network management, and on down the list - getting all of that not just ported and recompiled, but fully optimized from x86 to Arm took a long time, because the benefits weren't really clear.
You could have an Arm-based CPU that was better - more performant than an Intel CPU. But if it's only 10 or 20% [more performant], customers would just say "oh, we don't want that because it's going to take us a hundred million dollars to port our software over. And so if we're going to do that, we need orders of magnitude better performance." With physics and semiconductors, you can't get that kind of gain quickly.
I think what's happening now in the AI space is that most companies doing AI - hyperscalers aside - look at it and say, we don't really know what our software needs are, and we definitely don't know what they're gonna be in two years. Let's just stick with Nvidia because we know that's the most flexible. We have the tools, everything's already written for it. We have our person doing CUDA and everything's optimized. Let's not mess with it. And some startups are going to come along and say, "Hey, we have 10X better performance." Most enterprises are gonna be reluctant to switch because there's just not a compelling need. We're at the stage now where you don't get fired for buying Nvidia.
BB: For training on Nvidia. Maybe inferencing is the question, but you don't get fired for training on Nvidia.
JG: For training, but I think it extends to inference as well, at least for the time being, because you need to move the model over. You have the software tools from NVIDIA that make that really, really easy. You don't have to do a lot of configuration. Again, for Google or Facebook to do that, it's easy. They have lots of people who do that. But if you're JP Morgan or Fidelity - big, but not hyperscale - it's a very tough argument to make to say, oh, there's startup ACME Accelerator out there. Their chip looks good, but are we really going to invest what it's going to take to get this to work for our fairly finite scope of need?
I mean, do you think that changes? Or how does that change? What's gonna sort of force this bottleneck?
AL: I think you have a fair point, which is - again, people are just in orbit. It's going to take a lot of effort to leave the Nvidia orbit and it better be really compelling. On the software side, the good thing a startup has going for it is if they constrain their scope to just inference, and maybe to a particular slice of software - let's say it's Etched and their transformer-based hardware. So they're looking at just transformer LLMs. They've scoped down all the software that they would have to support to just that use case.
Then they would have to also ask, "Can we write our software in such a way that people can easily move their workloads, or are familiar [with the software APIs]?" If someone's using PyTorch [TensorRT-LLM] or some open source library, could we support that library for inference and make it so easy to port your software that the developer doesn't go, "Oh man, I have to go learn this new programming paradigm. This is too much."
But I still think for the CFO or someone who's going to be investing in this, you're going to have to show some metric that they really care about. And the tough part is, if it's latency, it's hard to quantify. You know, "I'm gonna create a way better user experience and our [AI] model is gonna respond in 100 milliseconds instead of two seconds" - it's hard to quantify what that's worth to your company.
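As a hypothetical sketch of that "easy porting" idea: if an accelerator vendor shipped a PyTorch device backend, moving a transformer inference workload could be close to a one-line change. The "my_accel" device name below is made up for illustration - it isn't any vendor's real API - and the model is a small open one standing in for the Llama-class models mentioned earlier.

```python
# Hypothetical: a startup ships a PyTorch device plugin, so existing inference
# code barely changes. "my_accel" is a made-up device string, not a real backend.
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"           # today: "cpu", or "cuda" on Nvidia hardware
# device = "my_accel"    # hypothetical: the startup's PyTorch backend

model_name = "gpt2"      # small open stand-in; a Llama-class model is the realistic target
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tok("Why do AI accelerators exist?", return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```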
BB: Yeah. I think the strongest position that Jensen's always taken - which goes to Jay's point about the struggle of Arm and the entrenchment of x86 in CPUs - is he's always said you've got two things going for you when it comes to Nvidia. You have a large installed base - they're probably 90% of GPUs in the data center, however many millions of those there are - and then GPUs on PCs. So he'd say you've got a large installed base of a hundred million plus; the data center is far less than that. But his whole point is you've got a large installed base and architecture compatibility. I do think those are strong arguments.
Now, again, the GPU has tons of trade-offs for inference. It still consumes a lot of energy. And if we believe five years from now most companies have gone through this training cycle - I guess one of my theses is, the few left needing that year-over-year screaming compute from Nvidia are going to be those who are chasing AGI. Everybody else is just like, "I've got plenty of compute for my training".
Groq's Approach
BB: That's where inference comes in. And again, whether or not the GPUs are best positioned for inference alone, I think is an interesting question. So, let's use that to lob into talking about Groq because Groq's gotten a lot of attention. Jay and I had a good meeting with Jonathan Ross, their CEO at CES, which is where Jay acquired the nice Groq hat that he's wearing, which I failed to pick up.
JG: Yeah, full disclosure, they gave me a hat, so I'm biased.
BB: I failed at acquiring swag exiting that room. Jonathan was like, take as much as you want because we have so much.
Groq is a super interesting company because, to be honest with you, I didn't know a ton about them other than I've got some friends who invested and wanted to look into it. But I assumed - as I'm sure perhaps many did - that they were actually selling an inference chip to data centers. What it turns out they're doing is they're actually hosting those themselves and they're providing inference-as-a-service. So you bring your APIs and you just perform inference on their products [Groq's chips].
And their demo is amazing. It's exceptionally fast, not just with 7 billion parameter models, but larger ones too. It just screams. It's some insane number of hundreds of tokens per second, which is a really good demo, especially when you think about inference.
Whether or not you want to truly outsource your inference to somebody else - that's a question of costs.
And you probably saw, Austin, because I think you either commented on it or tweeted - people were talking about the economics. They were doing the economics of Groq and showing, you know, again, that's a trade-off, right? How much do you want to outsource that versus either own or embed? I don't know the answer, but in terms of this point we're making - which is there's an opportunity for something specific-purpose to far exceed everything else when it comes to inference - Groq's proved that point, whether their business model works or not. I think it's proved you can offload inference, when you're just talking about inference, and actually have a very, very exceptional product experience.
AL: Yeah, totally. Groq has shown that low latency can open up really interesting use cases, really interesting user experiences. You can go see [for yourselves] - they tweet all the time with what people are building on Groq, and it's amazing. They're definitely showing that there is a better inference experience out there, or experiences powered by inference.
Now, their approach of inference-as-a-service makes a ton of sense because they've got to make money and they've got to prove that there are these awesome experiences that you can build on top of their chips. And so selling into data centers is a hard and slow process and it's going to take a while. So I think they're taking the right first step in building inference-as-a-service.
But Jay, what's your take? I can see your wheels spinning.
JG: So I think we have to separate out two conversations. One is Groq's business model, which is essentially inference-as-a-service. That's a whole debate the industry had 10 years ago around platform-as-a-service - and AI may actually reshape that. There were a whole bunch of platform-as-a-service companies trying to do different parts of different stacks. And those largely succumbed to infrastructure-as-a-service companies like AWS.
Does AI shift that? Do we now have room for PaaS companies to do very specific cloud features? We can have that debate. That's a separate conversation.
The other piece of Groq is how good is their performance? We've seen the demos, it's impressive - is it a miracle or is there something else going on? There are some people who say it doesn't work well with larger models. I don't know. My point is just that the technical debates are still sort of going on. Those are two separate discussions.
I do think they have a very interesting architecture. From what I know about their chip design, it looks like a pretty pure play accelerator. There's not a lot of, as you would put it, bloat in there. And I think that it's super fascinating. There are a few companies that are coming up that are taking these similar approaches.
All of this, of course, leaves open the question of "are we done with software?" Are transformers the future? Because we could have some other wild card come out and upend all of this.
Just talking about Groq leads to all these other interesting conversations, none of which are settled. And I don't think any of them will be settled anytime soon.
BB: I totally agree.
Challenges for Legacy Companies
This hits on a broader topic that we've talked about before that I do think is interesting. At least from Nvidia, Intel, and AMD, we haven't yet seen specific purpose-built AI architectures. They're just using stuff that they used before and repurposing it.
Two years from now, Nvidia may come up with something that's like, "Here it is! We have, from the ground up, built an AI architecture for everything!" Or Intel might come out and say, "Here is our dedicated [chip]". It takes time. But that's what we're seeing here.
I had come across - I don't know, Austin, if you've ever seen it - IBM has this chip called NorthPole, and it's essentially trying to deal with the von Neumann limitations. The way they did this, and they were testing it, was they actually brought little bits of memory right next to each compute core on the architecture. So it's right there, easy to be used, still able to be distributed. It's fascinating. I don't know what they're gonna do with this chip because it comes out of IBM Research, and they just try to do things and see where they'll go.
But it backs up this point - when you start to rethink designing a chip or an architecture with AI workloads in mind, whether that's training or inference, you're going to take a different approach than you've taken today. And we're still scratching the surface of that. We haven't seen it. But I find that, again, really interesting - that we're just using legacy architectures to do something really new.
And then back to this point - that's where I think the diversity of a GPU comes in. It's just been very good at being able to transition between graphics compute and deep learning. It's just been very good at that. But that doesn't mean there aren't a lot of opportunities for a new approach to architecture design going forward.
AL: I think with those legacy companies [Nvidia, Intel, AMD] that have legacy architectures - it's easiest to take their Lego blocks and try to piece them together differently to make something slightly better, slightly faster.
The question for those companies will be - will they be able to rethink an architecture from scratch and deliver on it? Or will they need to partner with a startup?
I'm secretly hoping for Intel to have some sort of play with Altera where they're taking advantage of reconfigurable computing and saying, "Hey, could we make a model-based chip that uses reconfigurable computing to let us tune the chip [after manufacturing] in the event that architectures change or we need a bigger model size," or something.
I think there are still interesting dimensions that even the big companies can explore - especially Xilinx and Altera, companies that partner with CPU manufacturers - but it will be an open question whether those big guys can innovate from the ground up.
JG: I endorse the sentiment. I don't necessarily agree with the FPGA part of it, but I endorse the sentiment. Let's not get into the specifics of reconfigurable compute - we can disagree over a beer on that one. But I think your general point is right.
There is lots of room for innovation here. I think some of the big companies might get it right. They're probably going to have to buy somebody, though, down the road, somebody who's nimble and can sort of go from a blank start.
Hyperscalers and Custom Chip Design
BB: The last topic I want to get into, because of Austin's technical background - I think this will be interesting.
There's something that we talk about a lot on this podcast, which is semi-custom or custom chips. There's a debate I have with this, because I do struggle with it. I think strategically everyone understands - using Apple as the gold standard here - why you might go vertical in your stack to make silicon.
Now, I also think lots of people are trying [making their own chips]. Lots of people will try. Not everyone should. And that's kind of the point that I'm getting to, because if you look at a couple of hyperscalers who are doing these products - first off, they are not Nvidia. They are not Intel. They are not, you know, an AMD - a number of companies that have just done this really well.
They're not chip design companies, but that doesn't mean IP from Arm or from others doesn't make it easy to start tuning these things. So I understand that.
I guess the question I struggle with is whether one of these companies can actually keep pace with an Nvidia or an AMD or an Intel or an Ampere, etc.
What happens if you're actually technically inferior to the company you're trying to displace? And is [making your own chips] even the right strategy?
What's your take on them doing this? Is it sustainable? Is there really that risk that the incumbents who do this for a living just do a better job and make the custom efforts so inferior that nobody wants to use them?
AL: That's a big question. Yeah. So I think hyperscalers should - and they are - design their own AI accelerators. We see that with Meta and AWS and Google. Google's had a lot of success there, even if it's only internal.
Should they? It's not necessarily their core competency.
Maybe they went down this path just because they had such a need to reduce their CapEx or get off of NVIDIA or something.
Is it sustainable? Will they always need to make their own chips? I don't know.
The first thought that comes to mind: they went down this path because general purpose processors didn't meet their [hyperscaler] needs - total cost of ownership was too high, latency was too slow, whatever. But maybe as these other startups come into existence, or as NVIDIA keeps innovating, those future innovations will actually start to meet their needs in such a way that they don't need to be doing semi-custom chip design.
BB: Yeah. Jay, have you formed an opinion on this yet? Because I know we've talked about how big of a struggle this is.
JG: Yeah, I mean, we've talked about this a lot before, and lately I've been exploring this topic a lot: what's the boundary of who can do their own silicon and who shouldn't? For a while, it seemed like everybody was going to do their own silicon, and now I think we're starting to see companies reach the limits of that.
There are a couple big examples in China lately. Oppo, the handset maker, had a big, very high-profile chip team, Zeku, which they abandoned last year - they shut it down because they weren't getting the performance they needed. And that actually caused a lot of other people in China to rethink their efforts, because it's like, "oh, Oppo with this really hot shot, super highly paid team can't do it. How can we?"
Now, since then, it looks like they've actually reconstituted part of the team. They've restarted the effort, but it's an ongoing debate.Ā
For me, it always comes down to: if you're gonna design your own chip, it has to be for a really good strategic reason. And it can't just be, "I'm gonna save a few bucks on Nvidia margin". It has to be: this chip makes me different. And we've seen that with Apple and with Google very clearly. Others, it's much more mixed.
BB: I'll just use this as an example. And I hate to pick on Amazon, but I'm going to. When you listen to the Amazon chip team talk about this, the single value proposition that they press on is energy efficiency and sustainability. They're like, this is our goal: you run these, it's gonna cost less energy. So I get it, right? Everybody's trying to keep their energy costs down. So it makes sense.
But then you counter that with: okay, if that's your biggest differentiator, the thing you're hanging on, what's the risk that these other companies' architectures do get more efficient over time? And then what happens when that's not your angle anymore? What if they've caught you or surpassed you? Because that's really the one thing they're hanging on. Not that they train faster or inference any better - I mean, it may cost less - but those aren't the things they're hanging on. They're hanging on the sustainability point.
And if you look at the arc of time, I should assume these things do get more efficient over time. It might take some time, but that's why I struggle with that. You hear them talk about why they did it, and you're kind of like, "I'm pretty sure Nvidia is going to do that". Or I'm pretty sure AMD is too. So I struggle with that.
You know, that bit of it - I just feel like they've done this without really exploring roadmaps, I guess.
JG: I think that's actually what it comes down to: 15 years ago, nobody needed to do their own silicon because there were twice as many semiconductor vendors, and there was always one company out there vulnerable enough that a big customer could come along and say, hey, I want this on your roadmap. The industry has consolidated down. We only have a few hundred chip companies in the US. It's much harder to push around a Broadcom or Qualcomm than it was 20 years ago. And so companies are like, all right, we can't run the roadmap. We'll just do it ourselves.
And I think that's the dynamic that's taking place. But again, we're going to find the boundary. And for some companies, it's going to make a ton of sense. And it's going to work really, really well. And for others, merchant silicon is going to have to suffice.
BB: Yes, agree.
Amazon's Strategy
AL: On the AWS topic, I'm secretly hoping that they crunched the numbers and thought, "oh, if we make our own Trainium and Inferentia chips and dogfood them internally with Amazon e-commerce, maybe those workloads alone would help us save enough money to start to pay for this investment. And then we can turn around and sell it to our own customers because we are a hardware landlord."
They're renting this out, and so hopefully in the future - maybe they're betting on other customers who are also saying, "man, there's all this GPU bloat and we're paying, you know, whatever $20,000 a chip and wasting all that - we want something cheaper".
So maybe AWS is betting that they could pay it down internally and then sell it to others. I don't know if that's true.
JG: Yeah, I agree. I'll take it a step further. I say that that's what they did with Graviton, the CPU. They dogfooded it themselves. They ran all their internal workloads on it. It turned out pretty well for them on that front. I think for AI stuff, though, it's much harder for them. They still don't know exactly what flavor of dogfood they need to eat. And it's much harder to pull that along.
And then on top of that there's performance differences, but without even getting into that, just the bare use case is not as clear.
BB: Yep. But I think Austin, you hit the exact nail on the head as to what Amazon's playbook has been. They're always their own first best customer of something new. And then they either figure out that they've made money or can make money.
But again, to this point - this feels very different and outside of core competency and skill sets, especially to get into things like Trainium and Inferentia.
They've got a good team there. They're using Arm and they're using a lot of Arm IP. I think Arm is very smart about how they're doing CSS and Neoverse and evolving that for time to market and efficiency. So they can leverage great IP. I get that.
I just also, I don't know, again, just to throw this out - this talk that Jensen did this week at Stanford, he had this impassioned answer to the competition question. He was like, "look, we don't just compete with other merchant silicon - I compete with my customers. And because of that, I'll show them my 10 year roadmap, so they know where I'm going. It's not going to be a secret, because I want to make sure they know either maybe I don't fit your need, or I solve all this going forward." He's really passionate about just how much they're trying to accelerate and bring down costs. But I think he understands that point too, which is we get efficiencies over time. And so I just wonder - none of us know the answer - how many of these hyperscalers or big companies that go vertical back off those [custom chips] because they found that the merchant solutions actually do everything that was in their initial goals? And then those projects kind of go by the wayside. I don't know, but that's why I throw this topic out, because it's interesting.
AL: Maybe in a year from now, two years from now, we'll look back and say you were right Ben.
BB: Yeah, it'll probably be longer than that. I don't think anybody's giving up in that timeframe, but we'll see. But anyway, this was good. This was a good, rich discussion. We really appreciate your time in joining our conversation on semis Austin. It's been good and we appreciate it.
AL: Yeah, definitely. Thanks for having me, guys.
BB: Yeah. And thanks for listening everybody. Like, subscribe, follow Austin on Twitter.
I appreciate his handle because I wanted to do this and I wasn't bold enough, but he is the Austin Lyons on Twitter. And Chipstrat on Substack. So find him there. And we will talk to you all next week. Thanks again.
JG: Thank you, everybody.