World Labs' Fei-Fei Li on Creating Large World Models

Bloomberg Live

Emily Chang & Fei-Fei Li

World Labs’ Spatial Intelligence Bet, Large World Models, Humanoid Robotics, AI Safety, and the Future of Education

The Bet on Spatial Intelligence

Emily Chang: Everyone is focused on LLMs — ChatGPT, Claude, large language models. But you have raised $1 billion to build something different: large world models. Make the case for us. What is the bet you are making that others aren’t?

Fei-Fei Li: So this is the startup I co-founded, World Labs, and we are all in on spatial intelligence. And the means to spatial intelligence is building a large world model. So what is the case for us? The case for us is a 500 million year story: animal intelligence starts with seeing and moving in the physical world. Evolution began with us as animals, knowing what the world is, knowing who we are, knowing how to move around it and interact with it. And much of life (human life, human work life, human private life) has a lot to do with perceiving, understanding, reasoning about, and interacting with the world, including the imaginary worlds of creativity and productivity, such as virtual worlds.

So unlocking that capability in machines, unlocking the capability of generating any 3D, 4D worlds, unlocking the capability of reasoning within any world, unlocking the capability of teaching agents or robots, or assisting humans to interact with the world, is what spatial intelligence is about. And that’s what we are focusing on.

Emily Chang: So what can world models ultimately do that LLMs will never be able to? Can words put out fires? Can words cook an omelet?

Fei-Fei Li: I think there’s so much. So for example, creativity. Whether we’re designing interior spaces, machines, homes, or stories, so much of that is beyond words. We also use agents, whether in a virtual world for entertainment like gaming, or for more serious industrial applications such as digital twin design, inspection, or all kinds of optimization tasks. Or we build robots to help us do a lot of things, from putting out fires to helping in healthcare scenarios to manufacturing. All those are downstream applications of unlocking spatial intelligence and building world models.

The ChatGPT Moment for World Models

Emily Chang: So what do you think the ChatGPT moment for world models will be like? How will we know this has arrived?

Fei-Fei Li: That’s a great question, Emily, because chat is such a consumer behavior that the ChatGPT moment tends to be used to describe a viral, public consumer moment. I’m still trying to figure out if there is a corresponding consumer moment, because the kind of applications we are talking about tend to go first to professionals: professional creators, designers, developers, and the researchers and engineers who use them for robotics, industrial design, and so on. So maybe we will not necessarily have a consumer moment, but maybe we will. And you know, I would love to design my home in a much easier way and just change the color of the curtains with a click.

A Functional Taxonomy: Renderers, Planners, Simulators

Emily Chang: All right, that sounds pretty cool. So in the last six months, Yann LeCun left Meta to work on world models. Google shipped Project Genie. Nvidia has its own world models, Cosmos, and Nvidia is also one of your investors. What do you have that they don’t? And which competitors out there are you watching the most?

Fei-Fei Li: So first of all, we started World Labs in 2024. I still remember when we were out talking about world models and spatial intelligence. It was just a year ago, when people were still talking only about LLMs. And so we really had a head start in understanding that this is going to be the next frontier. I am very excited by that. So what do we have that they don’t? First of all, I think we have an incredible team. We have the conviction. They don’t have the godmother, that’s for sure. But the world is big, and I think this is just like LLMs: there will be many companies doing incredible work in world models.

Just 24 hours ago, we kind of got fed up that the term “world model” has been so confusing and used in so many different ways, so we actually put out a blog post explaining what a functional taxonomy of world models is, instead of mushing everything together. And the way I see it, right now there are three ways the term is used when it comes to spatial intelligence.

One is what I call a renderer, where the model puts beautiful pixels on the screen, mostly like video generation models, and the consumer is mostly human eyeballs. While the model commits to beautiful pixels on the screen, it doesn’t necessarily commit to physics, dynamics, and geometric correctness, because its output is just consumed by human eyeballs, not necessarily used for computation and other tasks.

Then another kind of world model is what we call a planner. That is more for machines, more for robots: whatever the input is, the state of the world or the action, it outputs a correct action to take for the next step. And you see that kind of world model a lot in robotics applications.

The third kind, which I think is the linchpin of the three, is a simulator. It actually is consumed by humans as well as machines. It’s trying to respect the structure, the physics and the dynamics of the world, and really simulate the 3D and 4D information of the world as well as the semantic information. And the simulator could become a renderer, the simulator could become a planner, but this layer is a huge critical path, in my opinion, to unlock spatial intelligence. And that’s what World Labs is working on.
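
Editor’s note: the three roles described above can be sketched as small Python interfaces. This is a toy illustration under stated assumptions, not World Labs’ design. Every name here (State, ToySimulator, RolloutRenderer, LookaheadPlanner) and the one-ball physics are hypothetical, chosen only to show how a simulator might sit underneath both a renderer and a planner.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class State:
    """Hypothetical toy world: one ball with a height (m) and vertical velocity (m/s)."""
    height: float
    velocity: float


class Simulator(Protocol):
    """Simulator role: advances the world while respecting its dynamics."""
    def step(self, state: State, action: float, dt: float) -> State: ...


class ToySimulator:
    """Assumed physics: constant gravity plus an upward thrust given as the action."""
    GRAVITY = -9.8

    def step(self, state: State, action: float, dt: float) -> State:
        velocity = state.velocity + (self.GRAVITY + action) * dt
        height = state.height + velocity * dt
        if height <= 0.0:  # the ball comes to rest on the ground
            return State(0.0, 0.0)
        return State(height, velocity)


class RolloutRenderer:
    """Renderer role: turns simulated states into text frames for human eyes."""
    def __init__(self, sim: Simulator, dt: float = 0.1) -> None:
        self.sim, self.dt = sim, dt

    def render(self, state: State, steps: int) -> List[str]:
        frames = []
        for _ in range(steps):
            # One leading space per whole metre of height, then the ball.
            frames.append(" " * int(state.height) + "o")
            state = self.sim.step(state, 0.0, self.dt)
        return frames


class LookaheadPlanner:
    """Planner role: picks the action whose simulated outcome lands nearest a target height."""
    def __init__(self, sim: Simulator, target: float,
                 candidates: Sequence[float] = (0.0, 9.8, 20.0),
                 dt: float = 0.1) -> None:
        self.sim, self.target = sim, target
        self.candidates, self.dt = candidates, dt

    def next_action(self, state: State) -> float:
        def miss(action: float) -> float:
            return abs(self.sim.step(state, action, self.dt).height - self.target)
        return min(self.candidates, key=miss)


sim = ToySimulator()
frames = RolloutRenderer(sim).render(State(2.0, 0.0), steps=3)
action = LookaheadPlanner(sim, target=1.0).next_action(State(1.0, 0.0))
```

In this sketch the renderer and the planner both delegate to the same step function, which is one way to read the remark that a simulator could become a renderer or a planner.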

Humanoids and the Hype Gap

Emily Chang: All of this rolls up into robotics. So I want to get your take on the field, and humanoids in particular. Funding for humanoids hit $6 billion, but they still can’t load my dishwasher as fast as I can. They still can’t go get my Amazon packages. Will world models, and will World Labs, close the gap between hype and reality?

Fei-Fei Li: That’s a loaded question, Emily. First of all, that is my job. Yes, I get it. First of all, robotics is going to be one of the most important revolutions in human industrialization. $6 billion is too small, right? If you look at self-driving cars investment, if you look at language models investment, it took way more than $6 billion. I’m not saying we now — I think it will take time to invest, and it will also hopefully not take the hype, but take the thoughtfulness to invest in the right effort. And for example, unlocking world modeling and spatial intelligence and the simulation layer — all this is part of that important effort. Are we going to close the gap? I do believe World Labs is working on one of the most critical technologies in the field of physical intelligence. And obviously that’s the hope.

AI Safety, Theater, and Real Work

Emily Chang: You’ve been more measured on AI safety — skeptical of the doom narrative but also of heavy-handed regulation. When you look across the industry, where do you see real safety work versus safety theater? Is anyone getting it right?

Fei-Fei Li: So in general I’ve just been more measured about all the rhetoric. Makes me very boring, to be honest. I think there’s just so much hype. Obviously we need to build the right technology. We need to put guardrails on the technology. Whether you use the word “responsible,” the word “safety,” or the word “trustworthy,” building the right technology and product so that it can empower, enhance, and augment humanity and not harm it is the goal of any work we do, whether it’s AI or not.

So where is it being done right? I really hope that for every company and every product being built, the people behind it are very mindful of that and are thinking about, you know, what data are we using? What system are we building? What evaluations are we conducting? What guardrails are we putting in? How do we communicate with our users and customers? How do we work with regulators so that when the rubber hits the road we are being responsible? I do believe a lot of this work is happening. It’s not happening in the theater, to be honest.

For example, in the pharmaceutical and healthcare industries, companies are incorporating AI. I literally just came from the hospital to your panel because I have a family member about to have surgery in the next hour or so, and I was just in her hospital looking at where AI is already being used and where AI could be used. And it’s already happening. Doctors are using AI to help them with charting. Radiologists are using AI to assist them in reading MRI and CT scans. I do hope that we have more AI to help our nurses, to help family members. I got this long radiology report last night, and the first thing I did was send it to an AI so that it could help explain it to me. So all this is happening. Safety measures are happening. But there needs to be more, in the right way, in a scientifically grounded way. And that’s the conversation that should be taking place instead of, as you say, the theater.

Emily Chang: Well, thank you for coming, and I hope your person is okay. We all do.

The AI Hate Wave and Talking to Students

Emily Chang: The backlash is real. It’s being called the AI hate wave. I’m sure you’ve seen the video — former Google CEO Eric Schmidt getting booed at a college graduation. You spend a lot of time with students. What are they saying? And if they’re scared, are the fears justified?

Fei-Fei Li: Yeah, I do spend a lot of time with students. To be fair, my students are pretty privileged because they’re Stanford students. I think it’s even more important — and I try to do it myself — that we spend time with our teachers, with our nurses, with our parents, grandparents. And that’s actually something that I try to do. I try to talk to K-12 educators. I try to go to places and talk to people where they feel that they’re not part of the conversation.

And even Stanford students reflect some of this mixed sentiment. There is, in society, a sense of hope. There is also excitement. There is also confusion. There is also, simultaneously, a sense of dignity and agency when AI can help me do things that I couldn’t do before. And a sense of loss of dignity and agency — if AI is going to take my job. So I think the sentiment is mixed.

And I really want to point out, a lot of this sentiment happens when there is a vacuum of thoughtful public discourse. Right now, the oxygen, the air is all sucked into the polarized extremes of doomism or total utopianism. Hype takes all the oxygen in the room. That void brews this kind of anxiety. And it’s actually that void we really need to care about, because that’s where real people live. That’s where real people are seeking answers.

As a scientist and an educator and an entrepreneur, I’m at ground zero with students, with educators, with entrepreneurs. And I really do believe it’s one of my responsibilities not to hype, to try to speak with both science and humility, and to inspire people to recognize that this is a technology that can truly empower a lot of our work and life, and can truly help us have a better healthcare system, better scientific discovery, a better environment, and better education, if we do the right thing.

AI and the Future of Education

Emily Chang: We’re both moms. We both have young teenagers. How do you think AI will change learning and the college experience?

Fei-Fei Li: AI must change learning. AI must change K-to-16 learning. I think this is one of the biggest opportunities for humanity in the decade to come. What is the most precious resource of our entire world? It’s human capital. And now we have a technology that can answer standardized tests, from Common Core kinds of tests all the way to International Mathematical Olympiad exams, and AI can do better than the average human. It’s not that humans are bad. It’s that we need to change the education system. We need to change how we evaluate. We need to change the way we empower teachers to teach and to educate the next generation of students, so that they can use these tools to be empowered and do things that we could never imagine.

Emily Chang: So do you think our kids will still learn?

Fei-Fei Li: Absolutely, if we teach them right. If the society prepares them right, they should not be — all of the kids today should not be scared of AI. They should feel the human agency to lead AI, to use AI in the right way, and to use AI to make the impact that they want to make for the world.

AGI, Apples, and What’s Shipping

Emily Chang: Anthropic CEO Dario Amodei has suggested AGI is 2 to 3 years out — we’ll get there by scaling the current paradigm. Demis Hassabis says we’re at the foothills of the singularity. You’ve said you don’t even engage with the term AGI. Are they wrong, or is the disagreement about what we’re calling the goal?

Fei-Fei Li: I don’t engage with the term AGI because the founding fathers of artificial intelligence as a scientific field had this dream of thinking and doing machines. That is a scientific quest. And that quest has been my lifelong career, and I am still on that quest. Now, I’m combining that scientific quest with making products that can make people’s lives better. And that is the field called artificial intelligence. I’m okay with people calling it whatever they want. They can call it an apple, that’s fine. I’m focusing on building a technology that can truly make a difference in people’s lives and work.

Emily Chang: What’s the one thing you’ll have shipped this year that we’ll be talking about next year?

Fei-Fei Li: I hope that we will be shipping a model of spatial intelligence that will inspire incredibly exciting product opportunities that people haven’t seen before.