Ask Yann LeCun—Meta's chief AI scientist, Turing Award winner, NYU professor and one of the pioneers of artificial intelligence—about the future of large language models (LLMs) like OpenAI's ChatGPT, Google's Gemini, Meta's Llama and Anthropic's Claude, and his answer might startle you: He believes LLMs will be largely obsolete within five years.
"The path that my colleagues and I are on at [Facebook AI Research] and NYU, if we can make this work within three to five years, we'll have a much better paradigm for systems that can reason and plan," LeCun explains in the latest installment in Newsweek's AI Impact interview series with Marcus Weldon, describing his team's recent work on their Joint Embedding Predictive Architecture (JEPA). He hopes this approach will make current LLM-based approaches to AI outdated, as these new systems will include genuine representations of the world and, he says, be "controllable in the sense that you can give them goals, and by construction, the only thing they can do is accomplish those goals."
His belief is so strong that, at a conference last year, he advised young developers, "Don't work on LLMs. [These models are] in the hands of large companies, there's nothing you can bring to the table. You should work on next-gen AI systems that lift the limitations of LLMs."
The paradox is striking: One of the principal architects behind today's AI boom is also one of its most notable skeptics. While companies race to deploy ever more sophisticated conversational agents and investors pour billions into large language model startups and the data centers to power them, LeCun remains unimpressed by what many consider the cutting edge of artificial intelligence, despite his team producing one of the leading foundation models used today: Llama.
For LeCun, today's AI models—even those bearing his intellectual imprint—are relatively specialized tools operating in a simple, discrete space—language—while lacking any meaningful understanding of the physical world that humans and animals navigate with ease. LeCun's cautionary note aligns with Rodney Brooks' warning about what he calls "magical thinking" about AI, in which, as Brooks explained in an earlier conversation with Newsweek, we tend to anthropomorphize AI systems when they perform well in limited domains, incorrectly assuming broader competence.
There's ample reason to heed LeCun's clarion call: He has spent decades pioneering the neural network technologies that underpin today's AI boom and is one of the "three musketeers of deep learning," alongside Geoffrey Hinton and Yoshua Bengio; the three were jointly honored with the 2018 Turing Award for their contributions to the field.
Born in France in 1960, LeCun has been fascinated with artificial intelligence from an early age. He was just 9 years old in Paris when he first saw Stanley Kubrick's 2001: A Space Odyssey, an experience that would shape his career trajectory. "It had all the themes that I was fascinated by when I was a kid," LeCun recalls. "Space travel, AI, the emergence of human intelligence."
What struck young LeCun most profoundly was the idea that intelligence could be self-organizing—that complex behaviors might emerge from simple elements interacting with each other. This concept would become a guiding principle throughout his career, even as he encountered resistance from the academic establishment.
When LeCun was beginning his work in the 1980s, neural networks had fallen deeply out of favor in computer science. A 1969 book by Marvin Minsky and Seymour Papert had effectively killed research interest by highlighting the limitations of simple "perceptrons," some of the earliest neural networks first introduced in the 1950s, and the field of AI had shifted decisively toward symbolic and rule-based systems.
"You could not mention the word neural nets at the time. It was only 15 years after the death of the perceptron, and it was still seen as taboo in engineering, not just computer science," LeCun explains. "But the field was revived by people who didn't care about this history, or didn't know about it, and made the connection between some methods in statistical physics and theoretical neuroscience and neural nets, and it's Nobel Prize–winning work now."
During his Ph.D. work at Université Pierre et Marie Curie in the mid-1980s, LeCun made his first significant contribution to the field of deep learning by developing an early form of the now famous backpropagation algorithm. So-called "backprop" is the mathematical technique that allows a neural network to learn from the errors detected in its outputs: those errors are "back-propagated" through the network to adjust its internal weights and improve accuracy. The method went on to become fundamental to the training of virtually all modern neural networks, forming the learning backbone of everything from speech and image recognition systems to chatbots and autonomous driving systems.
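To see the mechanics in miniature, here is a minimal sketch of backpropagation for a tiny one-hidden-layer network, written in plain NumPy. It is purely illustrative: a hand-derived toy, not LeCun's original formulation or anything resembling production training code.

```python
import numpy as np

# Toy problem: learn y = 2x from a handful of noise-free examples.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(32, 1))
y = 2.0 * X

# A tiny network: one hidden layer of 8 tanh units, randomly initialized.
W1 = rng.normal(scale=0.5, size=(1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
lr = 0.1

for step in range(500):
    # Forward pass: run the inputs through the network.
    h = np.tanh(X @ W1)              # hidden activations
    y_hat = h @ W2                   # network outputs
    err = y_hat - y                  # error detected at the output

    # Backward pass: propagate the error back through the network
    # (chain rule) to obtain a gradient for every weight.
    grad_W2 = h.T @ err / len(X)
    grad_h = (err @ W2.T) * (1 - h ** 2)   # through the tanh nonlinearity
    grad_W1 = X.T @ grad_h / len(X)

    # Adjust the internal weights a little in the direction that reduces the error.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print("final mean squared error:", float(np.mean(err ** 2)))
```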
After completing his doctorate in 1987, LeCun headed to the University of Toronto for a postdoctoral fellowship under Geoffrey Hinton. A year later, he joined Bell Labs, where he would make perhaps his most transformative contribution: the development of convolutional neural networks (CNNs). Inspired by the structure of the visual cortex in mammals, CNNs use specialized layers that scan across images to detect features—like edges, textures and shapes—regardless of where they appear in the visual field. This architecture dramatically improved computer vision by enabling machines to recognize patterns despite variations in position, scale or orientation.
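The core operation is easy to picture. The short sketch below slides a hand-made 3x3 vertical-edge kernel across a toy image; in a real convolutional network the kernel values are learned from data and many such kernels are stacked in layers, but the idea of detecting a feature wherever it appears is the same.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and record its response at every position."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 8x8 "image" with a vertical edge: dark on the left, bright on the right.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# A hand-made vertical-edge detector; in a CNN these values would be learned.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # the strongest responses line up with the edge, wherever it sits
```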
His innovations at Bell Labs led to practical applications that quietly revolutionized everyday systems. The handwriting-recognition technology LeCun developed was deployed by the U.S. Postal Service and banks, reading more than 10 percent of all checks in the United States in the late 1990s and early 2000s. Today, convolutional networks still form the foundation of modern computer vision, enabling everything from facial recognition and medical imaging analysis to autonomous vehicle perception and augmented reality.
After stints at AT&T Labs and NEC Research Institute, LeCun joined New York University in 2003, where he still serves as a Silver Professor. In 2013, Mark Zuckerberg recruited him to become the first director of Facebook's AI Research (FAIR) division, a role that has since evolved into his current position as chief AI scientist at Meta.
Returning to the topic of the limitations of LLMs, LeCun explains, "An LLM produces one token after another. It goes through a fixed amount of computation to produce a token, and that's clearly System 1—it's reactive, right? There's no reasoning," a reference to Daniel Kahneman's influential framework that distinguishes between the human brain's fast, intuitive method of thinking (System 1) and the method of slower, more deliberative reasoning (System 2).
The limitations of this approach become clear when you consider what is known as Moravec's paradox—the observation by computer scientist and roboticist Hans Moravec in the late 1980s that it is comparatively easy to teach AI systems higher-order skills like playing chess or passing standardized tests, and surprisingly hard to teach them seemingly basic human capabilities like perception and movement. The reason, Moravec proposed, is that the sensorimotor skills a human body uses to navigate the world are the product of billions of years of evolution and are so highly developed that we perform them automatically, without conscious effort, while neocortical reasoning skills came much later and require far more deliberate cognitive effort to master. For machines, however, the reverse is true. Simply put, we design machines to assist us in areas where we lack ability, such as physical strength or calculation.
The strange paradox of LLMs is that they have mastered the higher-order skills of language without learning any of the foundational human abilities. "We have these language systems that can pass the bar exam, can solve equations, compute integrals, but where is our domestic robot?" LeCun asks. "Where is a robot that's as good as a cat in the physical world? We don't think the tasks that a cat can accomplish are smart, but in fact, they are."
This gap exists because language, for all its complexity, operates in a relatively constrained domain compared to the messy, continuous real world. "Language, it turns out, is relatively simple because it has strong statistical properties," LeCun says. It is a low-dimensional, discrete space that is "basically a serialized version of our thoughts."
And, most strikingly, LeCun points out that humans take in vastly more data than even our most data-hungry AI systems. "A big LLM of today is trained on roughly 10 to the 14th power bytes of training data. It would take any of us 400,000 years to read our way through it." That sounds like a lot, until you compare it with the flood of visual data a human absorbs.
Consider a 4-year-old who has been awake for 16,000 hours, LeCun suggests. "The bandwidth of the optic nerve is about one megabyte per second, give or take. Multiply that by 16,000 hours, and that's about 10 to the 14th power in four years instead of 400,000." This gives rise to a critical inference: "That clearly tells you we're never going to get to human-level intelligence by just training on text. It's never going to happen," LeCun concludes.
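The arithmetic holds up at the order of magnitude LeCun intends, as a quick back-of-the-envelope check of the figures he cites shows:

```python
# Back-of-the-envelope check of LeCun's figures (orders of magnitude only).
SECONDS_PER_HOUR = 3600
optic_nerve_bytes_per_second = 1e6      # ~1 MB/s, "give or take"
hours_awake_by_age_four = 16_000

visual_bytes = optic_nerve_bytes_per_second * hours_awake_by_age_four * SECONDS_PER_HOUR
print(f"visual input by age four: ~{visual_bytes:.1e} bytes")   # ~5.8e13, on the order of 10^14

llm_training_bytes = 1e14               # LeCun's figure for a big LLM's text corpus
print(f"fraction of an LLM's text corpus: {visual_bytes / llm_training_bytes:.2f}")
```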
When asked to define intelligence, LeCun is characteristically precise: "You could think of intelligence as two or three things. One is a collection of skills, but more importantly, an ability to acquire new skills quickly, possibly without any learning." He illustrates this with an everyday example: "You ask your 10-year-old, 'Can you clear the dinner table?' Even a 10-year-old who has never done it, or maybe only observed it being done a couple of times, has enough background knowledge about the world to be able to do that task the first time without training."
This ability to apply existing knowledge to novel situations represents a profound gap between today's AI systems and human cognition. "A 17-year-old can learn to drive a car in about 20 hours of practice, even less, largely without causing any accidents," LeCun muses. "And we have millions of hours of training data of people driving cars, but we still don't have self-driving cars. So that means we're missing something really, really big."
Like Brooks, who emphasizes the importance of embodiment and interaction with the physical world, LeCun sees intelligence as deeply connected to our ability to model and predict physical reality—something current language models simply cannot do. This perspective resonates with David Eagleman's description of how the brain constantly runs simulations based on its "world model," comparing predictions against sensory input.
For LeCun, the difference lies in our mental models—internal representations of how the world works that allow us to predict consequences and plan actions accordingly. Humans develop these models through observation and interaction with the physical world from infancy. A baby learns that unsupported objects fall (gravity) after about nine months and gradually comes to understand that objects continue to exist even when out of sight (object permanence). He observes that these models are arranged hierarchically, ranging from very low-level predictions about immediate physical interactions to high-level conceptual understandings that enable long-term planning.
LeCun offers an elegant example: "Let's say we're in New York today and decide to be in Paris tomorrow morning. We cannot plan our entire trip in terms of muscle control—it would be a completely intractable task. But at a very high level of abstraction, we can say, 'I need to go to the airport and catch a plane.' So, now I have a goal. How do I go to the airport? I'm in New York, so I go on the street and hail a taxi. Okay, how do I get on the street? Well, I have to stand up from my chair, take the elevator down, and ..."
This hierarchical planning relies on mental models that LLMs don't possess. While they can produce text that sounds reasonable, they lack grounding in physical reality and cannot reason about novel situations in the way that even a very young child can.
So, rather than continuing down the path of scaling up language models, LeCun is pioneering an alternative: the Joint Embedding Predictive Architecture (JEPA), which aims to create representations of the physical world from visual input. "The idea that you can train a system to understand how the world works by training it to predict what's going to happen in a video is a very old one," LeCun notes. "I've been working on this in some form for at least 20 years."
The fundamental insight behind JEPA is that prediction shouldn't happen in the space of raw sensory inputs but rather in an abstract representational space. When humans predict what will happen next, we don't mentally generate pixel-perfect images of the future—we think in terms of objects, their properties and how they might interact.
"If you do the naive thing, which I've done and many of my colleagues have tried to do, of training a big neural net to predict the next few frames in a video, it doesn't work very well. You get blurry predictions, because the system cannot exactly predict what's going to happen" pixel by pixel, LeCun explains.
But recent breakthroughs have made a different video-based approach viable. In one experiment called DINO World Model, researchers at Meta took a pre-trained encoder that had learned to extract features from images through self-supervised learning, then trained a predictor to anticipate how those features would change when certain actions were taken.
"You can then give it a task, which is to arrive at some target state, and by optimization, plan a sequence of actions so that your model predicts you're going to get to that goal," LeCun says. This enables the system to plan novel action sequences to achieve specified goals—a rudimentary form of reasoning and planning.
For another recent model called V-JEPA (Video-JEPA), LeCun's team trained a system to complete partially occluded videos. When shown videos where something physically impossible occurs—like an object changing shape spontaneously or disappearing when it should be visible—the system's prediction error spikes dramatically, indicating it has implicitly learned basic physical principles.
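The "surprise" signal itself is simple to picture. In the toy sketch below, a sequence of made-up feature vectors drifts smoothly until one frame jumps in a way smooth physics would not allow; even a trivial stand-in predictor that expects the next frame to resemble the current one shows a sharp spike in prediction error at exactly that point. V-JEPA's learned predictor is far more sophisticated, but the logic of flagging implausible events by their prediction error is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up latent features for a 60-frame video: each frame drifts slightly
# from the previous one, as smoothly evolving physical scenes tend to do.
frames = [rng.normal(size=8)]
for _ in range(59):
    frames.append(frames[-1] + rng.normal(scale=0.05, size=8))
frames = np.stack(frames)
frames[40:] += 3.0        # the "impossible" event: an abrupt jump at frame 40

# Trivial stand-in predictor: expect the next frame's features to match the current ones.
predicted = frames[:-1]
actual = frames[1:]
errors = np.linalg.norm(actual - predicted, axis=1)

# Flag frames where the prediction error spikes far above its typical level.
threshold = errors.mean() + 4 * errors.std()
print("surprising frames:", np.where(errors > threshold)[0] + 1)   # -> [40]
```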
This approach differs fundamentally from how language models operate. Instead of probabilistically predicting the next token in a sequence, these systems learn to represent the world at multiple levels of abstraction and to predict how their representations will evolve under different conditions.
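Put schematically, that encode-predict-plan recipe looks something like the toy sketch below. To be clear, this is a stand-in rather than Meta's actual DINO World Model or JEPA code: the "encoder" and "predictor" here are fixed functions rather than trained networks, and the planner simply samples candidate action sequences and keeps whichever one is predicted to land closest to the goal state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the learned pieces: an encoder that maps observations to latent
# features, and a predictor that anticipates how the latent state changes when
# an action is taken. Here both are fixed toy functions, not trained networks.
A = 0.1 * rng.normal(size=(4, 4)) + np.eye(4)    # toy latent dynamics
B = 0.5 * rng.normal(size=(2, 4))                # toy effect of a 2-D action

def encode(obs):
    return np.tanh(obs)                          # placeholder "encoder"

def predict(z, action):
    return z @ A + action @ B                    # placeholder "predictor"

def plan(z_start, z_goal, horizon=5, samples=2000):
    """Search for the action sequence whose predicted final state is closest to the goal."""
    best_cost, best_actions = np.inf, None
    for _ in range(samples):
        actions = rng.uniform(-1, 1, size=(horizon, 2))
        z = z_start
        for a in actions:
            z = predict(z, a)                    # roll the model forward
        cost = np.linalg.norm(z - z_goal)        # how far from the target state?
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost

z0 = encode(rng.normal(size=4))
zg = encode(rng.normal(size=4))
actions, cost = plan(z0, zg)
print("first planned action:", np.round(actions[0], 2), "| predicted distance to goal:", round(cost, 3))
```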
LeCun believes language models might still exist in the future, but they would serve a narrower purpose: "There's a small role for LLMs, which is basically turning abstract thoughts into language." He draws a neurological parallel: "In the human brain, this is done by Broca's area, which is right here," he says, pointing to a small region near his left temple. "It only popped up in the last couple hundred thousand years. If you lose the [function of] Broca's area, you can think, you just can't express your thoughts."
Despite his criticisms of today's AI systems—"We're not nearly close to reaching human-level intelligence. It's not going to happen tomorrow."—LeCun isn't a technological pessimist. On the contrary, he believes, "AI will have a similar transformative effect on society as the printing press had in the 15th century." But in his vision, the impact will be through amplifying human intelligence, not replacing it. "The nature of human work is going to change conceptually and qualitatively," he predicts. "I don't think it's going to be very different from what occurred with previous technological revolutions, where physical strength was replaced by machine strength, or some intellectual or office tasks were replaced by computers."
Where LeCun differs from many AI futurists—including his former mentor and fellow Turing Award winner Geoffrey Hinton—is in his assessment of existential risks. When Hinton retired from Google in 2023, he warned, "There's a serious danger that we'll get things smarter than us fairly soon and that these things might get bad motives and take control," adding, "This isn't just a science-fiction problem. This is a serious problem that's probably going to arrive fairly soon." Last December, Hinton estimated there was a 10 to 20 percent chance of AI causing human extinction within the next three decades.
LeCun forcefully pushes back against such concerns. "That's completely false," he insists, "because, first of all, I think people give too much credit and power to pure intelligence." He pithily adds, "Looking at the political scene today, it's not clear that intelligence is actually such a major factor. It's not the smartest among us that tend to be the leaders or the chiefs."
LeCun's optimism stems partly from a pragmatic assessment of what AI systems can actually control in the physical world. While movie scenarios often portray AI run amok, commanding vast resources and taking control of critical infrastructure, LeCun points out that such capabilities would require not just intelligence but physical control and access that AI systems wouldn't have. He also believes AI systems are easily constrained. "The nice thing about an AI system is that you can design it in such a way that it cannot escape its guardrails. Humans can break laws because we have free will."
He also takes issue with the assumption that intelligence and domination are linked, noting that many of history's most brilliant minds—like Albert Einstein or Richard Feynman—were neither rich nor powerful. In his view, attributing too much power to intelligence alone overlooks other, potentially more dangerous human vulnerabilities: "We like to think that intelligence is everything as humans, but a virus can bring us down and they're not particularly smart."
He imagines a future where AI systems form a kind of self-regulating ecosystem: "It's going to be an interactive society of machines," he predicts. If one system misbehaves, he says, "you're going to have other AI systems that are smarter that can take it down. It's going to be like my smart AI police against your rogue AI."
Augmented Intelligence: Reflections on the Conversation with Yann LeCun
By Marcus Weldon, Newsweek Contributing Editor for AI and President Emeritus of Bell Labs
I am always impressed by the polymathic insights produced by Yann LeCun. It is rare to find someone who has a meaningful level of knowledge and understanding about such a diversity of topics and is singularly unafraid to speak his mind. It is particularly refreshing for one of the most skilled and innovative AI practitioners to neither eulogize nor denigrate current technologies, but rather to put them in the appropriate context. Five key themes stand out for me, which I explore in more depth here:
- Generative AI models are fundamentally limited as they cannot represent the continuous high-dimensional spaces that characterize nearly all aspects of our world
- The future of AI cannot therefore be about scaling these inherently flawed models, but must be about building models that contain abstract representations of our world that can be probed, can predict and can plan
- Human intelligence and, by extension, human-like machine intelligence is hierarchical, comprising many levels, types and timescales, and we are currently far from being able to represent this rich tapestry of functionality and capability
- Intelligence is not everything—it is certainly a key something, but it is less powerful than motivated physical, psychological or biological forces. Therefore, AI is not an existential threat in its own right.
- The future will comprise a "Society of Machines" with both System 1 and System 2 capabilities that amplify human abilities. These machines will sit below us in a new human-machine societal hierarchy, constrained to do our bidding by guardrails built into the systems.
These lessons complement and amplify those from my prior conversations with Rodney Brooks and David Eagleman, leading to a clear and consistent emergent picture of our AI-enriched future.
In this future, humans will shift into more managerial roles, using AI systems as tools rather than being replaced by them. "Everybody will become a CEO of some kind, or at least a manager," LeCun suggests. "We are going to see humanity step up in the hierarchy. We're going to have a level below us, which is going to be those AI systems." But critically, he clarifies, "they may be smarter than us, but they will do our bidding."
This vision of augmentation rather than replacement aligns with both Brooks' and Eagleman's perspectives. As Eagleman told Newsweek, "Right now, it's all about co-piloting, and we're moving to a future where there's going to be more and more autonomous systems that are just taking care of stuff."
For this future to materialize safely and equitably, LeCun strongly advocates for open-source development of AI technology. "Open source is necessary," he argues, because no country will "have AI sovereignty without open-source models, because they can build on top of it and establish their own sovereignty."
LeCun returns to the fundamental distinction between today's AI and the systems he believes will eventually replace them. Current language systems are trained, he says, "just to predict the next word in the text." If you want these systems to be more proficient at sophisticated knowledge tasks, "then there is an increasingly expensive fine-tuning phase. So, you train them to answer particular types of questions, but you don't train them to invent new solutions to new problems they've never faced before."
He contrasts two approaches to programming: The System 1 approach is to have AI generate statistically plausible code and then test it repeatedly, making changes until it works. Of this method, LeCun says, "It's expensive because it's test-time computation. It's exponential—n times more expensive because the tree of possibilities gets wide." The human System 2 approach, by contrast, is more linear: It starts from a clear goal and constructs code to achieve that goal, which, in the hands of an experienced coder, is more likely to be mostly correct, with just a few bugs to fix.
Eliminating this exponential efficiency gap between current AI systems and the optimal solution is why LeCun believes that approaches focusing on world models and planning will ultimately supersede today's large language models, despite their impressive capabilities in narrow domains. "I've said multiple times that I'd be happy if, by the time I retire, we have systems that are as smart as a cat," LeCun says with a smile. "And retirement is coming fast, by the way, so I don't have much time!"