Sutton's Big World Hypothesis

LLMs keep getting better at coding and at solving math problems. This is honestly quite frustrating, since we used to believe those skills were hallmarks of human intelligence. IMO gold medalists are likely general problem solvers with top-tier research ability, people capable of working on almost any intellectually demanding task. Watching AI surpass them stings in a particular way.

But similar things have happened before. When AI reached superhuman ability in chess and Go, people were rattled in the same way. Those games represented a certain kind of human intelligence. People believed professional chess and Go players were exceptionally smart — and they were right, in some sense. What AI revealed is that those domains, as rich as they are, were still narrow enough to be solved.

AI is changing our definition of smartness. Intelligence, it turns out, is not about being an IMO gold medalist. It’s more about navigating the complex world around us. Demis Hassabis was not a professional chess player. He’s not an IMO gold medalist either. Is he less intelligent than those who are? No. He chose to spend his time on more important problems rather than overfitting to narrow ones.

Olympiads are, in a real sense, about overfitting. I grew up in China and have many friends with competition backgrounds. To reach the national level, many of them skipped most of regular school to study for olympiads. Schools have dedicated coaches and structured training pipelines. There are external summer camps that parents send their kids to. A Chinese kid who studies olympiad math eight hours a day will outperform a British kid who thinks about math problems occasionally. Does that mean Chinese IMO contestants are smarter? By the results, maybe. In reality, no. There’s always a tradeoff: the more math you gain, the more of everything else you give up.

A few weeks ago, Richard Sutton came to MIT to give a talk. He introduced a concept he called the Big World Hypothesis: the idea that the world is always larger and more complex than any dataset we can collect. He was talking about artificial intelligence, but the same principle applies to humans. We experience so little in our lifetimes compared to the data that Claude or ChatGPT is trained on, yet we are remarkably capable. We are efficient learners in a way that current AI is not.

The point I want to make is that we all have limitations shaped by our experience. Many of my IMO friends believe AI will take over now that it has reached gold-medal performance. They built up a belief that IMO performance equals intelligence and that everything else is trivial. I’d argue that’s not quite right.

That said, I’m not trying to be an AI skeptic. I strongly believe in AI for mathematics, full automation of coding, and eventually AI-driven R&D. The world will look very different in a few years. But there are real walls ahead that people underestimate.

The most concrete one is data. We’ve largely exhausted the internet as a training source, and synthetic data has fundamental limits — you can’t bootstrap genuine novelty from existing distributions. In science especially, ground truth comes from the physical world, and that’s slow and expensive regardless of how smart your model is. An AI that reasons faster doesn’t help if the lab takes six months to run a trial. Biology, materials science, physics — these fields are bottlenecked by experiment, not by data processing speed.
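To put rough numbers on the experiment bottleneck, here is a minimal sketch in the spirit of Amdahl's law. The function name and the example figures (30 days of thinking per cycle, a 180-day trial, a 100x reasoning speedup) are assumptions chosen for illustration, not measurements.

```python
def cycle_speedup(think_days: float, experiment_days: float,
                  reasoning_speedup: float) -> float:
    """Overall speedup of one think-then-experiment cycle.

    Only the thinking portion gets faster; the experiment takes
    the same wall-clock time no matter how fast the model reasons.
    """
    before = think_days + experiment_days
    after = think_days / reasoning_speedup + experiment_days
    return before / after


# Hypothetical cycle: 30 days of analysis, then a six-month trial.
print(cycle_speedup(think_days=30, experiment_days=180, reasoning_speedup=100))
```

Under these assumed numbers the whole cycle gets less than 1.2x faster, and even infinitely fast reasoning cannot push it past (30 + 180) / 180.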

Architecture search runs into a related problem. You can ask an agent to try arbitrary new architectures, but training each one requires enormous compute, and without good theory to prune the search space, you’re mostly burning resources. More importantly, the most consequential architectural ideas — attention, residual connections, SSMs — didn’t come from search. They came from someone having an insight about the nature of the problem. That kind of theoretical intuition is hard to automate, because it requires understanding why something would work before you run it.

My best guess, then, is that AI reaches the level of a professional human researcher, but not a superhuman one. And that distinction matters, because science has always advanced through remarkable ideas rather than incremental accumulation. Attention, CRISPR, relativity: these didn’t come from searching a space more thoroughly. They came from someone reconceptualizing the problem entirely. If those moments are sparse and out-of-distribution by nature, then “superhuman researcher” might be a category error. There’s no superhuman version of asking the right question, because we don’t even know what the right question is until someone asks it.

This connects to a point about mathematics more broadly. Pure math used to be deeply connected to the real world. It gave us linear algebra, probability, and calculus, tools that changed everything. But as math problems become more abstract, the field drifts further from reality. AI’s recent progress, such as solving long-open problems like those in Erdős’s list, is just another signal that mathematics will not remain the last frontier, just as chess and Go did not before it. Mathematicians have fun solving open questions, but the work increasingly resembles a very sophisticated game. The societal return diminishes.

What remains valuable, and hard for AI to replicate, is taste — the ability to ask a question nobody thought to ask. Genuinely novel ideas are out-of-distribution by definition, and transformers learn distributions. When training data contains very few examples of a truly new idea, the model has little to go on. Incremental work will be automated away quickly. But directed intuition about which problems matter is still, for now, a human edge.

I’ll admit, though, that the honest answer is that nobody knows for certain. If AI reaches professional-human research level and you run a thousand instances in parallel, maybe the sheer coverage is enough to surface remarkable ideas more often, just by exploring more of the space. Whether those ideas are dense enough to be found by search, or whether they require a kind of intuition that can’t be parallelized, is the question I keep coming back to, and I don’t think anyone has a satisfying answer yet.
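One way to make the coverage question concrete is a back-of-the-envelope calculation. The sketch below assumes each instance independently has some small probability of landing on a remarkable idea. That independence is a strong assumption, since copies of the same model probably explore overlapping regions, and the probabilities are hypothetical.

```python
def p_at_least_one(p_single: float, n_instances: int) -> float:
    """Chance that at least one of n independent instances succeeds."""
    return 1.0 - (1.0 - p_single) ** n_instances


# Hypothetical per-instance success probabilities, 1000 parallel instances.
for p in (1e-2, 1e-4, 1e-6):
    print(f"p={p:g}: {p_at_least_one(p, 1000):.4f}")
```

If remarkable ideas are dense (a 1-in-100 chance per instance), a thousand instances almost surely find one. If they are sparse (1 in a million), the same thousand instances have only about a 0.1% chance. Everything hinges on a density that nobody currently knows how to estimate.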
