
AI: Core AI innovation reinforced with recognition. RTZ #649
All human knowledge is based on the knowledge of others before them, an idea that Isaac Newton pithily described as ‘Standing on the Shoulders of Giants’ (aka OTSOG). It’s an idea I’ve extended to this AI Tech Wave, where increasingly all knowledge is being rapidly built upon with new knowledge, using massive AI Compute driven infrastructure, via a core technique known as ‘Reinforcement Learning’ (RL). All this despite the ongoing ‘fair use’ copyright tussles.
It’s a core building block to better AI reasoning, agents, and on to AGI, or artificial general intelligence. However and whenever it’s achieved.
RL, whether based on human feedback (RLHF) or RL without human supervision, has been the crux of this AI Tech Wave for years now (see chart and legend below).
And as I’ve noted before, it got a fresh boost from the recent DeepSeek innovations on using RL for ‘distillation’ and other ‘chain of thought’ AI reasoning techniques to teach new, sometimes smaller models more precise things, with larger LLM AIs.
So it’s great to see the early pioneers of this core innovation recognized with the Turing Award.
As Axios notes in “Pioneers of reinforcement learning named Turing award winners”:
“This year’s Turing Award — often called the Nobel Prize of computer science — is going to Andrew Barto and Richard Sutton, the pioneers of a key approach that underlies much of today’s artificial intelligence.”
“Why it matters: Reinforcement learning, as the technique is known, posits that computers can learn from their own experiences, using a system of rewards similar to how researchers have trained animals.”
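To make that idea of learning from experience via rewards a bit more concrete, here is a minimal sketch using tabular Q-learning, one classic RL algorithm, chosen here purely for illustration. The toy 'corridor' environment, the function names (train_q_table, greedy_policy) and all parameter values are my own assumptions, not anything taken from the Axios piece or from Barto and Sutton's work. The agent starts at the left end of a short corridor and only gets a reward of 1.0 when it reaches the right end.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical toy corridor.

    States are 0..n_states-1. The agent starts at state 0 and earns a
    reward of 1.0 only when it steps into the last state (the goal).
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    goal = n_states - 1
    # q[state][action] holds the learned estimate of future reward.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes (or when the estimates are tied),
            # otherwise exploit the action that currently looks best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Walls at both ends: clamp the position to the corridor.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            target = reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-looking action for every non-goal state."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = train_q_table()
    print(greedy_policy(q_table))
```

Nobody tells the agent that 'right' is the correct move. It only ever sees the reward at the end of the corridor, and the value estimates propagate backwards from there, one state at a time, until stepping right looks best everywhere. Real systems swap the small table for neural networks and much richer environments, but the reward-driven loop is the same basic idea.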
It was a solitary journey through the wilderness, as many core innovations can be:
“In a joint interview, Barto and Sutton said the award is extremely rewarding, especially given that for much of their career, the technology they pursued was out of vogue.”
“When we started, it was extremely unfashionable to do what we were doing,” Barto told Axios. “It had been dismissed, actually, by many people.”
“There were periods of time when I could not get funding because I was not doing the current fashionable topic, and I wasn’t going to change to what was fashionable,” he said.
“Sutton added that it was “particularly gratifying” to be given this award since it was Alan Turing who proposed the notion of computers learning from their own experiences in a 1950s paper, though it would take decades for there to be enough computing power to test out the notion.”
But the innovations were painstakingly built:
“Catch up quick: Sutton, now a computer science professor at Canada’s University of Alberta, was Barto’s student at the University of Massachusetts in the late 1970s.”
“Throughout the 1980s, the pair wrote a series of influential papers, culminating in their seminal 1998 textbook: “Reinforcement Learning: An Introduction,” which has been cited in more than 70,000 academic papers.”
“The approach finally gained prominence in the last decade as DeepMind’s AlphaGo began to defeat human players.”
“Reinforcement learning from human feedback is a key method for the training of large language models, while the approach has also proven useful in everything from programming robots to automating chip design.”
And now globally recognized:
“What they’re saying: Google’s Jeff Dean said reinforcement learning has been central to the advancement of modern AI.”
“The tools they developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments.”
“Google funds the $1 million prize given each year to the Turing Award winners.”
Like most such things, the task of building upon the core ideas has just begun, especially as AI hardware and software improve at an exponential pace for the foreseeable future, for now surpassing Moore’s Law.
“What’s next: Both Sutton and Barto believe that current fears about AI are overblown, though they acknowledge that highly intelligent systems could cause significant upheaval as society adjusts.”
“Sutton said he sees AGI as the chance to introduce new “minds” into the world without having them develop biologically, through evolution.”
“I think it’s a pivotal moment for our planet,” Sutton said.
“Barto echoed that cautious optimism: “I think there’s a lot of opportunity for these systems to improve many aspects of our life and society, assuming sufficient caution is taken.””
Recent months have of course seen Nobel and other awards go to AI innovators and researchers. But it’s good to see this core, foundational AI innovation recognized with computer science’s top honor.
As OpenAI founder/CEO Sam Altman recently noted, ‘Deep Learning worked’, as I discussed before. And that’s in no small part due to RL, which likely has far more to be built upon going forward in this AI Tech Wave. It’s an exciting time indeed. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)