The 'test loss' series of graphs says that more data is needed to make fewer mistakes. But today's models are already trained on essentially the whole internet, so any further data will be marginal (lower-quality). Generally speaking, we have reached the limit of high-quality data to feed the bots, and a large part of the internet right now is already LLM-generated, so we may run out of high-quality data even sooner than the curves assume. There can even be deliberate 'infection' of training data to 'propagandize' LLMs (https://thebulletin.org/2025/03/russian-networks-flood-the-internet-with-propaganda-aiming-to-corrupt-ai-chatbots/).
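To make the diminishing-returns point concrete, here is a minimal sketch (my own illustration in Python, not taken from the post's graphs), assuming the usual power-law shape of a data-scaling curve; the exponent and constant are placeholder values chosen only to show the shape:

```python
# Illustrative data-scaling curve: test loss falling as a power law in
# dataset size, with diminishing returns. alpha and d_c are placeholder
# values for illustration only, not fitted numbers from any real model.

def test_loss(tokens: float, alpha: float = 0.095, d_c: float = 5.4e13) -> float:
    """Assumed power-law form: loss ~ (d_c / tokens) ** alpha."""
    return (d_c / tokens) ** alpha

for tokens in [1e9, 1e10, 1e11, 1e12, 1e13]:
    print(f"{tokens:.0e} tokens -> test loss {test_loss(tokens):.3f}")

# Each 10x increase in data shaves off a smaller and smaller slice of loss,
# which is why hitting the ceiling on high-quality data matters.
```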
The only other source of real data is reality itself, i.e. real-world data captured by robots in real-world environments, but building that out will probably take closer to a decade.
Exponential increases are exponential until they are not... if there is an upper bound on AI intelligence, the result is an S-curve. I'm not necessarily claiming that there IS an upper bound, or saying where it might be; I don't think humanity understands intelligence well enough to do that. I AM saying that all these hyperscalers seem to be assuming there is no such upper bound (or else they are at least presenting it that way to investors).
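As a toy illustration of the S-curve point (again my own sketch, with arbitrary numbers, not anything from the post): an exponential curve and a logistic curve with the same early growth rate are nearly indistinguishable at first and only diverge once the ceiling starts to bite.

```python
# Toy comparison: exponential growth vs. a logistic (S-shaped) curve with
# the same early growth rate. Both look "exponential" at first; only the
# logistic one bends over as it approaches its assumed ceiling.
import math

RATE = 0.5        # shared early growth rate (arbitrary)
CEILING = 100.0   # assumed upper bound on "capability" (arbitrary units)
START = 1.0       # starting level

def exponential(t: float) -> float:
    return START * math.exp(RATE * t)

def logistic(t: float) -> float:
    return CEILING / (1 + ((CEILING - START) / START) * math.exp(-RATE * t))

for t in range(0, 21, 4):
    print(f"t={t:2d}  exponential={exponential(t):10.1f}  logistic={logistic(t):6.1f}")

# Early on the two columns track each other closely; later the exponential
# keeps compounding while the logistic flattens out near the ceiling.
```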
Good and interesting points. Re your comment on our not understanding 'intelligence': one lacuna that seems near-universal among AI researchers is the failure to realize that our intelligence is biological, that our brains are not in a glass jar in a lab, and that the metaphor of AI being like a brain's intelligence is therefore severely limited. Our brains are an integral part of a body-wide system and arise out of the evolution of a creature with needs, desires, and emotions. The neurochemical system the brain is part of can't be thought of as separate (either as a metaphor for computers or in itself) from that body in its environment without hugely distorting and misunderstanding it. Obvious, I guess, but it seems often forgotten.
Thanks, as always.
"When we reach that point, AIs will start taking over full jobs, accelerate the economy, create abundance where there was scarcity, and change society as we know it."
Whenever I hear about how 'we' are going to have this or that benefit for 'us', I think of that joke about the Lone Ranger and Tonto. They're surrounded by ten thousand Lakota and Cheyenne warriors. The Lone Ranger says: "Well, Tonto, it looks like we've reached the end of the trail. We're going to die." Tonto says: "Who's this 'we', white man?"
So who is the 'we'? Who gets the abundance? Who benefits from that changed society? Certainly, the owners of the systems will -- even those of us who own a small piece of those companies. But those who don't? What happens to them when their jobs disappear? Perhaps it's like horse jobs turning into car jobs 100 years ago -- but what if it's not?
Doing a quick AI search, I'm reminded that practical and widespread use of quantum computing is still 5-20 years away, and that even that horizon has remained consistently in the "nearly there" category for a remarkably long time.
AI also says we are years, possibly decades, away from fully autonomous (Level 5) self-driving cars being a common reality. Both of these breakthroughs were widely expected, by the market and by researchers, to have been achieved by now. Anytime the word "God" is bandied about, we can be certain hubris is nearby.
"The data issue" will remain even if I can comb through it with great speed because, as my father used to remind me, "You can't make chicken salad from chicken shit."
I do, however, understand that there are many application areas, like genome research, chemistry, medicine, weather prediction, and traffic management, where AI will shine.
Have a look at this. It's really worth 30 minutes to hear the MIT professor talk about a non-LLM, non-generative kind of AI that he argues is far better: lower power consumption, no job losses, and augmenting rather than replacing human abilities. https://alum.mit.edu/forum/video-archive/ai-cheerleaders-unambitious