Dec 4, 2022 · edited Dec 4, 2022 · Liked by Tomas Pueyo
Your last two articles have been revelatory, to say the least. Thank you very much for the paradigm shifting info.
You’re welcome!
Great post, thanks for sharing! So you believe building applications on OpenAI's models would be setting yourself up to be disrupted when they decide to compete or remove you from their API? It seems you are suggesting that startups should only be building on open source models. I also think that, beyond GTM and network effects, the quality of the application and the outputs generated should determine who succeeds in each space. I am interested in learning about your idea for an AI-enhanced social network.
Great thoughts.
I think it depends on the mechanics of the market, specifically the engineering needed and the network effects.
If you need lots of engineering (=cash) for great models, only a few companies will do it. If there are also network effects, only the first companies to invest a lot will win.
Open source can get engineering too. But I’m not sure they can get as much, or as coordinated.
If costs and network effects are not that high for a decent outcome, then open source or the private market will work well, there will be competition, and most value will be accrued by companies building on top of the models.
If it’s just engineering and not network effects, you can expect open source or several competitors to enter the fray and compete.
If network effects prevail, we’re back to an oligopoly, which makes it dangerous to bet your company on.
Agreed on the quality of applications. Always true.
On the idea, send me an email! You can just respond to the newsletter. I’ll share more. Do share more about yourself & your interests on the topic!
Last night I sat home talking to my virtual dead parents, saying things we didn’t have time for. Then my virtual friend and therapist helped me feel so much better.
I rewatched some old movies and changed the endings to match my mood, and made a new pair of jeans on my 3D printer. Fell asleep playing Road Warrior after adding more horsepower so I can always win.
And that’s why all stories about the future are dystopias, not utopias, and why the future tends to be better than we expect
Wow! I'm stunned and grateful for what you are doing. I'm not very tech advanced, so I'm struggling to grasp the acronyms and follow, but I'm getting that the potential of all this is greater than I can yet envision. Two questions:
1. Has someone developed a public education AI that will use logic, persuasion, humor and entertainment, etc to educate, convince, inspire and motivate people to work for the common good like solving global warming, environmental and social justice, inventing cures for diseases, developing more improvements to clean energy, and how to talk with people of differing views to work for the common good and reducing fear of immigrants or others who are different?
2. Is anyone working with AI to develop protective tools to reduce risks for children and adults of things like AI being used to foster bigotry or to enhance human trafficking, or many other harmful endeavors?
Not afaik
Extensive analysis! I have been fascinated by the topic since GPT-2 times (try http://homer.paolim.fr/), and there is one parameter which I think might be interesting to consider, i.e. computational cost. You can run Stable Diffusion on pretty standard hardware. No way you can do the same with a GPT-3-sized model. So the cost of the API doesn’t make it very practical for scaling an app. And smaller models just don’t deliver the wow experience. Lots of people are working on making the smaller models better, but for now the breakthrough hasn’t happened and it’s not clear if it will. Without this there will be a clear divide between image generation - which is solved - and text generation which, in practical terms, isn’t. But let’s see ;)
Ah, super interesting point, JP. Good nuance.
Wouldn’t it be shocking that we can generate any image easily, but not any text? But you might be right. This will determine a lot of the economics of the future. Let’s see what happens!
True, it sounds surprising that image is solved and not text. I was surprised as well, so I gave it some thought. My naive, amateur understanding is the following:
- Image generation is a very finite exercise: you have 512x512 image / text pairs and basically you train to build an image from the text.
- “Open-ended” text generation is an infinite game: the training is about predicting the next token - let’s say word - given some previous words. And text can be surprising. For instance, a text about the meaning of life can be a journalist’s 3-page take, a philosopher’s whole book, or just “42”. So it is hard for a machine to find the “pattern” within this very wide set of examples, hence the need for huge datasets and parameter counts.
- That’s also why, on the other hand, translation is pretty well solved: it’s finite. You have a text and its translation, and you train text-to-text models.
But maybe OpenAI has found some secret sauce with ChatGPT that we haven’t heard about yet and that makes this pattern recognition more efficient. Let’s see!
Hmm, unconvinced. 512 × 512 × 16M ≈ 4·10^12, and if you increase your image size, it keeps going up. Something similar happens with letters.
Yet not any combination works, and the pruning from probable to sensible probably follows a weird function. I’m not sure I would calculate the complexity of words vs images that way. But I understand your argument and agree that it’s possible.
On text you get 32k possibilities per token. Tokens are subwords. But even if those were 32k words, with 3 words you already have more complexity than the image. And we are talking about producing long paragraphs. And re. image resolution: image models work at low resolution and then upscale, whereas today we have no way to work on a summary of a text and then expand it - that’s a problem of its own. So yes, it is very surprising, because text is 1D and images are 2D, but image generation is easier to solve :)
Actually, that doesn’t hold: in the image calculation it’s 16M^(512x512) rather than a product. So the image space is far bigger than 3 words ;) Anyway, look at parameter counts: DALL-E / Stable Diffusion are much smaller than GPT-3, so there should be something to this line of thinking. If somebody has a more precise idea, I will read it with pleasure ;)
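For anyone who wants to check the counting in this exchange, here is a rough sketch of the arithmetic. It assumes the figures quoted above (512x512 images, about 16M colours per pixel taken as 24-bit RGB, a 32k-token vocabulary); the function names are made up for illustration. It works in log10 because the raw numbers are far too large to print. Raw counts of possible outputs are only one crude proxy for difficulty, as noted above, so treat this as a sanity check on the numbers and not as an explanation of why one task is harder.

```python
import math

# Assumptions taken from the figures quoted in the thread (illustrative only):
PIXELS = 512 * 512   # pixels in a 512x512 image
COLOURS = 2 ** 24    # ~16.7M colours per pixel (24-bit RGB)
VOCAB = 32_000       # ~32k possible tokens per position


def log10_naive_product() -> float:
    """log10 of pixels * colours, the product form used first in the thread."""
    return math.log10(PIXELS * COLOURS)


def log10_images() -> float:
    """log10 of the number of distinct images: COLOURS ** PIXELS."""
    return PIXELS * math.log10(COLOURS)


def log10_texts(n_tokens: int) -> float:
    """log10 of the number of distinct token sequences of length n_tokens."""
    return n_tokens * math.log10(VOCAB)


def tokens_to_match_images() -> int:
    """Smallest sequence length whose sequence count reaches the image count."""
    return math.ceil(log10_images() / math.log10(VOCAB))


if __name__ == "__main__":
    print(f"naive product:     10^{log10_naive_product():.1f}")
    print(f"3 tokens:          10^{log10_texts(3):.1f}")
    print(f"all 512x512 imgs:  10^{log10_images():,.0f}")
    print(f"tokens to match:   {tokens_to_match_images():,}")
```

With these assumptions, three tokens do beat the naive product, but the exponent form of the image count is only matched by a text several hundred thousand tokens long.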
This is on par with when you wrote Flatten the Curve. Catching a monster trend and being able to explain it in simple ways, right when it's about to begin the steep R0 ahead. Well done!
Well, for me, it's probably the thrill of those electrons whizzing around inside my head.
Well, for the research assistant part at least, I'm afraid AI still has some way to go. At least in the fields I know, results from Elicit would sound very smart to a non-expert - but the content is off topic, out of date, or actually wrong.
So at least in my field, the best it can do for now is sounding smart, while not understanding f*k all.
That being said, that will take you a long way in many fields.
Or perhaps the value added of AI is that it will make smart-sounding bullshit cheap - taking away the livelihood of the BS artists of the world. Now, AI is cruel indeed...
That would be a good example.
What example did you play with? I’m curious what the specific lessons are from it (e.g. whether they have a bias against recency).
I tried "efficiency of solar panels" and "soil carbon sequestration".
Bias against recency was not a problem; in fact, old papers came up that are long out of date - and those are fast-moving fields.
For me the main lesson is you can sound smart without understanding anything!
More mind-blowing, thought-provoking and scary stuff...
And also lots of opportunities!
If the medium is the message
How has a 200 character limit affected us
How will having everything you want made at home by 3D printer change our attitudes and values
Or virtual buddies that always agree and support us
Does AI produce innovation or just more combinations of the old ideas in different juxtapositions
Great blog! Have a look at the Futures 2030 - Powered by ACI report I just released www.supermind.design/resources ...for the intersection of generative AI with collective intelligence
This is great! Generative AI seems disposed to probabilistic expansion of text and/or image inputs and I loved reading your examples. It seems the problem with probabilistic expansion is there is no filter for truth or relevance in its output.
Companies like similarinc.com use proprietary anonymised collaborative data (wisdom of the crowd) on top of text and image embeddings to help with correct and relevant information retrieval.
While LLMs can crawl the public web, cross-domain collaborative knowledge graphs are scant.
The semantic web has long focused on machine readability, much to its own detriment. Collaborative data and implicit signals have the potential to create a more autonomous and intuitive web experience, i.e. evolving past search engines, which require explicit input and relevance proxies such as backlinks.
Scary. The idea that someone can have a hoard of AI companions that will listen to anything they say without judging them.
Unfortunately, text is not great at conveying irony
Are you human?
Why would you ask that? What makes us human?