After the camera was developed, it was used to verify reality, since individual testimony could be biased.
Now the word can become “flesh” and imagination “real”. Walking shadows in AI.
I am focusing on the upsides here. But like any new technology, this one also has serious downsides. The biggest of all is potentially the singularity.
Honestly, this was as timely and impactful as the piece that put you on the map: Flatten The Curve. No one has put together the full story of the state of Gen AI, right at the moment it enters its steep, high-R0 phase.
Well done!
Thank you.
In the grand scheme of things, COVID was more urgent, but this is more important!
WRT cinema and dubbing, the missing piece is to AI-edit the voices so that they (a) sound like the original actors and (b) feel like they were recorded during the actual action, instead of being read off a script in front of a microphone in some random sound studio.
Agreed. But do you think that’s impossible? It seems quite possible to me to reach 80-90% fidelity, which sounds like enough, since today dubbing is just spoken over the original version, which is terrible.
I see AI as a useful tool for humans, but still relatively primitive. What Roman soldier would plow a field in his military regalia? He would set that aside, and put on his grubby field working dress. AI can't figure that out. A human will have to program that into AI. I've recently read two articles about self-driving taxis in San Francisco. Both authors reported that after getting most of the way to their destinations, the AI got confused over something, pulled over and ended the trip. At least the humans lived to report it!
To be fair, I’m the one who directed the Romans to be in military regalia. If I had used field working dress, they would have looked like ordinary farmers.
And you’re right about self-driving! It’s a much harder problem than people thought, like brick-laying.
Other fields are much easier, and even driving will be automated at some point.
Funny. Because I'm behind on some Uncharted Territories and never delete them from my inbox, I just read this article the day before reading "The Most Important Time in History Is Now".
https://unchartedterritories.tomaspueyo.com/p/the-most-important-time-in-history-agi-asi
It's absolutely insane to see the progress in just 2 years and 2 months. Strap yourself in for what happens next!
Maybe I should use this for contrast!
One application I see promise in: I volunteer with the local fire district's Emergency Medical Technicians (EMTs). (I have an Emergency Medical Responder license, which makes me an assistant to EMTs.) We have training sessions where we show up and the patient exhibits certain symptoms. We keep gathering data to narrow down the likely medical cause of the symptoms. The hospital is 25 miles away on a narrow road, so lives may be saved if we can diagnose what is happening quickly. I can envision a portable AI medical assistant that observes what we're doing and makes relevant suggestions. Also, some of our patients don't speak English or any language we know, so that's another obvious way it could help.
That AI could also be on the phone with the patient while you're on the way, so that by the time you arrive you're 70% done.
I was 15 when I first saw color TV, 23 when I learned about satellite TV, and 35 when I wrote my first email, and at 50 I bought my iPhone 4s. Now I'm 60, trying to fit into my seat belts.
The Internet changed the stock image industry, which I was involved in. AI will kill (transform) it.
Thanks for this excellent recap.
… and then there are the downsides. Besides the obvious ones: who, for one, decides which images are or are not appropriate? OpenAI requires that we "Do not attempt to create, upload, or share images that are not G-rated or cause harm". That sentence happens to cover half of modern art, as well as a heap of other topics that may or may not be related to arousing prurient interest. It's quite telling that not being even PG-rated appears to be more important than, say, not generating an image that glorifies suicide.
This is possible if there are monopolies in the models. If there’s competition, the market will decide what’s acceptable.
Excellent article and I'm only half-way through! Will be utilizing a lot from this, so thank you. However, I still think we have hard work to do on AI alignment.
Indeed. Luckily, none of these things will get us close to AGI, but that moment is coming and we’re certainly not prepared.
Mind-blowing, Tomas! I'd already seen some of this, of course, particularly the image-related stuff. In the short term, and rather mundanely, the transcription software will help me a lot. Even a couple of years ago, I found human captioning services worse than just correcting AI-generated captions/transcriptions myself, and Adobe Premiere has improved even more recently, but it still needs a lot of proofreading/correcting. I film/edit a lot of medical industry videos, so if recognition of technical language is getting more accurate, that will save a huge amount of time. If Whisper or Descript are better... off to experiment...
Before writing the article, I was scared of getting into this because it seemed daunting. But plenty of tools are user-ready, and many others are engineer-ready. As you say, time to explore!
Descript you can totally try. I couldn’t find a way to try Whisper. Could you?
I haven't tried yet, I saved the Harry Ramsay link you included to go through with Whisper when I have more time - need to use Google Colab apparently and maybe it will finally make me dabble in coding... This article of yours itself I will have to go through a few more times! I opened several tabs with all the links and realised it was exponential in time required to read... :D
It took me months to gather the links and about 40h to write the 2 articles. I’m not surprised!
I may need to also re-read Asimov and Philip K. Dick...
Great framing around diffusion curves and where we are on the adoption S-curve. One practical angle I’d add is evaluation: teams need lightweight, task-specific evals (clarity, factuality, brand voice) instead of generic benchmarks. For visual work, I keep a small gallery of “golden outputs” and regenerate against them; Createimg.ai (https://createimg.ai) helps me test variations across models to see which stays closest to spec. Would love a follow-up on measuring drift as models update.
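A minimal sketch of the golden-output check described in this comment, assuming text outputs and using Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity metric. The 0.8 threshold, the function names `similarity` and `check_drift`, and the sample data are all hypothetical; this is not Createimg.ai's or any other tool's method.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: scores below this count as "drift" from the golden output.
DRIFT_THRESHOLD = 0.8


def similarity(reference: str, candidate: str) -> float:
    """Return a 0..1 similarity score between a golden output and a new output."""
    return SequenceMatcher(None, reference, candidate).ratio()


def check_drift(golden: dict[str, str], regenerated: dict[str, str],
                threshold: float = DRIFT_THRESHOLD) -> dict[str, float]:
    """Score each prompt's regenerated output against its golden output.

    Returns only the prompts whose score falls below the threshold.
    Prompts missing from `regenerated` are compared against an empty
    string, so they score 0.0 and are always flagged.
    """
    drifted = {}
    for prompt, reference in golden.items():
        candidate = regenerated.get(prompt, "")
        score = similarity(reference, candidate)
        if score < threshold:
            drifted[prompt] = round(score, 3)
    return drifted


if __name__ == "__main__":
    # Hypothetical golden outputs and a fresh batch from an updated model.
    golden = {
        "tagline": "Fast, friendly support for every customer.",
        "summary": "The report covers Q3 revenue and hiring plans.",
    }
    regenerated = {
        "tagline": "Fast, friendly support for every customer!",
        "summary": "Our mascot is a purple giraffe.",
    }
    print(check_drift(golden, regenerated))
```

For image outputs, the same loop should work if `similarity` is swapped for an image-embedding or perceptual-hash comparison; logging the flagged scores per model version is one rough way to watch drift as models update.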
We made a market map that includes some companies not in the Sequoia one. The gen AI shift could produce even more value than the shift to cloud. https://base10.vc/post/generative-ai-mission-critical/
I think you’d be interested in this market map of gen ai companies we produced. The gen ai shift could produce even more value than the shift to cloud. https://base10.vc/post/generative-ai-mission-critical/
One thing about the most “generative” part of the eukaryotic cell: mitochondria can actually survive cell death and migrate into other living cells. Real generativity is amazing, and we don’t have accommodations for it in the current corporate model of group collaboration and competition. Corporations scale and hillclimb, but socially we do generativity in other types of institutions… So should generative AI actually be done in other social institutions? Or will corporations evolve? Well, we will end up trying all of the above over time and see what works. This is Kauffman’s adjacent possible: new combinations and permutations.
Logically, it may work to say that prescription glasses are more generative than any software so far: the glasses correct vision so you can see wherever you look, whenever you want to look there. Software as it is pursued in a corporate model, by contrast, is more like glasses designed to make your vision worse, while rate-monitoring, limiting, and billing for the light allowed to reach your eyes, and reserving the right to shut off the light entirely based on a set of legal conditions no user understands. Calling the latter a generative structure is logically bizarre… Software can be generative, but not in the lock-in model and not in corporations, which are hillclimbers targeting local maxima… So consider any actual “generative AI” that comes into existence as the IP of a corporation, such that it is effectively “commanded” by a hillclimbing algorithm (corporations are hillclimbers: the legal responsibility to pursue profit means you can be sued if you don’t keep climbing whatever hill you start on)… There is a limiting structure (the corporation), which has been an amazing evolutionary adaptation and mechanism for group competition and cultural evolution… But, and this is the leap, because everything so far should be obvious to a mildly intelligent observer of the everyday: the problem your generative AI article really highlights is the need for new forms of group organization and competition to drive cultural evolution. It feels like there is something that grows out of distributed computing and networks, involves human and digital agents, and is the functional successor to the corporation.
This is being experimented with, but I don’t think it exists yet… Crypto and DAOs are playing with this, but it will have to be an evolution of the corporate form more directly than those are… Corporations are still massively adaptive and effective, but it feels like they are stuck at something like the bacteria equivalent (single-cell organisms) in the biological world (bacteria are hillclimbers in a sense: feast and famine)… and the path to eukaryotes was unavoidably bizarre, whatever theory you subscribe to… Hope that makes a little sense. Generativity can exist in a domain governed and limited by a hillclimber, and it may effervesce momentarily, but it’s a mirage. The hillclimber will turn anything generative into something else to serve the requirements of reaching the corporation’s local maxima. The problem isn’t generative AI; it’s the current state of the corporate form. This is evolution at work: bottlenecks shift, and advances can be temporary and end up not working. Generativity itself is kind of like an energy source, so grafting generativity into corporations could be like the impact of mitochondria, but then mitochondria have their own DNA, their own “law”… So maybe there’s a useful logic in that metaphor… Don’t know. I just keep hearing Inigo Montoya in my head when people talk about generative AI and venture capital and companies…
I would just define generative differently, such that “generative AI” sounds oxymoronic… like intrinsic value in a business context: any precise thinker knows that businesses only have instrumental value, or value in use… but all these people talk about intrinsic value, which is logically nonsense when they mean some sort of normalized value… So, in the immortal words of Inigo Montoya from The Princess Bride, I am constantly hearing in my head, “You keep using that word. I do not think it means what you think it means.” Generativity in the sense of generative social relationships or generative networks feels very different from generative AI… The proposition for the AI isn’t really to be a liberating structure; the business proposition is to appear to enable while creating a dependency… The only way for these systems to actually be generative would be in a world of open ontologies and linked data / RDF… but that’s not being built, although we know how to build it. Just frustrating. It feels like a lot of evolutionary dead ends are being explored that are cool, but where do they go? Ultimately you end up back at cultural anthropology and group competition dynamics, and that’s where I scratch my head, but it’s all very hard to put in words… You have to just kind of be able to see it… But there might be a simple way in. What if a generative AI were one that made everything that interacted with it “smarter”? Then the test of generativity is the way something “rubs off” on everything an “agent” encounters. What does that look like, and what would the design requirements for a “generative” ecosystem of agents be? It’s not that the AI does anything for you; a generative AI would be more like wearing glasses… it would enable you to see more and more clearly and understand better… and it would require sharing that, as part of generativity…
Or something like that… I just don’t see any real new game like this being played. I see cool software and tools, but all designed to play by Varian’s Information Rules, circa 1998, where customer value is a function of lock-in.