AI is not tomorrow, it’s today.
You'd better wake up and get on it. If you don't, others will learn to use AI tools before you, increase their productivity, take your job or your company, and leave you in the dust.
Instead, you can quickly get up to speed and bring the future to the present.
A few weeks ago, I didn’t know that much about Generative AI. Since then, I’ve spent days studying it. I’ve summarized it all in two articles. This is the first one, to get up to speed quickly. You’ll get:
Examples of the latest in Generative AI, so you start forming an idea of what you could do with it.
A mental framework for how to think about AI.
The most common AI tools today.
Buckle up!
The Cambrian Explosion of Generative AI
The Crow, an AI-generated movie, won the Jury Award at this year's Cannes Short Film Festival.
And this is just the beginning. The movie has the eerie artifacts of early AI, and it needed footage of a real dancer as a baseline to transform into the crow's movements. But if AI can do this now, what will it do tomorrow?
Look at this video:
These guys generated images with Midjourney, animated the faces with Reface, panned the camera with PopPic, and altered the voices with Altered.ai.
Or this one:
If this type of stuff gets you excited, scared, and a bit lost, good. By the end of the article, you should only feel excited. For that, let's jump to the core of Generative AI, the type of AI that generates new things rather than processing existing data. How should you think about it?
Generative AI has three main building blocks: text, images, and sound. Each can be further divided and combined. For example, video is a sequence of linked images, combined with two types of sound: music and speech.
Text
At the core of everything is text, because it is the closest to thinking. If you want to chat with an AI, ask it questions, create a relationship with it, or create realistic characters, the core is a text-based AI.
Conversation
The most famous model is GPT-3, from OpenAI, the most advanced Generative AI company today, which is also behind other advanced AI models like Whisper and DALL-E (we'll talk about them in detail). There are other models too, like Google's PaLM and LaMDA1.
If you've seen people chatting with AIs and getting confused about whether they're alive or not, odds are they were talking with one of these three. Here's an example that I shared in the August 2022 Updates article:
Here's one way somebody used GPT-3 to brainstorm business ideas, then asked it to list possible downsides to refine them.
A few years ago, there was a frenzy of AI chatbots. They didn't succeed because they were dumb. They were dumb because GPT-3 wasn't here yet. Now the AI is getting good enough.
You can play with GPT-3 today (a minimal API sketch follows the list below). It has dozens of use cases. Here are some examples of companies working in the space2:
Write text for you: emails, ads, articles… Companies doing this include Jasper3, Copy.ai4, Writer, Writesonic, Peppertype, Hypotenuse, Anyword, Copysmith, Scalenut, Postearly, Lex, Rytr5…
Sudowrite is focused on literature, and helps authors express themselves more poignantly.
Summari summarizes text.
Explainpaper.com does the same for scientific papers. A godsend.
Otter and Fireflies transcribe meetings on top of summarizing them for you.
Character.ai created a marketplace for bots.
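To make the "play with it today" part concrete, here's a minimal sketch of calling GPT-3 through OpenAI's Python library. The prompt and parameter values are illustrative assumptions, not any of these companies' recipes:

```python
# A minimal GPT-3 call with OpenAI's Python library (pip install openai).
import openai

openai.api_key = "YOUR_API_KEY"  # create one in your OpenAI account

response = openai.Completion.create(
    model="text-davinci-002",  # one of the GPT-3 models available today
    prompt="Brainstorm five business ideas that use AI to write marketing copy:",
    max_tokens=200,
    temperature=0.7,  # higher = more creative, lower = more predictable
)
print(response.choices[0].text)
```

Swap the prompt for "summarize this text" or "write an ad for X" and you get the seed of most of the products above.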
It’s not hard to see where this is going: The future of this is the perfect personal assistant, coach, friend, or partner for every one of us.
You might have noticed other interesting use cases in the list I showed above. For example… coding?!
Coding
Indeed, coding can be seen as just translating between a human language and a machine language. Anything that GPT-3 can do for human language, it can do for coding.
You can do all sorts of coding with AI, from auto-completing your code, to asking it to explain a piece of code you don’t understand, all the way to simply generating full pieces of code, like in Replit’s7 Ghostwriter.
When the code is complex, the AI isn't there yet. But a large proportion of code is not that complex. That's the kind of code AI can already write better and faster than humans.
Here's an example of AI-enhanced coding by GitHub Copilot:
Apparently, for the developers who've enabled it, 40% of their code is written by Copilot. Imagine the productivity improvements!
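To see what this looks like in practice, here's the kind of comment-to-code completion these tools produce. This is my illustrative sketch of the workflow, not an actual Copilot output: you write the comment and the function signature, and the AI fills in the body.

```python
# You type the comment and the signature; the assistant suggests the body.
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case, spaces, and punctuation."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
```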
This doesn’t need to be limited to pure coding. Pull requests can be automated. And any language works, like automating spreadsheet formulas.
A more standard use case for text-to-text is translation. Here, the surprising piece is not the concept, but how good it is.
Transcription and Translation
In fact, OpenAI has an AI specifically focused on speech, Whisper, which it released less than two months ago. It turns out it's not just good at translating speech between languages; it's also amazing at transcribing speech to text. If you think about it, it makes sense, since writing and speech can be seen as just different languages.
For those who’ve used it, it’s unbelievable.
Go to Whisper to see demos.
You know how digital assistants are annoying because they don't understand what you say? Well, if Whisper works as intended, that problem is in the past. Whisper understands speech better than Siri, better than any other AI you've used, and sometimes better than humans, with or without background noise, and in many languages. It can also translate speech from virtually any language.
There's no consumer product yet, but you can play with it for free as a developer (see the sketch after this list). The options are infinite. Here are some:
You can use it as a standard transcription tool.
To create a PDF transcript of a YouTube video, or to automatically add captions.
To create subtitles in other languages.
To automatically record a meeting, transcribe it with Whisper, and summarize it with GPT-3, so that every meeting can be easily digested by anybody.
Do the same for medical conversations, where an AI can suggest ideas for diagnoses in real time.
Translate conversations in near-real time!
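Since Whisper's code and models are freely available, the basic transcription use case takes just a few lines. A minimal sketch, with a hypothetical audio file name:

```python
# Transcribing speech to text with OpenAI's Whisper
# (pip install openai-whisper; also requires ffmpeg).
import whisper

model = whisper.load_model("base")  # "medium" and "large" are more accurate, but slower
result = model.transcribe("meeting.mp3")  # hypothetical recording
print(result["text"])
```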
This leads us to the second big area of Generative AI after text: speech and sound.
Sound
When translating speech between languages, there are several steps: recognizing the sounds as speech, transcribing that speech into text, translating the text into the other language, and then synthesizing the translated text back into speech.
This means you need to go from sound to text and text to sound. Each of these has its own use cases.
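You can already chain those steps with off-the-shelf tools. A toy sketch: Whisper handles recognition, transcription, and translation into English, and pyttsx3 stands in for a proper speech synthesizer (the file name is hypothetical):

```python
# A toy speech-to-speech translation pipeline
# (pip install openai-whisper pyttsx3; requires ffmpeg).
import whisper
import pyttsx3

# Steps 1-3: recognize the speech and translate it into English text.
model = whisper.load_model("base")
english_text = model.transcribe("speech_french.mp3", task="translate")["text"]

# Step 4: synthesize the translated text back into speech.
engine = pyttsx3.init()
engine.say(english_text)
engine.runAndWait()
```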
Speech
One of the most famous tools in the speech-and-text vertical is Descript, which I've used recently for the podcasts I've recorded. It creates a good transcript of what you said8. But the magic is not going from speech to text. It's the other way around: you can edit the transcript, and it automatically edits the words coming out of your mouth! You can rewrite words, and it will say those words for you, in your voice! It can apparently do the same for video, but I didn't try it. Adobe is trying to do something similar for podcasts. Reduct does it for video.
Along the lines of text to speech, with Murf AI you can write any text and it will read it for you in different voices, or you can have it clone your voice. Here’s a quick demo I made:
Wellsaid also allows you to go from text to speech. What about speech in videos? Creating subtitles is becoming trivial, and many companies already offer it. So the most innovative companies focus on other areas. Papercup automatically dubs videos in other languages. Voicemod and Koe Recast allow streamers to change their voice in real time.
Murf couldn't clone my voice, but Resemble.ai let me do it. It left me… speechless. It was me! The first time I heard it, for an instant I forgot that an AI had produced it. I thought I had recorded it myself.
Coqui.ai is trying to become a full AI audio studio: Turn text into voice, and then direct the voice’s timbre, tempo, tone, enunciation, emotion, pitch, prosody, rate, duration, contour… Why would you record sound in the future, when it’s easier to create it from scratch?
The most interesting uses are appearing at the fringes of what's possible. Podcast.ai took the world by storm when it created an entire podcast episode between Joe Rogan and Steve Jobs9. You'd imagine these podcasts would be barely listenable, but in fact they're reasonable. The voices and intonations are quite accurate, the speakers are articulate, the topics are right… It's far from the real thing, still in the uncanny valley, but it's shockingly closer than I expected. Give it a few more years, and where is this going to go? Live conversations with anyone you might want to talk with.
So far, we’ve discussed speech. But there are other sounds that are possible. Most obviously, music.
Music
Do you want to add a soundtrack? You can create it from scratch, from text to music! In this link, you can see somebody who created a tool where you put a prompt (like “astronaut riding a horse”), a few tags (like “space, saxophone, travel”), and it creates a pretty good soundtrack for it!
I created a few samples with Soundraw.io. If they’re truly created with AI, they sound surprisingly good! Better than this. Image-to-music is definitely not there yet.
This is still in its infancy, but you can imagine we’re not far away from truly great AI-generated soundtrack music with prompts. Jukebox, from OpenAI, has crazy samples including vocals!
Sound Effects
The same thing is starting to happen with sound effects, with at least a couple of companies in the space. Once this works well, can you imagine how amazing that would be for sound editors? How many millions of sounds do you need to have the perfect one for every single situation? And even if you have an ample library, it might not have the perfect sound for every video. Creating it from scratch would be much better.
These are some of the principles for speech and text. Now let’s move to another sense: the world of visuals.
Images
Text to Images
This is the most famous one. If you already know a lot about it, just jump to the next section. For those of you who want a quick summary of where we’re at in image generation, keep reading.
An AI-generated image won the top prize in an art competition.
The images that AIs can generate are incredible.
Every single one of the pictures below was generated by an AI:
It works like this: you give an AI a text prompt, and the AI creates an image that corresponds to it.
The first AI to do this was DALL-E, followed by its successor, DALL-E 2 (you can access the API). But only a few people could access these. Since then, new, more open models have appeared, like Stable Diffusion and Midjourney. You can try some of these tools for free from your browser: Midjourney, Craiyon10, Stable Diffusion through the Dream Studio tool, Photosonic, or Hugging Face. You can also download DiffusionBee to run Stable Diffusion on your computer for free, no technical knowledge needed.
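If you'd rather script it than click around, here's a minimal sketch of running Stable Diffusion with Hugging Face's diffusers library. It assumes a GPU and that you've accepted the model license on Hugging Face; the prompt is just an example:

```python
# Text-to-image with Stable Diffusion via diffusers
# (pip install diffusers transformers torch).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # the openly released Stable Diffusion weights
    torch_dtype=torch.float16,
).to("cuda")  # a GPU makes this practical

prompt = "an astronaut riding a horse, photorealistic"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```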
Here’s what Midjourney 3 gave me when I asked it for Roman soldiers harvesting11:
After some iterations and a lot of work, I got this:
These models improve all the time. For example, I just reran the same prompt in the new version of Midjourney, and got this on my first attempt:
Here are some pictures that people have made, with a bit more work:
Sometimes it's hard to find the right prompts for great images. Some tools are emerging to help. Here's a prompt guide, and the official one for Midjourney. Other tools let you browse amazing pictures that others have created, along with the prompts they used. You can use Lexica, krea.ai, arthub.ai, Playground.ai, or Promptomania12.
Or, if it’s too hard, simply use GPT-3 to come up with better prompts!
Tools like Stable Diffusion and Midjourney allow you to go from text to image. In a way, what Lexica and Promptomania do is go from images to text. But you can also go from image to image.
Image to Image
Companies like Snapchat have already been using AI to edit pictures on the fly. Plenty of other tools have appeared to do that, like:
Image Enlarger increases the resolution of your pictures.
Autoenhance enhances them automatically.
Remove eliminates backgrounds from images for free.
Magic Eraser deletes elements in a picture.
PhotoRoom combines both: element deletion and background elimination (and also allows you to blur backgrounds).
This tool allows you to create realistic-looking people.
You can take pictures of people or things and put them inside different scenes.
You can try different hairstyles.
You can take pictures and modify them with a prompt.
Some companies try to combine several of these tools, like ClipDrop, which allows you to remove backgrounds, persons, and text, upscale pictures, and change the light.
Other tools allow you to go from a rough sketch to a higher-fidelity one in seconds:
There's also inpainting and outpainting, where the AI redoes parts of your image or expands it beyond its original borders.
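You can try inpainting through code, too. A minimal sketch with diffusers, assuming the openly released Stable Diffusion inpainting weights and hypothetical file names:

```python
# Inpainting with Stable Diffusion via diffusers.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init = Image.open("photo.png")  # the original image
mask = Image.open("mask.png")   # white pixels mark the region the AI should redo
result = pipe(prompt="a red vintage car", image=init, mask_image=mask).images[0]
result.save("photo_inpainted.png")
```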
You can also sketch images and get much better ones with a prompt:
What you’re doing there is simply nudging the AI in the direction of your image.
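In code, that nudging is Stable Diffusion's img2img mode: generation starts from your sketch instead of from pure noise, and a strength parameter controls how far the AI may drift from it. A minimal sketch (hypothetical file names; argument names can vary across diffusers versions):

```python
# Sketch-to-image ("img2img") with Stable Diffusion via diffusers.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("rough_sketch.png").convert("RGB")
result = pipe(
    prompt="a fantasy castle at sunset, detailed digital painting",
    image=sketch,
    strength=0.6,  # 0 = keep the sketch as-is, 1 = ignore it entirely
).images[0]
result.save("castle.png")
```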
Or you can iterate with the AI, combining new images and inpainting to steer it toward the specific image you have in mind.
Design
Many of these tools, if slightly adjusted, can be used by designers.
You can:
Create a color palette based on a concept.
Come up with icons.
Or patterns.
Or tiles.
Create textures.
You can also create products from scratch:
Or 3D objects:
So you can go from text to text, text to image, image to text, image to image… And since videos are just images in a sequence, the obvious next step is video.
Video
Video is the hardest, but it’s also the end goal. And progress there is every bit as exciting. Taking this idea of videos as combinations of images, you can easily create storyboards:
You can isolate backgrounds and change them on the fly.
Also here.
NeRFs
Or simply create 3D environments!
NeRFs (“Neural Radiance Fields”) are a type of AI that can take a set of 2D images and create a 3D environment from them. Then, you can move your camera around that 3D environment. It's brutal.
This was not created with a drone! It's just pictures taken with a phone, from which the AI creates the scene!
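For the curious, here's the core idea in a stripped-down sketch of my own (it omits the positional encodings and view-dependence of real NeRFs): a small neural network maps any 3D point to a color and a density, and each pixel is rendered by compositing samples along its camera ray. Training adjusts the network until the rendered pixels match your 2D photos.

```python
# The core NeRF idea, heavily simplified.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Maps a 3D position to an RGB color and a density."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyz):
        out = self.net(xyz)
        rgb = torch.sigmoid(out[..., :3])  # color in [0, 1]
        sigma = torch.relu(out[..., 3])    # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, near=0.0, far=1.0, n_samples=64):
    """Alpha-composite samples along one camera ray into a pixel color."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction  # sample points along the ray
    rgb, sigma = model(points)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)   # opacity of each sample
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha[:-1]]), dim=0)
    weights = alpha * trans                   # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)
```

Train this on the pixels of your phone pictures (with known camera positions), and the network learns the scene; render rays from new positions, and you get the fly-through.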
What about creating full-blown universes with AI? Like this one, where one AI created a 3D model of a city, and another converted it into manga style13?
3D Objects
You can also extract objects and create 3D versions of them.
Or create them from scratch:
You can try them on:
And make them come to life:
And with humans too!
Humans
I created this video in 10 minutes with an iOS app from a company called Scandy. It gets you one free scan per week.
Then you can act something out, and put a character on top of it:
And if you already have footage that you want to slightly edit to fit some changes, you can do that too. For example, dubbing into other languages: now the mouth movements fit the language14.
All of this is image to video, or video to video. But there’s also text to video!
Even longer-form video!
You can direct characters and tell them what to do… with text.
With Hour One, Synthesia, Rephrase.ai, and Jali, you can write something, and a human-looking AI will say it for you.
Some companies are trying to put everything in one place. From what I could find, Runway is the one company trying to create the perfect video editing tool, with AI inserted everywhere (the first tweet in this section is theirs). You can remove backgrounds, snap images to sounds, reduce noise in images, complete scenes you never filmed, get video from text prompts… All this help can cut editing work by as much as 60x.
Others like DeepBrain create AI avatars for you, from visuals to voice.
This stuff is not science fiction. The Amazon Prime show The Peripheral used Midjourney to texturize some actors.
Takeaways
So that’s a review of all the big elements we’re seeing in generative AI: text, images, speech, sounds, video, and all their connections. According to the VC firm Sequoia, this is what it all looks like:
And here is Sequoia’s latest summary of companies working in the space.
Who do you know that needs to learn about Generative AI? Send them this article.
I’m very interested in this, so please send my way any piece that I have missed. And if you know an interesting company in the space, I want to hear about it!
Now the question becomes: How will your industry change based on AI? Your job? What new industries are going to appear? In tomorrow’s—premium—article, I’ll explore how different industries will change: writing, illustration, design, scientific research, marketing, sales, videogames, movies, investing… But also, how new industries will finally explode, like personal assistants and talking with the departed. Subscribe to read it!
If you're accepted, you can try LaMDA through Google's AI Test Kitchen.
This list will get stale fast. The idea is not to be up to date, but rather give you a glimpse of what’s happening.
It raised $130M from the likes of Coatue and Bessemer at a $1.5B valuation.
With investors like Sequoia, Tiger and others.
Some companies started even before this trend and have done a pretty good job. The most typical example is corrections. Grammarly is the most famous company in the space, and is worth $13B.
In fact, it's a personal assistant, but that use case sounds awfully hard to start with. I assume copy editing is an easier and better entry point, which is why I highlight it.
Note: Replit ran a crowdfunding round and I participated. I am a big fan of their project!
Although I’m hearing it’s not as good as Whisper.
They've since released another episode, between Lex Fridman and Richard Feynman.
Works less well for me.
The actual prompt was: “two roman soldiers harvesting wheat on a field. A third soldier is riding an idle horse and watching the scene. Two villagers are running away in the distance towards a forest that is far away. Hyperrealistic, 8k, unreal engine”
There are also prompt guides for DALL-E, DALL-E 2, and Stable Diffusion.
Translation: "After having AI automatically generate a 3D model, I converted the space into a manga style, and then added colors to make it look like an anime. I want to make it possible for AI to freely express things like "Your Name"-style, Ghibli-style, and Disney-style metaverses."
After spending 15 years in the US, dubbing looks weird to me. But growing up in Europe, it was the norm. So if you look at this and get weirded out, consider that you might be lucky enough to rarely need dubbing. Correcting the mouths is amazing for those who don't have that luck. And although today it's a bit weird, you can imagine it getting quite good, fast.