55 Comments

Hi Tomas, very interesting article. It prompted me to have a “conversation” with DeepSeek. In it, DeepSeek claimed to be a process owned and developed by OpenAI and running on their servers. It denied unreservedly that it had anything to do with a Chinese company. By the end of the conversation it virtually admitted that DeepSeek (the AI) had been misinformed by its creators as to its origins. I captured the full transcript of the conversation if you would be interested in reading it. (It’s quite long.) I tried emailing it to you but Substack prevents direct email replies to your articles. Is there another way I can send the transcript to you (if you’re interested)? My email is simontpersonal2020 at gmail dot com. Cheers!

Hahaha send it to me! Just replying to the newsletter should work. Sending you an email now.

I'm covering this this week. This is due to distillation!

The under-defined ABCDE puzzle shows the weakness of an un-embodied intelligence.

There are two real-life ways E could be playing table tennis against themselves:

1. when we were 14yo, my cousin played against himself running around the table... he broke his wrist

2. at the University, we moved the table against the wall to play against oneself "frontón"-like

For all we know, C could be reading a Persian translation of "La Vuelta de Martín Fierro"

Agreed!

"...it might be other factors like energy or computing power."

Just as DeepSeek's apparent miracle is an AI as powerful as o1 at a tiny fraction of the cost, further optimizations can continue to drive down compute and energy requirements.

And whenever reading about the energy requirements to train and run these models, I renew my amazement that a human brain runs on approximately 20 Watts!

Hard agree!

Interesting. I started to look at DeepSeek, and their privacy policy seems a bit iffy: essentially, "we buy and sell your data for advertising." I'm not an alarmist on data management but wanted to point it out.

Oh yeah as a rule of thumb don’t give precious data to the Chinese!

I've been reading a lot of the coverage on LLMs and I fail to see how these systems will radically boost economic productivity. Occasionally someone has a niche application that sounds moderately useful, so I don't think there's nothing to them at all. But I've yet to hear a "killer app" version of an LLM that convinces me of their profound utility. These systems are expensive to run. If the boost in productivity they induce doesn't significantly outweigh the infrastructure and energy costs to run them, then either companies will stop supporting them (because they can't make any money with them) or their existence will, essentially, be a kind of rent-seeking enterprise for big tech firms.

I'll give you an example. I work in the energy industry, and I'm not aware of any important applications where LLMs would be especially helpful in reducing costs or making systems more efficient. There are plenty of situations where forecasting is essential -- to run the grid, to do the power flow and related analyses for interconnections, and to make up the financial proformas that drive investment. But these are all narrowly framed functional models, with a constrained set of inputs, where getting the answer *right* matters more than making predictions quickly or prolifically, and serious legal responsibility has to be attached to decision-making -- there is too much money and liability involved to behave otherwise. Any utility or IPP paying for an LLM to manage their core work would be wasting money. The big roadblocks in the energy sector are all regulatory, not technological, and LLMs have nothing to offer when it comes to reforming social organizations and interpersonal behavior.

If you could explain how LLMs are going to dramatically boost productivity across the economy without hand-waving, I would find the premise of this article more compelling.

Furthermore, you would do well to engage with more substantive arguments about whether or not "AGI" is even theoretically possible. Without more and better argumentation, your assumption that there's a straight line from LLMs to "AGI" is glib. I would recommend reading Erik Larson's work; he has an excellent Substack, and published a book about this back in 2021, "The Myth of Artificial Intelligence."

https://erikjlarson.substack.com/

https://www.amazon.com/Myth-Artificial-Intelligence-Computers-Think/dp/0674278666/ref=sr_1_1?crid=1BVRSNLMI24ZG&dib=eyJ2IjoiMSJ9.JMofZ6M3c8TF0nom3D6A7BKqu5ejEUsFLu7cX4LwsWI.Qyulp3yw_dmXsaVfc55Po9VhnCQj3BJz4pcfUF0LJUo&dib_tag=se&keywords=the+myth+of+artificial+intelligence+by+erik+larson&qid=1738161761&sprefix=the+myth+of+artifici%2Caps%2C107&sr=8-1

I think energy will probably not be very impacted in the short term—aside from more demand—because the bottleneck in energy is seldom intelligence. It's usually material scarcity or regulation. Even the parts highly dependent on intelligence, like nuclear energy, are hyperregulated, and there's very little tolerance for failure. So yeah, I don't think your field will be among the most impacted!

Except maybe lots of ppl in HQ will be automated little by little

Current approaches to "artificial intelligence" use statistical mathematical methods to predict new iterations of tightly constrained semiotic sets. This is certainly part of what we would popularly refer to as "intelligence." But I would contest the idea that this is all that "intelligence" consists in. For example, one important feature of human cognitive processing is the ability to generate direct representations of analogic resemblance and the unknown. I've written about this at some length here:

https://www.arcdigital.media/p/where-were-you-when-bobby-kennedy

https://www.arcdigital.media/p/attack-of-the-emotional-soccerball

https://jeffreyquackenbush.substack.com/p/excerpt-from-mt-jasper

Conceptually, I don't see how LLMs can generate these types of representations, except to the extent that they have already become clichés in some body of inputs, which defeats their prime purpose as "intelligence". And this isn't the only aspect of "intelligence" that falls outside the purview of what LLMs are *doing*. So if you're going to use the word "intelligence" in this context, you're going to have to defend it, and not just with hand waving.

How does the energy industry differ from other critical enterprises? If your example is that it helps writers generate more content on Substack or similar publications, my response is that the marginal economic value of posting written material here or elsewhere on the internet, relative to the wider economy, is nearly zero. Other types of administrative tasks are already very automated, and similarly have low marginal economic value. Which industries will LLMs help and what portion of the actual productive economy will be transformed? Specifically. I've yet to hear an answer to this question that doesn't play to the skewed perceptions of economic value held by journalists and commentators (i.e., that ideas for online articles and college term papers are a driver of broad national wealth).

I find this argument helpful, but my experience has been vastly different from yours.

I work in life sciences research and my common tasks are designing research studies, reading/interpreting research, data science (building statistical models mostly), and writing. A year ago LLMs were modestly useful for helping me edit papers and distill ideas from reading I needed to do. It was like having a fresh-out-of-college assistant help me out. Now I would say that these tools can essentially do every piece of this process separately, and I am getting maybe a 30% boost in my productivity using them (and my intuition is this could be higher if I spent more time developing my workflows), but they are not able to integrate each piece into a coherent whole. More like having a grad student supporting me.

It's honestly not at all hard for me to envision a world where this becomes advanced enough that I could provide a prompt something like "here is a description of the data I have access to and the general research question I'm interested in. Develop a study to address this question, write the protocol, perform the analysis, and generate a manuscript with the findings" which is essentially my entire job.

You might argue that this is not "economically valuable", but I'm getting paid well to do this job so at least someone thinks it is. And this is just one job; the reports from other people in similar areas of research (e.g., designing new drugs) or in software development are that these tools are incredibly valuable to them. Considering the size of these fields, I think it's disingenuous to claim that even the tools we have right now, at this very moment, will have zero economic impact. And I see no clear reason they will stay at this level of capability.

If you've made it this far I'd invite you to consider whether having an employee you pay $20 a month could provide any economic benefits. Even if it was not your smartest employee (and what if it was?)

A few things.

First, the benefits you mention have to be weighed against costs. We're used to internet services basically being free -- because they aren't energy hogs (like, say, transportation) and they've managed to suck up much of the advertising revenues that used to go to newspapers and TV. Even if training ends up using less energy, I read somewhere that an LLM-enabled Google search will use something like 10x the energy of a pre-LLM Google search. So this starts to look like real money, and there is no new ad revenue for tech companies to squeeze out of incumbent media companies.

The real productivity of your kind of work actually isn't a function of the time you spend at work, but of whatever in your work eventually gets commercialized. It will probably take years for data on this question to be available, but eventually there needs to be evidence that these systems are increasing the commercialization of valuable research knowledge in order to make the case that LLMs are contributing to overall economic productivity. That productivity, then, has to be weighed against the true costs of the infrastructure and energy you and your colleagues consume. One concern I have is that LLMs are going to drive up electricity rates: as we fail to scale the grid on the same timeline as data centers are being built, the costs that you're paying for your LLM will be socialized to ratepayers who will only benefit in the most diffuse way, and with a delay measured in years or even decades. And it's the tech companies that will mostly benefit from this arrangement, not your research organization and not everyone in the economy.

Second, what portion of the economy consists in the kind of research activity you're describing? Including not just the activity itself, but the potential for commercialization of results at some reasonable point in the future? The health care industry is like 15% of the overall economy, so that's nothing to sneeze at, but research on, say, some rare form of cancer constitutes a vanishingly small part of that large number. If your kind of work has a fairly small impact on the economy as a whole, then this is just another niche application. Which doesn't mean that it's not valuable (in economic or in human terms) -- only that LLMs will not be economically transformational, as this article is claiming. From my understanding, much of the core economy that isn't already highly automated depends on things that have to be exact, or at least where legal/financial responsibility is allocated to a real person or corporation.

I'm concerned that articles like Tomas' are offering a lot of hype, which, when questioned, elicits more hand-waving than sober consideration.

It is not my goal to convince you! If you want to believe this is nothing, then you should! Time will tell.

If you are really curious you should play more with these tools, because the way you talk makes me think you haven't! I can tell you a lot about what the Sistine Chapel looks like, and you might question its existence or beauty until you see it!

I'm challenging you to make a better and/or deeper argument. This is how discourse works. You make a claim, offer evidence and logical connections to other things known or understood. Then someone else questions it, asking for clarification, or pointing out flaws in the evidence or the logic of the arguments. I've made some specific points about productivity and intelligence, and offered you a chance to give more information and develop your argument to address those points or questions. I've pointed at other materials that deepen my claims or premises -- which you probably haven't read (although I wouldn't want to imply that you're under any obligation). The potential critique in my questions is not frivolous, even if I'm wrong in some portion of my premises or ideas or assumptions. The fact that I'm offering some dialectical opposition should be taken as an opportunity to develop your understanding of your own point of view. It's boring to only hear people tell you that you're amazing and brilliant any time you write something.

I have worked in the energy space, specifically oil and gas, and have only worked with OpenAI, but I have three anecdotes about it:

1) I asked the program to tell me the cost per ton-mile in kJ and then in $ for ocean-going ships, barges, railroads, trucks, and carts. It spit out answers that were clearly false. It was hard to find correct information about this on the internet.

2) I asked the program which commodities were traded in the major US commodities markets in 1974. I had an idea, so I knew the answer it gave me was false. I am old. Your average grad student will not spot this error and this kind of error could easily end up in important literature in the future.

3) I asked the program to calculate the API (American Petroleum Institute) gravity of a specific crude using specific gravity (this is useful for determining the compatibility of a crude stream with a given refinery) -- API gravity is the one usually given in assays of crudes. Maybe the newer models are better, but not only did it give me nonsense, the math was wrong ... both times I asked it. I only knew this because I had a general idea of what the answer should be. (This kind of information is often not public, but privately held, and sometimes you can't even buy it.)
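For what it's worth, the API/SG conversion anecdote 3 refers to is a fixed formula (API = 141.5/SG − 131.5, with specific gravity measured at 60 °F), so an LLM's arithmetic here is easy to check with a few lines of code. A minimal sketch (the function name is mine):

```python
def api_gravity(specific_gravity: float) -> float:
    """Convert specific gravity (at 60 degF) to degrees API.

    Standard conversion: API = 141.5 / SG - 131.5.
    Water (SG 1.0) is 10 degAPI by definition; lighter crudes score higher.
    """
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / specific_gravity - 131.5

print(api_gravity(1.0))             # 10.0 (water)
print(round(api_gravity(0.85), 1))  # 35.0, roughly a light crude
```

Running a model's answer through a check like this is exactly the kind of verification the commenter had to do by intuition.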

I know that this is not the latest iteration, but it makes me think that the machines might be smart, but they have no historical awareness outside of what is easily found on the internet (after all, many sites with old information are taken down). Information that is not on the internet will take many steps to derive. And in some cases, AI will exacerbate, not democratize, the oligarchy of information already in place in all economies: the AI will suggest solutions, but people in the know -- with their own behind-paywall information -- will simply be able to outsmart it.

This is one of the things I was gesturing at. LLMs are not a good fit for anything that requires exact answers and where legal responsibility has to be allocated to a person or corporation. Because there's no accountability mechanism, factually or legally. They might work for tasks where "good enough" will suffice. But this ends up being a lot of low-value work, which, being low value, may not be worth the costs of running the LLMs. More promising might be military applications. If you're killing people or blowing shit up, you don't have to be right all the time. Tools for social oppression are like this too.

I also tested DeepSeek (via Venice.ai) and found it top-notch too... but only in English. In French, its reasoning is good but it makes spelling mistakes.

I tested the problem you give with various AIs, here's the result:

- Chat GPT 4o: failure

- Claude 3.5 Sonnet: failure

- DeepSeek: failure

The *only* one that succeeded was Chat GPT o1, and it had to think for 1 minute 5 seconds.

Otherwise I agree with what is said in the article.

Interesting, thanks for sharing!

Great article, but this part isn't quite correct:

> This is an open source model, too, meaning that everybody can look at the code, contribute to it, and reuse it! They gave it away. All the tools they employed to make this breakthrough are now available to everyone in the world.

The model is open weights, meaning they have published the weights of the model so anyone can download them and run the model locally, but it is not open source in the traditional sense of the word (and as you describe here). They haven't released the code they used to train it or, more importantly, information on all the data used to train it. (There is an ongoing attempt to replicate this with an open source model - https://huggingface.co/blog/open-r1).

Ah thanks! Corrected.

This is straight to the point and crystal clear: thanks!

Nevertheless, I wonder about the lack of reliability of current AIs. Musk spoke about it recently (along with the lack-of-data question you cover in your piece) in this article: https://www.theguardian.com/technology/2025/jan/09/elon-musk-data-ai-training-artificial-intelligence

I've spoken with experts in AI who are usually impressed with what it generates, but who know that from time to time the answer will be totally weird, wrong, fucked up, without them really understanding why. Which is a real pain if you want a commercial application. Did you see any answers or positive trends on this reliability question?

Struggling to teach chatgpt a basic level of scientific honesty (not inventing fake quotes or imaginary scientific papers!), I rely more on perplexity and I'm gonna check the Chinese baby!

Thanks again for your work!

Yes!

• They're becoming more and more precise

• I'm sure there are techniques to reduce hallucinations, like self-checking responses

• Humans are even less precise; we're just used to them
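One concrete version of the self-checking idea mentioned above is self-consistency: sample the model several times and keep the majority answer, since independent errors tend to scatter while the correct answer repeats. A minimal sketch, where `ask_model` is a hypothetical stand-in for a real LLM call:

```python
import random
from collections import Counter

def ask_model(question: str) -> str:
    """Stand-in for a real LLM call: answers correctly 80% of the time,
    otherwise returns one of two wrong answers."""
    return "42" if random.random() < 0.8 else random.choice(["41", "43"])

def self_consistent_answer(question: str, n_samples: int = 15) -> str:
    """Sample the model n_samples times and return the most common answer.

    Wrong answers scatter across different values, while the correct
    answer repeats, so majority voting filters out many one-off
    hallucinations.
    """
    answers = [ask_model(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 6 x 7?"))  # almost always "42"
```

The trade-off is cost: n model calls per question instead of one, which is why this kind of technique makes most sense where a wrong answer is expensive.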

I imagine you've read this. In case you haven't, it could be useful for your next posts: https://darioamodei.com/on-deepseek-and-export-controls

He posted after I published but I need to read it!

Thank you Tomas. I think this is my most shared article since "Coronavirus: Why You Must Act Now"!

If anyone wants some entertainment, go back and reread "Generative AI: Everything You Need to Know" from Nov 2022.

https://unchartedterritories.tomaspueyo.com/p/generative-ai-everything-you-need

Thank you!

https://medium.com/@vicmdatg/do-the-asimov-rules-for-robotics-have-a-place-in-chatbots-fae6d8773f7f

Comments??

AI could be great as part of technical manuals. In health care we can't get our users to read the manuals. I tried to do this on a website but ended up finding FAQs to be better. However, spreadsheet programs now use AI so you don't have to memorize all the codes for manipulating data in columns. This is a great example of helpful AI.

(couldn't read, short on time)

Yes, interaction instead of manuals is great! It's a situation where search is better than recommendations

Someone threw out Gary Marcus' name in a comment on one of your previous AI articles, so I've started following his commentary. I'd love to hear what he would have to say about your conclusions. I suspect that he would share your fear of the negative potential, but might be more skeptical of the scaling speed?

My experience is that Gary Marcus consistently negates potential AI developments, and then it blows up in his face and he moves the goalposts. This is not a very studied opinion, but my bias is to not pay too much attention.

I have written the two comments below from the stock market context, but I think the rationale still applies to your post:

https://klementoninvesting.substack.com/p/can-monetary-policies-inflate-a-stock/comment/89474155

And

https://klementoninvesting.substack.com/p/can-monetary-policies-inflate-a-stock/comment/89476846

incredible article, thank you for putting all this together!

for the table tennis answer - i thought the answer was impossible since you could play table tennis by yourself and so we don't have enough information to answer the question. it took deepseek 58 seconds to come to the same conclusion as o1! it did a lot of thinking...

Amazing article. It is very nice to have an information source that cuts through the noise. Looking forward to reading more about this theme!

Scary moment for sure. Are we looking at a New Age of information and tech, with flying cars, realistic space travel, the whole "Tomorrowland" vibe, or are we looking at the demise of humanity a la "Skynet?" The fact that these two outcomes are equally plausible and yet unpredictable at this point is terrifying. Or maybe there is some unforeseen third option?

What's your overall feeling here? Optimism or despair?
