
why agi is not possible with the current llms and transformers


Yes, I know this is the kind of title that usually makes people angry before the second paragraph. But let’s be honest for a second: a lot of the AGI discussion right now is built on vibes, product demos, and investors hallucinating roadmaps.

Current LLMs are impressive. Transformers were a historic breakthrough. But that does not mean the current architecture is a straight road to AGI. And this is not even some fringe opinion anymore. People very close to the center of the modern AI wave have been signaling versions of this for a while.

The point is not that LLMs are useless. The point is that they are probably not the final substrate for general intelligence.


transformers changed everything, but not everything they changed leads to agi

The 2017 paper Attention Is All You Need gave the industry the transformer architecture. That mattered more than almost any AI paper of the last decade. Without it, you do not get GPT-style models in the form we know them. Without that scaling path, you probably do not get the current commercial AI race in the same shape either.

But there is a very common mistake here. People take “transformers unlocked a huge jump” and quietly convert that into “transformers must therefore be the road to full general intelligence.”

That does not follow. A system can be historically important and still be incomplete. A ladder can get you much higher without reaching the roof.

llms are very good at pattern completion, but that is not the same thing as a mind

This is the part people hate because it sounds like downplaying the models. It is not downplaying them. It is just refusing to confuse capability with explanation.

LLMs are extremely strong statistical systems for sequence modeling. That is already a huge deal. They compress gigantic distributions over language, code, and other symbolic artifacts into something operationally useful. That is why they can write decent prose, summarize legal text, explain a Rust borrow checker error, and generate creepy-good autocomplete.

But being very good at next-token prediction over giant corpora does not automatically imply:

  • grounded world models
  • durable causal reasoning
  • stable long-horizon planning
  • embodiment
  • self-directed agency with coherent goals
  • persistent internal models of truth independent of text imitation

That gap matters. A lot.
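
To be concrete about what “next-token prediction” means here, a toy sketch helps. This is nothing like a real transformer; it is just a bigram counter with greedy decoding, but the outer loop has the same shape: look at the context, score candidate continuations, append one, repeat.

rust

use std::collections::HashMap;

// Toy "next-token prediction": count which word follows which in a tiny
// corpus, then greedily emit the most frequent continuation. Nothing in
// this loop requires a world model, goals, or a notion of truth.
fn main() {
    let corpus = "the cat sat on the mat the cat ate the fish";
    let words: Vec<&str> = corpus.split_whitespace().collect();

    // Bigram counts: current word -> (next word -> how often it followed).
    let mut bigrams: HashMap<&str, HashMap<&str, u32>> = HashMap::new();
    for pair in words.windows(2) {
        *bigrams.entry(pair[0]).or_default().entry(pair[1]).or_insert(0) += 1;
    }

    // Greedy generation: always append the most frequent continuation.
    let mut current = "the";
    let mut output = vec![current];
    for _ in 0..5 {
        let followers = match bigrams.get(current) {
            Some(f) => f,
            None => break,
        };
        let mut best: Option<(&str, u32)> = None;
        for (&word, &count) in followers {
            if best.map_or(true, |(_, c)| count > c) {
                best = Some((word, count));
            }
        }
        match best {
            Some((word, _)) => {
                output.push(word);
                current = word;
            }
            None => break,
        }
    }

    // Prints a plausible-looking continuation with zero understanding behind it.
    println!("{}", output.join(" "));
}

Scale that loop up by many orders of magnitude and you get astonishing fluency. But nothing in the loop itself hands you the items on the list above.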

Right now, much of the industry is basically betting that enough scale, enough reinforcement, enough tooling, and enough product scaffolding will cause those missing properties to emerge strongly enough. Maybe some of them will. But that is still a bet, not a proof.

even the people closest to this wave have hinted that current llms are not the final answer

Ilya Sutskever has been one of the most important people in modern deep learning. He co-authored the AlexNet paper, was a co-founder and chief scientist at OpenAI, and has been as close to frontier model development as almost anyone alive.

He has also repeatedly signaled that scale alone is not the whole story. Not in the simplistic “just make the next model bigger” sense. The broader direction from people like him has been that current systems are powerful, but there are still core unsolved questions around reasoning, agency, and what kind of architecture actually gets you to something more general.

That matters because the people with the best empirical seat in the house are usually less naive about architecture than the market is. The market sees product demos and says “AGI soon.” Researchers see brittle failure modes, hidden scaffolding, evaluation gaps, and weird generalization boundaries.

Those are not the same thing.

current llms still depend too much on scaffolding

One of the easiest ways to see the limit is to look at how much extra machinery we keep wrapping around these models.

Every time the model struggles, the answer becomes:

  • give it retrieval
  • give it tools
  • give it memory
  • give it better prompting
  • give it decomposition
  • give it agents
  • give it reflection loops
  • give it verifier models
  • give it structured outputs
  • give it a planner on top of the planner

None of this is bad. A lot of it is smart engineering. But it is also a clue.

If the core system were already on a clean AGI trajectory by itself, we would not need this much external scaffolding just to make it robust at ordinary multi-step work.

We keep building exoskeletons around the model because the raw model is not enough. That is useful. But it is also diagnostic.
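
As a rough illustration, here is a deliberately simplified sketch of what one “model call” tends to look like once the exoskeleton is on. Every name in it is hypothetical and the model call is just a stub; the point is the shape, not any particular framework’s API.

rust

// Every name here is hypothetical. The raw model call is a stub so the
// sketch compiles and runs; in reality it would be a network call to an
// LLM provider.

struct Answer {
    text: String,
    passed_checks: bool,
}

// Stand-in for the raw model.
fn raw_completion(prompt: &str) -> String {
    format!("[model output for a {}-char prompt]", prompt.len())
}

fn scaffolded_answer(question: &str, documents: &[&str], memory: &[&str]) -> Answer {
    // 1. Retrieval: pick "relevant" documents (here: a naive keyword match).
    let retrieved: Vec<&str> = documents
        .iter()
        .copied()
        .filter(|d| question.split_whitespace().any(|w| d.contains(w)))
        .collect();

    // 2. Prompt assembly: instructions + memory + retrieved context + question.
    let mut prompt = String::from("You are a careful assistant.\n");
    for m in memory {
        prompt.push_str(&format!("Earlier: {m}\n"));
    }
    for d in &retrieved {
        prompt.push_str(&format!("Context: {d}\n"));
    }
    prompt.push_str(&format!("Question: {question}\n"));

    // 3. The actual model call: one line in a pile of scaffolding.
    let text = raw_completion(&prompt);

    // 4. External verification: a check the model does not perform on itself.
    let passed_checks = !text.is_empty() && !retrieved.is_empty();

    Answer { text, passed_checks }
}

fn main() {
    let docs = ["the deploy script lives in infra/", "rotate the API key monthly"];
    let memory = ["user prefers short answers"];
    let answer = scaffolded_answer("where is the deploy script", &docs, &memory);
    println!("{} (checked: {})", answer.text, answer.passed_checks);
}

The raw completion is one line. Everything else exists because the raw model, on its own, is not dependable enough for ordinary multi-step work.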


the context-window obsession is also a tell

A lot of modern LLM discourse treats bigger context windows as if they are a direct proxy for deeper intelligence. They are not. They are useful, yes. But usefulness is not the same as cognition.

A very large context window can help a model:

  • see more documents
  • maintain more local continuity
  • reference more recent constraints
  • reduce some memory hacks

Great. But that still does not solve the harder problems of abstraction, grounding, causal stability, or independent model-building.

A model that can read a whole repo is not automatically a model that understands software the way a strong engineer understands systems over time. A model that can ingest a giant conversation is not automatically a model with coherent memory in the human sense.

A bigger whiteboard does not equal a better brain.
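
For a sense of scale, the “does it fit” question is literally just arithmetic. The numbers below are made up for illustration, and the characters-per-token ratio is only a rough rule of thumb, not a real tokenizer.

rust

// Back-of-the-envelope only. The ~4 characters per token figure is a rough
// rule of thumb for English-ish text, and the window size is an example
// number, not any specific model's limit.

fn fits_in_context(total_bytes: u64, context_window_tokens: u64) -> bool {
    let approx_tokens = total_bytes / 4; // crude heuristic, not a tokenizer
    approx_tokens <= context_window_tokens
}

fn main() {
    let repo_bytes: u64 = 3_000_000; // roughly 3 MB of source text
    let window: u64 = 200_000; // example context window, in tokens
    println!(
        "fits: {} ({} approx tokens vs {} token window)",
        fits_in_context(repo_bytes, window),
        repo_bytes / 4,
        window
    );
}

Whether that function returns true or false says something about capacity and nothing about comprehension.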

this is also why code generation is a dangerous shortcut to agi hype

Code is one of the strongest arguments for LLM usefulness. And also one of the easiest places to overstate what the systems are doing.

Yes, current models can write useful code. Yes, they can fix bugs, generate boilerplate, explain APIs, and occasionally outperform mediocre humans on bounded tasks. That is all real.

But even in code, they still show the same structural weaknesses:

  • unstable long-range consistency
  • brittle execution loops
  • hidden dependency on retries and external validation
  • no reliable internal model of correctness unless tied to tools/tests
  • tendency to bluff through uncertainty

That is not AGI. That is a very strong probabilistic synthesizer with some surprisingly valuable affordances. Which is still impressive, by the way.
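
One way to make the “tied to tools/tests” point concrete is to look at how generated code usually gets accepted: not because the model knows it is correct, but because something external says it passes. A minimal sketch, with hypothetical stand-ins for both the model call and the test runner:

rust

// All names here are hypothetical stand-ins: generate_candidate plays the
// model, run_tests plays the external oracle (compiler, test suite, linter).

// Stand-in for asking the model for a patch. The attempt number is baked in
// so the sketch terminates; a real loop would feed test failures back.
fn generate_candidate(task: &str, attempt: u32) -> String {
    format!("// candidate patch #{attempt} for: {task}")
}

// Stand-in for the external check. Pretend only the third attempt passes.
fn run_tests(candidate: &str) -> bool {
    candidate.contains("#3")
}

fn main() {
    let task = "add input validation to the upload handler";
    let mut accepted = None;

    for attempt in 1..=5 {
        let candidate = generate_candidate(task, attempt);
        // The model has no reliable internal signal that an attempt is wrong;
        // the retry happens only because the external check failed.
        if run_tests(&candidate) {
            accepted = Some(candidate);
            break;
        }
    }

    match accepted {
        Some(code) => println!("accepted after external validation:\n{code}"),
        None => println!("no candidate passed; a human steps back in"),
    }
}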

three tiny code examples that make the point

Here are three small snippets that show the difference between producing plausible code and sustaining deeper software understanding over time.

rust

fn divide(a: f64, b: f64) -> Option<f64> {
    // Return None on division by zero instead of silently producing ±inf or NaN.
    if b == 0.0 {
        None
    } else {
        Some(a / b)
    }
}

An LLM can write that easily. The hard part is not syntax. The hard part is whether the system understands when this function belongs in a larger error-handling model, how it should evolve in a real service, what invariants surround it, and how those choices cascade through a production codebase.

java

import java.util.Optional;

// Two separate files in practice: User.java and UserService.java.
public record User(String id, String email) {}

public class UserService {
    public Optional<User> findById(String id) {
        // Stub: a real implementation would query persistent storage.
        return Optional.empty();
    }
}

Again, easy. But can the model reason reliably about persistence boundaries, consistency guarantees, privacy constraints, operational tracing, and how this service should change under real business pressure? Sometimes partially. Consistently? Not really.

clojure

(defn safe-parse-int
  "Parse s as an integer, returning nil instead of throwing on bad input."
  [s]
  (try
    (Integer/parseInt s)
    (catch Exception _ nil)))

Nice, compact, useful. But local code generation is not the same thing as robust system-level intelligence. It is one slice of it at best.

That difference is where a lot of AGI marketing hides.

the transformer may be a stepping stone, not the final architecture

This is the view that seems most plausible to me.

Transformers are probably like a major aircraft design breakthrough, not the final aircraft. They changed the feasible frontier. They made certain scaling patterns obvious. They created new industries. But that does not mean they are the final form of general intelligence any more than early jet engines were the final form of flight.

Maybe AGI, if it ever arrives, will still inherit ideas from transformers. That seems likely. But it may require architectures that integrate memory, planning, grounding, world modeling, and self-correction in ways that current autoregressive LLMs do not naturally do.

That is a much narrower and more reasonable claim than “LLMs are fake.” They are not fake. They are just probably not sufficient.

my take

I think the current transformer + LLM wave is historically important, economically real, and still not the same thing as a solved path to AGI.

That is the part people need to hold in their heads at the same time.

The models are powerful. The products are useful. The industry is not crazy for taking them seriously.

But the leap from “very strong sequence model with tools” to “general intelligence” is still doing a lot of hand-wavy work. Too much, honestly.

So when people say AGI is right around the corner because current LLMs keep getting better, I think the right answer is:

maybe something important is around the corner. But it is far from obvious that the current transformer stack is the full road there.

And that is not pessimism. That is just architectural humility.


references

  • Vaswani et al., Attention Is All You Need (2017) — https://arxiv.org/abs/1706.03762
  • Brown et al., Language Models are Few-Shot Learners (2020) — https://arxiv.org/abs/2005.14165
  • Wikipedia, Ilya Sutskever — https://en.wikipedia.org/wiki/Ilya_Sutskever
  • Wikipedia, Attention Is All You Need — https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
  • Wikipedia, Large language model — https://en.wikipedia.org/wiki/Large_language_model