What Most People Get Wrong About LLMs
The Short Version:
An LLM is a next-token predictor trained on human text. Every misconception about them comes from forgetting that one fact.
Most people who use LLMs daily have a broken mental model of what they are. This matters because your mental model determines how you use them — and the wrong model leads to the wrong expectations, the wrong architecture, and the wrong conclusions when things go wrong.
Here's what I see most often.
Misconception 1: LLMs Know Things
People treat LLMs like a knowledgeable colleague you can ask questions. "What's the capital of France?" "What does this error mean?" "How does transformer attention work?"
For factual questions with stable answers, this works. But it works for the wrong reason.
An LLM doesn't "know" Paris is the capital of France the way you know it. It learned that the token sequence "capital of France" is followed by "Paris" with overwhelming frequency in its training data. It's a very sophisticated pattern matcher, not a knowledge store.
This distinction matters the moment you ask about:
- Recent events (after training cutoff)
- Niche facts with sparse training representation
- Anything where a plausible wrong answer looks just like a correct one
LLMs hallucinate not because they're broken but because they're doing exactly what they were trained to do — predicting plausible next tokens — when the answer isn't well-represented in training data.
The fix: don't use LLMs as knowledge bases. Retrieve the facts first (RAG, search, database), then ask the LLM to reason over them.
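The retrieve-then-reason pattern can be sketched in a few lines. This is a minimal illustration, not a real RAG stack: `retrieve` is a naive keyword ranker standing in for a vector store, and the prompt wording is my own.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, store: dict[str, str], k: int = 2) -> list[str]:
    """Naive retrieval: rank documents by query-term overlap.
    A stand-in for whatever real search or vector store you use."""
    q = _tokens(query)
    scored = sorted(
        store.items(),
        key=lambda kv: len(q & _tokens(kv[1])),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str, facts: list[str]) -> str:
    """Ground the model in retrieved text instead of its parametric memory."""
    context = "\n".join(f"- {f}" for f in facts)
    return (
        "Answer using ONLY the facts below. "
        "If they don't contain the answer, say so.\n"
        f"Facts:\n{context}\n\nQuestion: {query}"
    )
```

The point is the shape: the facts travel in the prompt, so the model reasons over text it was given rather than recalling from training data.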
Misconception 2: More Capable Models Are Always Better
The default move when something doesn't work is to upgrade the model. GPT-4 not working? Try GPT-4o. That not working? Try o1.
Sometimes this is right. But more often, the problem is the prompt, not the model.
A vague instruction given to a small model will produce a vague result. The same instruction given to a large model will produce a more confidently vague result. You haven't solved anything.
I've shipped production features using Llama 3 70B that outperform GPT-4 on the same task — because the prompt was designed around what the model actually needed: clear role definition, explicit output format, constrained temperature, relevant context.
The order of operations matters: get the prompt right on a fast, cheap model first. Only upgrade when you've hit a genuine capability ceiling.
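The prompt-first checklist above can be made concrete with a small template. This is a hedged sketch, the helper name and field layout are illustrative, not from any particular SDK:

```python
def make_prompt(role: str, task: str, context: str, output_format: str) -> str:
    """Assemble the pieces a smaller model needs to behave predictably."""
    return "\n\n".join([
        f"You are {role}.",                # clear role definition
        f"Task: {task}",
        f"Context:\n{context}",            # only the relevant context
        f"Output: respond with {output_format} and nothing else.",  # explicit format
    ])
```

Pair this with a low temperature (often 0 for extraction-style tasks) in whatever client you use, and only reach for a bigger model once this prompt still misses on the cheap one.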
Misconception 3: LLMs Reason
This one is the most dangerous.
LLMs produce text that looks like reasoning. "Let me think step by step... therefore the answer is X." It's convincing. It often arrives at correct answers. But the process isn't reasoning in any rigorous sense.
A reasoning system can follow a logical chain regardless of whether that chain resembles anything it has seen before. An LLM follows patterns that resemble reasoning because its training data contains a lot of reasoning-shaped text. When the correct answer happens to look like the patterns it learned, it gets it right. When it doesn't, it confidently produces a coherent but wrong chain.
Chain-of-thought prompting works not because it makes the model reason, but because it slows down token prediction and gives the model more surface area to match against reasoning patterns in training data.
This is useful. It's just not reasoning.
The practical implication: don't trust LLM outputs in domains where correctness can't be verified externally. For math, use a calculator. For code, run the code. For facts, check the source. Use the model to generate, not to verify.
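For the math case, "use a calculator" can literally mean checking the model's arithmetic with the interpreter. A minimal sketch, assuming the model's claim arrives as an expression string plus a number (the function names here are mine):

```python
import ast
import operator

# Supported binary operators; anything else is rejected rather than guessed at.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a basic arithmetic expression without eval/exec.
    Handles +, -, *, / and parentheses; no names, no calls."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def check_model_math(expr: str, claimed: float, tol: float = 1e-9) -> bool:
    """Trust the calculator, not the token predictor."""
    return abs(safe_eval(expr) - claimed) <= tol
```

The same division of labor applies everywhere: the model proposes, an external system with actual correctness guarantees disposes.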
Misconception 4: The Context Window Is Free
Longer context windows are genuinely useful. Being able to pass an entire codebase or document into context is a real capability improvement.
But there's a cost that isn't just monetary. LLM attention isn't uniform across the context window — models attend better to the beginning and end of the context, and performance degrades on information buried in the middle. This is the "lost in the middle" problem and it's well-documented.
Dumping everything into the context and hoping the model finds what it needs is not a good architecture. The right approach is still retrieval — find the relevant pieces, put them near the top of the context, keep the window focused.
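One cheap mitigation, given chunks already ranked by relevance, is to place the best ones at the edges of the window, where attention is strongest, instead of concatenating them in rank order. A sketch of that idea (the function name is mine):

```python
def order_for_context(chunks_ranked_best_first: list[str]) -> list[str]:
    """Alternate ranked chunks between the front and the back of the window,
    so the weakest material lands in the middle where attention degrades."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_ranked_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

With four chunks ranked best-first, the top two end up first and last, and the bottom two sit in the middle.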
What They're Actually Good At
Once you have the right mental model, LLMs are genuinely powerful for:
Structured transformation. Take unstructured text and produce structured output. Extraction, classification, normalization. This is reliable when you define the output schema clearly and validate it.
First drafts. Generating a starting point that a human improves is dramatically faster than starting from scratch. Code, emails, documentation, summaries.
Interface over structured data. Natural language queries over a database or knowledge base, where the LLM translates intent into a query rather than answering from its own memory.
Explaining and annotating. Given a piece of code or text, produce an explanation. Models are good at this because explanations are well-represented in training data.
The common thread: tasks where plausible and correct are close to the same thing, and where you can verify the output. The further you get from that, the more carefully you need to design your system around the model's actual behavior.
Build with the model you have, not the model you wish you had.