How an LLM Works
An LLM generates text by repeatedly predicting the most likely next token, based on the prompt and everything else in its context window.
Key terms
Prompt
The text the user sends in; the starting point of every request.
Tokens + context
Text is split into tokens and loaded into the active context window.
Predict + stream
The model predicts the next token in a loop and streams the answer.
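The loop described above can be sketched in a few lines. This is a toy stand-in, not a real model: the "predictor" here is a hand-written lookup table, and all tokens in it are invented for illustration. The shape of the loop is the point: predict, append to context, emit, repeat.

```python
# Toy stand-in for a real LLM: a lookup table of "most likely next token".
# Every entry here is made up purely for illustration.
VOCAB_NEXT = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def generate(prompt_tokens, max_tokens=10):
    context = list(prompt_tokens)              # active context window
    output = []
    for _ in range(max_tokens):                # predict-next-token loop
        nxt = VOCAB_NEXT.get(context[-1], "<end>")
        if nxt == "<end>":                     # model signals it is done
            break
        output.append(nxt)
        context.append(nxt)                    # new token joins the context
        print(nxt, end=" ")                    # streaming: emit as produced
    return output

generate(["<start>"])
```

A real model replaces the lookup table with a probability distribution over a large vocabulary, but the control flow is the same one-token-at-a-time loop.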
Core loop
Predict next token
Inputs that matter
Prompt + context + prior tokens
Truth guarantee
None by default
Business impact
How this shapes cost, speed, risk, and control.
Cost sensitivity
Token-driven
Larger prompts and longer outputs both raise cost.
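Because billing is per token, cost is simple arithmetic. The sketch below uses hypothetical per-token prices (not any vendor's real rates) to show that both the prompt side and the output side contribute.

```python
# Hypothetical per-token prices, for illustration only:
PRICE_IN = 0.50 / 1_000_000    # $ per input (prompt) token, assumed
PRICE_OUT = 1.50 / 1_000_000   # $ per output token, assumed

def request_cost(input_tokens, output_tokens):
    # Cost scales linearly with tokens on both sides of the request.
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

small = request_cost(500, 200)        # short prompt, short answer
large = request_cost(20_000, 2_000)   # big context, long answer
```

Stuffing the context window "just in case" multiplies the input term on every request, which is why trimming context is often the cheapest optimization available.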
Latency
Grows with output length
Streaming hides the wait, but more tokens still mean more time.
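A back-of-envelope latency model makes the trade-off concrete. The throughput numbers below are assumptions for illustration, not measurements of any particular model.

```python
TTFT_S = 0.4          # time to first token, seconds (assumed)
TOKENS_PER_S = 50.0   # decode speed, tokens per second (assumed)

def total_latency(output_tokens):
    # Total time = wait for the first token + one-by-one generation.
    return TTFT_S + output_tokens / TOKENS_PER_S

# Streaming shows the first token after ~TTFT_S, so the answer *feels*
# fast, but a 500-token reply still takes the full duration to finish.
```

This is why capping output length (or asking for terse answers) is a latency lever, not just a cost lever.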
Quality
Depends on prompt + context
Better framing and better context data improve answers more than simply reaching for a bigger model.
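"Better context data" in practice often means putting the relevant facts directly into the prompt. A minimal sketch, with illustrative names and text of my own invention:

```python
def build_prompt(question, context_snippets):
    # Ground the model: supply the facts it should answer from,
    # rather than hoping they were in its training data.
    context = "\n".join(f"- {s}" for s in context_snippets)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

prompt = build_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

With the relevant snippet in the context window, even a modest model can answer correctly; without it, no model size guarantees the right answer.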
Truth guarantee
None by default
The model predicts likely text; it does not fact-check itself.
What can go wrong
Common failure modes to watch for when this concept shows up in production.
Hallucination
The model generates plausible but unsupported content when evidence is missing or weak.
Missing context
Important information is not in the active prompt, so the model cannot use it.
Overconfident tone
Fluent language can read as certainty even when the underlying answer is weak.