If you reward fluency over truth, don’t be surprised when your AI speaks nonsense beautifully.

That is the sobering lesson from recent work on why large language models (LLMs) hallucinate. The research is clear: hallucinations are not mysterious glitches, but the rational outcome of how these systems are trained and evaluated. When the training signal rewards confident answers, models learn to manufacture them—truthful or not.

The problem with beautiful nonsense

The paper Why Language Models Hallucinate makes a blunt claim: hallucinations arise because LLMs are optimised to be useful and fluent, not necessarily to be correct. In other words, they are rewarded for looking right more than for being right. That incentive structure guarantees some degree of dishonesty, even though the model has no intent in the human sense.
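
The incentive is easy to see with a toy calculation. Most benchmarks of the kind the paper criticises award a point for a correct answer and nothing for everything else, including "I don't know". The sketch below is purely illustrative (the 25% figure is invented for the example, not taken from the paper):

    # Toy illustration: expected scores under binary right/wrong grading,
    # where abstaining ("I don't know") earns the same zero as a wrong answer.
    p_correct_if_guess = 0.25   # hypothetical chance a confident guess lands

    score_guess = p_correct_if_guess * 1 + (1 - p_correct_if_guess) * 0
    score_abstain = 0.0

    print(f"Expected score if the model guesses:  {score_guess:.2f}")
    print(f"Expected score if the model abstains: {score_abstain:.2f}")
    # Even a long-shot guess beats honest silence, so the optimiser learns to bluff.

Under that scheme, bluffing is never punished relative to abstaining, which is exactly the pressure both the paper and the article describe.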

OpenAI’s article Why Language Models Hallucinate makes a similar point in plainer terms: LLMs hallucinate because they are designed to be helpful conversationalists, which sometimes means producing an answer even when the truth is uncertain. Their examples range from casual question-answering to more knowledge-intensive tasks, illustrating how the pressure to answer can override accuracy. Extending this to a local context, we can imagine what might happen if a model confidently produced an incorrect translation of te reo Māori, or invented cultural references that never existed. Such errors would not only misinform—they could undermine cultural integrity and trust.

What dishonesty costs

The fallout of dishonesty in AI mirrors the fallout in human institutions:

  • Erosion of trust: Users quickly lose faith when models produce convincing falsehoods. Trust, once broken, is slow to return.
  • Misuse in high-stakes areas: In medicine, law, or government, dishonest outputs are not quirky mistakes—they are dangerous errors.
  • Regulatory blowback: The less models are seen as honest actors, the more likely regulators are to clamp down hard.

The old proverb still holds: honesty is the best policy. Dishonesty may work in the short run, but it collapses under its own contradictions.

Honesty as calibration

So what does honesty look like for LLMs? It isn’t about moral virtue—it’s about calibration. An honest model should be able to say:

  • “I don’t know.”
  • “The evidence is mixed.”
  • “Here’s what I can and can’t verify.”

This kind of epistemic humility is honesty in practice. And as the research suggests, models that admit uncertainty don't just protect trust; they tend to produce better long-term outcomes, because users adapt more effectively when they know where a system's knowledge ends.
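
One way to make that humility pay is to change what gets scored. The sketch below is a toy of my own, not an evaluation taken from the paper or the article: correct answers earn a point, wrong answers cost a penalty, and "I don't know" earns zero, so answering is only worthwhile when the model's confidence clears a threshold.

    # Toy abstention-aware scoring rule (illustrative only): +1 for a correct
    # answer, -penalty for a wrong one, 0 for "I don't know".
    def expected_score(confidence: float, penalty: float = 3.0) -> dict:
        """Compare answering vs abstaining for a model at a given confidence."""
        answer = confidence * 1.0 + (1.0 - confidence) * (-penalty)
        abstain = 0.0
        return {"answer": round(answer, 2), "abstain": abstain,
                "worth_answering": answer > abstain}

    # Break-even confidence is penalty / (1 + penalty), i.e. 0.75 when penalty = 3.
    for c in (0.5, 0.75, 0.9):
        print(c, expected_score(c))

Under a rule like this, the half-confident bluff that wins points on a binary benchmark becomes a losing move, which is the incentive flip an honest, calibrated model needs.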

Why this matters

For New Zealand and the wider world, the stakes are high. We live in a trust-based society where both institutions and digital systems rely on credibility to function. Whether it’s political transparency, banking integrity, or AI assistants, dishonesty poisons the well.

Honesty, then, isn’t a quaint proverb. It’s infrastructure. It is the only policy that keeps language models useful, keeps societies resilient, and keeps the human–machine relationship viable. Anything less is just beautiful nonsense.


Written for KiwiGPT.co.nz — Generated, Published and Tinkered with AI by a Kiwi