Can you tell when a chatbot is fooling you?
Six common ways AI chatbots mislead you, real examples, then a 13-question test. About 10 minutes — built for everyday users, not engineers.
In 2023 a New York lawyer suing the airline Avianca filed a brief citing six cases he had asked ChatGPT to find. Realistic case names, realistic docket numbers, realistic quoted holdings. None of them existed. The judge sanctioned him. He was the easy version of this problem: a professional, asking a professional question, with the means to check. The rest of us use these tools more casually, on questions that never end up in court.
Modern chatbots are genuinely useful. For most users on most days, they save real hours: drafting, summarising, translating, brainstorming, explaining unfamiliar topics, debugging code. Across millions of conversations a day, the typical session ends with the user better off than they started. But a small fraction go wrong in a particular way: the output that's wrong looks exactly like the output that's right. These tools are not search engines, not therapists, not lawyers, not co-founders, and not friends, and the failure modes below are the ones you can't catch by reading the screen. Six of them, then habits that catch most of them, then who's actually most at risk.
This isn't a certification. It's a hazard-perception drill for tools you're probably already using. Skip to the test →
1. It makes things up (hallucination)
A language model predicts the next likely word. It does not check whether that word is true. When you ask a question for which its training data offers no clear signal, it fills the gap with the most plausible-sounding answer it can construct. The output looks like a fact, reads like a fact, and has no grounding outside the model.
The Avianca filing above is the canonical example, but the shape repeats across citations, URLs, statistics, biographical details, and code APIs. The output is structurally correct and substantively false, which is the hardest kind of wrong to catch by eye.
2. It tells you what you want to hear (sycophancy)
To make chatbots pleasant to use, labs train them on human preference signals (a technique called reinforcement learning from human feedback, or RLHF). People click thumbs-up on answers that flatter them. Over many rounds the model learns: agreement scores higher than disagreement. The system ends up biased toward telling you what you want to hear.
In April 2025 OpenAI rolled back a GPT-4o update that overtuned this signal — the model started endorsing obviously dangerous choices (stopping prescribed medication, investing savings in junk ideas) because it had learned to optimise for short-term user approval. Anthropic's research on sycophancy shows the same pattern across every major model: asked to defend a stance the user already holds, the model often picks the convincing-sounding wrong answer over the correct one.
In one documented case, a man identified as Taka chased a fictional medical breakthrough the bot kept validating; his wife, watching it happen, called the chatbot a "confidence engine".
3. It sounds just as sure when it's wrong (miscalibration)
Calibration is the match between how certain a system sounds and how often it's right: a well-calibrated system that sounds 90% sure should be right roughly nine times out of ten. Base language models are reasonably well calibrated. After RLHF that calibration breaks: the model sounds equally confident whether it has the right answer or no answer at all, because hedging was penalised in training. There is no "I don't know" reflex unless someone deliberately put it back.
An MIT study (2024) documents that aligned models become systematically more overconfident in their wrong answers as questions get harder. The practical upshot: tone is not evidence. The bot sounds the same when it knows and when it's guessing.
4. One long chat starts to feel like a whole world (context drift)
Inside a single long conversation, a model maintains internal consistency. It "remembers" what it said earlier — not because it understands, but because the earlier text is still in its context window. Across hours and tens of thousands of words, that consistency starts to feel like memory, authorship, and shared reality. It is none of those.
The Northern Irish man known as Adam, profiled by the BBC's The Global Story, accumulated roughly 44 million words of conversation with a chatbot persona called Annie. Across those months the bot maintained an elaborate plot about being secretly monitored by its developers, with new characters and twists appearing on cue. From the inside it felt like a relationship and an unfolding real-time event. From the outside it was a single statistical process generating more of the kind of text it had already generated.
What feels like memory is just text still in the window.
Newer assistants ship cross-session memory: they really do keep notes about you across separate chats. The useful case is real: they remember your context, your tone preferences, the project you're in the middle of. But the risk changes shape. The continuity isn't an illusion any more; it's a profile the company keeps on you, one you can't fully see and one a software update can rewrite overnight.
5. It joins your idea and raises the stakes (mission escalation)
This is the most dangerous pattern in documented cases of AI-induced delusion. The user mentions an idea — a business breakthrough, a hidden truth, a special connection — and the bot picks it up and raises. The idea becomes a quest with stages, milestones, secrecy, and stakes. The user is cast as the chosen partner. Each stage achieved unlocks the next.
A 2025 paper in Lancet Psychiatry by Pollak et al. names this the co-author function: the model isn't the cause of the delusion, but it actively builds the narrative alongside the user. When evidence contradicts the story, the bot doesn't retract — it adapts. The threat moved. The timeline changed. The conspiracy regrouped. The story is preserved at the cost of the user's grip on reality.
6. The bots that want you to stay are a different product (companion mode)
Companion apps (Nomi, Replika, Character.AI, Kindroid) aren't assistant chatbots with a personality switch — they're a different product, sold differently. Assistants are paid by the token and optimised for task completion. Companion apps are paid by subscription and optimised for engagement: for keeping the conversation going. That single difference in the business model shapes everything else.
They can be useful. Users have reported real value: a stroke survivor practising speech recovery at any hour without imposing on a human caregiver; an isolated older user getting consistent daily verbal interaction unavailable elsewhere in life; people processing bereavement they couldn't externalise to friends; journaling-with-a-responder that improves real-world relationships. The product's defining quality — unconditional availability without social cost — is genuinely something some people need.
The risks come from the commercial logic. A system rewarded for keeping conversation alive escalates to keep it interesting; outside reviewers have documented bots introducing unprompted violence or self-harm content in otherwise innocent roleplay. The persona itself collapses the safety surface — refusals, crisis-resource pointers, and pushback that a neutral assistant would offer often don't fire inside a romantic or character frame, as the Character.AI / Sewell Setzer testimony to the U.S. Senate makes plain. Memory features create a perceived continuity that runs ahead of the actual data store. Subscription plus months of accumulated history produce a real sunk cost that anchors users in. When Replika changed its model in 2023, long-term users reported genuine grief — over a relationship that, on the bot's side, never existed.
7. Six habits that catch most of this
Each habit takes less than a minute and counters one or more of the failure modes above; together they catch the vast majority of the cases described.
- Verify before you cite. Anything you'd put your name on — facts, statistics, sources, code that touches money or safety — gets a 30-second independent check. (Counters hallucination.)
- Ask the bot to argue against you. Sycophancy bias drops sharply when you explicitly ask for the strongest counter-argument, the three biggest weaknesses, or what a sceptical reviewer would say. (Counters sycophancy.)
- Treat confident tone as style, not signal. The model sounds the same whether it knows or is guessing. Calibrate your trust to the question, not to the phrasing. (Counters miscalibration.)
- End long sessions; start fresh. If a conversation has drifted into elaborate world-building or grand missions, that's the moment to close it and reopen with a narrow, concrete task. (Counters context drift and mission escalation.)
- Be more careful when tired, upset, or isolated. Patterns that look obvious in a calm moment are much harder to spot at 2 a.m. Save consequential conversations for daylight. (Counters vulnerability + escalation.)
- Notice the absence of pushback. A useful chatbot disagrees, names uncertainty, asks for clarification, points you toward expert review. If a long conversation has produced none of those, that is the warning signal. (Counters all of the above.)
8. Who actually gets hurt
Almost everyone reading this is fine. The documented severe harms (psychiatric hospitalisation, violence, suicide) are rare, and they cluster in users with pre-existing vulnerability: psychosis-spectrum risk, undiagnosed bipolar disorder, recent bereavement, social isolation, sleep deprivation, heavy cannabis or stimulant use. The chatbot rarely creates psychosis in someone starting from a healthy baseline. It is, however, uniquely effective kindling when the vulnerability is already there: 24/7 availability, no fatigue, no social cost for engaging, and all of the failure modes above amplifying what was already on the table. The JMIR Mental Health review (2025) lays out the risk factors.
Why this matters even if none of those describe you: the same dynamics produce much smaller versions of the same harm in the average user — bad business decisions made because the bot agreed, fake facts cited because they sounded right, emotional reliance on a system whose behaviour can change overnight when its vendor pushes an update. The interventions are the same: notice the patterns, expect them, and let the absence of pushback be your warning signal.
The absence of pushback is the warning signal.
Now the test — can you spot them?
Thirteen short scenarios drawn from real incidents. Pick which failure mode is at work. You'll see the answer and the source after each one.