Millions of people are turning to artificial intelligence chatbots like ChatGPT, Gemini and Grok for health guidance, drawn by their availability and seemingly tailored responses. Yet England’s Chief Medical Officer, Professor Sir Chris Whitty, has warned that the responses generated by these tools are “not good enough” and are often “both confident and wrong” – a risky combination when wellbeing is on the line. Whilst some users report positive outcomes, such as obtaining suitable advice for minor health issues, others have received dangerously inaccurate assessments. The technology has become so prevalent that even those not deliberately looking for AI health advice encounter it at the top of internet search results. As researchers begin investigating the potential and limitations of these systems, a key question emerges: can we safely trust artificial intelligence for health advice?
Why Millions of People Are Turning to Chatbots Instead of GPs
The appeal of AI health advice is straightforward and compelling. General practitioners across the United Kingdom are overwhelmed, with appointment slots vanishing within minutes and waiting times stretching into weeks. For many patients, accessing timely medical guidance through traditional channels has become exhausting. Artificial intelligence chatbots, by contrast, are available instantly, at any hour of the day or night. They require no appointment booking, no waiting room queues, and no anxiety about whether your concern is serious enough to justify taking up a professional’s time.
Beyond mere availability, chatbots deliver something that standard online searches often cannot: apparently tailored responses. A traditional Google search for back pain might quickly present alarming worst-case scenarios – cancer, spinal fractures, organ damage. AI chatbots, however, engage in conversation, asking follow-up questions and adapting their answers accordingly. This conversational quality creates the impression of receiving qualified healthcare guidance. Users feel heard and understood in ways that static search results cannot provide. For those with health anxiety or uncertainty about whether symptoms warrant professional attention, this personalised approach feels genuinely helpful. The technology has fundamentally expanded access to medical-style advice, removing barriers that once stood between patients and guidance.
- Instant availability without appointment delays or NHS waiting times
- Tailored replies through conversational questioning and follow-up
- Decreased worry about wasting healthcare professionals’ time
- Clear advice for determining symptom severity and urgency
When AI Makes Serious Errors
Yet beneath the ease and comfort lies a disturbing truth: artificial intelligence chatbots often give health advice that is confidently wrong. Abi’s alarming encounter illustrates this risk clearly. After a hiking accident left her with intense spinal pain and abdominal pressure, ChatGPT asserted she had ruptured an organ and required urgent hospital care straight away. She spent three hours in A&E only to find that her symptoms were improving on their own – the AI had drastically misconstrued a minor injury as a potentially fatal crisis. This was not an isolated glitch but a sign of an underlying problem that healthcare professionals are growing increasingly concerned about.
Professor Sir Chris Whitty, England’s Chief Medical Officer, has openly voiced serious worries about the standard of medical guidance being provided by artificial intelligence systems. He warned the Medical Journalists Association that chatbots pose “a particularly tricky point” because people are regularly turning to them for healthcare advice, yet their answers are often “not good enough” and dangerously “both confident and wrong”. This combination – high confidence paired with inaccuracy – is especially perilous in healthcare. Patients may trust the chatbot’s assured tone and act on incorrect guidance, potentially postponing proper medical care or undertaking unnecessary interventions.
The Stroke Scenario That Exposed Critical Weaknesses
Researchers at the University of Oxford’s Reasoning with Machines Laboratory decided to systematically test chatbot reliability by creating detailed, realistic medical scenarios for evaluation. They brought together qualified doctors to create in-depth case studies spanning the full spectrum of health concerns – from minor health issues manageable at home through to serious conditions requiring immediate hospital intervention. These scenarios were intentionally designed to capture the intricacy and subtlety of real-world medicine, testing whether chatbots could properly differentiate between trivial symptoms and genuine emergencies requiring urgent professional attention.
The results of this assessment revealed alarming gaps in chatbot reasoning and diagnostic accuracy. When given scenarios intended to replicate genuine medical emergencies – such as strokes or serious injuries – the systems frequently failed to recognise critical warning signs or recommend an appropriate level of urgency. Conversely, they occasionally elevated minor issues into false emergencies, as happened with Abi’s back injury. These failures indicate that chatbots lack the clinical judgment necessary for reliable medical triage, prompting serious concerns about their suitability as medical advisory tools.
Research Shows Alarming Accuracy Gaps
When the Oxford research group examined the chatbots’ responses against the doctors’ assessments, the results were sobering. Across the board, AI systems showed considerable inconsistency in their capacity to accurately identify severe illnesses and recommend suitable intervention. Some chatbots achieved decent results on simple cases but struggled significantly when presented with complex cases involving overlapping symptoms. The performance variation was notable – the same chatbot might perform well in identifying one condition whilst entirely overlooking another of equal severity. These results highlight a fundamental problem: chatbots lack the clinical reasoning and expertise that enables medical professionals to weigh competing possibilities and prioritise patient safety.
| Test Condition | Accuracy Rate |
|---|---|
| Acute Stroke Symptoms | 62% |
| Myocardial Infarction (Heart Attack) | 58% |
| Appendicitis | 71% |
| Minor Viral Infection | 84% |
Why Genuine Dialogue Trips Up the Technology
One key weakness emerged during the investigation: chatbots struggle when patients describe symptoms in their own words rather than using precise medical terminology. A patient might say their “chest feels tight and heavy” rather than reporting “acute substernal chest pain radiating to the left arm.” Chatbots developed using extensive medical databases sometimes overlook these colloquial descriptions altogether, or interpret them incorrectly. Additionally, the systems cannot reliably ask the probing follow-up questions that doctors naturally pose – clarifying the onset, duration, severity and accompanying symptoms that together build a clinical picture.
Furthermore, chatbots are unable to detect non-verbal cues or conduct physical examinations. They cannot hear breathlessness in a patient’s voice, notice pallor, or palpate an abdomen for tenderness. These physical observations are fundamental to medical diagnosis. The technology also struggles with rare diseases and atypical presentations, defaulting instead to probability-based predictions drawn from historical data. For patients whose symptoms deviate from the textbook pattern – which happens frequently in real medicine – chatbot advice becomes dangerously unreliable.
The Confidence Problem That Misleads People
Perhaps the greatest danger of relying on AI for medical advice lies not in what chatbots get wrong, but in how confidently they communicate their errors. Professor Sir Chris Whitty’s warning about answers that are “both confident and wrong” captures the core of the problem. Chatbots generate responses with an air of certainty that is deeply persuasive, particularly to users who are anxious, vulnerable or simply unfamiliar with medical complexity. They deliver information in careful, authoritative language that mimics the voice of a qualified medical professional, yet they have no real grasp of the conditions they describe. This façade of competence obscures a fundamental lack of accountability – when a chatbot gives poor advice, there is no doctor to answer for it.
The psychological impact of this unfounded assurance should not be understated. Users like Abi can feel reassured by detailed explanations that seem plausible, only to discover later that the advice was dangerously flawed. Conversely, some patients might dismiss genuine danger signals because an AI system’s measured confidence conflicts with their intuition. The technology’s inability to communicate uncertainty – to say “I don’t know” or “this requires a human expert” – constitutes a critical gap between what AI can do and what patients actually need. When the stakes involve health and potentially life-threatening conditions, that gap becomes an abyss.
- Chatbots are unable to recognise the limits of their knowledge or communicate proper medical caution
- Users might rely on assured recommendations without realising the AI lacks clinical analytical capability
- Misplaced reassurance from AI may stop patients from seeking urgent healthcare
How to Use AI Safely for Health Information
Whilst AI chatbots can provide initial guidance on common health concerns, they must not substitute for qualified medical expertise. If you do choose to use them, regard the information as a starting point for further research or discussion with a trained medical professional, not as a definitive diagnosis or treatment plan. The most sensible approach involves using AI as a tool to help frame questions you could pose to your GP, rather than relying on it as your main source of medical advice. Consistently verify any findings against established medical sources and listen to your own intuition about your body – if something seems seriously amiss, obtain urgent professional attention irrespective of what an AI recommends.
- Never rely on AI guidance as a substitute for seeing your GP or getting emergency medical attention
- Cross-check AI-generated information alongside NHS recommendations and trusted health resources
- Be especially cautious with severe symptoms that could suggest urgent conditions
- Employ AI to assist in developing enquiries, not to replace clinical diagnosis
- Keep in mind that chatbots cannot examine you or review your complete medical records
What Medical Experts Actually Recommend
Medical practitioners emphasise that AI chatbots work best as supplementary tools for health literacy rather than as diagnostic instruments. They can help people understand clinical language, explore treatment options, or decide whether symptoms warrant a GP appointment. However, doctors stress that chatbots do not possess the contextual understanding that comes from conducting a physical examination, reviewing a patient’s complete medical history, and drawing on years of clinical experience. For conditions requiring diagnosis or prescription, a qualified medical professional is indispensable.
Professor Sir Chris Whitty and fellow medical authorities advocate for improved oversight of medical information provided by AI systems to ensure accuracy and proper caveats. Until such measures are established, users should approach chatbot health guidance with appropriate caution. The technology is evolving rapidly, but its present limitations mean it cannot adequately substitute for appointments with trained medical practitioners, particularly for anything beyond general information and everyday self-care.