Artificial intelligence chatbots can produce sharply different medical advice depending on how a question is phrased, raising concerns about their reliability in healthcare settings, according to a recent BBC report.
The findings highlight how even slight variations in wording can significantly change the response generated by large language models, which are increasingly being used by the public for health-related queries. Experts say the issue stems from how these systems work: they predict statistically likely text rather than understanding context in a human sense.
One example cited shows that similar medical questions, rephrased in different ways, can lead to advice ranging from reassurance to urgent warnings. The BBC report notes that this inconsistency can be particularly risky in situations where users rely on chatbots for guidance on symptoms or treatment decisions.
Small changes, big differences
Researchers say the phenomenon is linked to “prompt sensitivity”, where AI systems respond differently based on subtle changes in language, tone or emphasis.
Because chatbots generate answers by predicting likely responses from training data, they may interpret the same situation differently depending on how it is described. This can lead to inconsistent recommendations, especially in complex fields such as medicine.
“GenAI doesn’t fail loudly. It fails convincingly,” one expert noted, highlighting the risk that users may trust responses that appear authoritative but are not always reliable.
The issue becomes more serious when dealing with high-risk scenarios. A minor difference in phrasing could mean the difference between advice to stay home and advice to seek immediate medical attention.
Growing reliance raises concerns
The use of AI chatbots for health advice has grown rapidly, with millions of users turning to these tools for quick answers about symptoms, medications and conditions.
While these systems can provide general information, experts stress they are not designed to replace professional medical judgement. Health authorities in multiple countries have repeatedly warned that AI-generated advice should not be treated as a diagnosis.
The BBC report builds on earlier studies suggesting that chatbots can sometimes produce inaccurate or incomplete information, particularly when dealing with nuanced or context-dependent medical cases.
Calls for safeguards and regulation
Specialists are now calling for stronger safeguards, including clearer disclaimers, better training data, and systems that flag uncertainty in responses.
There are also growing demands for regulatory frameworks to ensure AI tools used in healthcare meet strict safety standards.
Experts say developers must focus not just on improving models, but also on how they are deployed. Context design, human oversight and clear escalation paths are seen as essential to reduce risks.
A tool, not a replacement
Despite the concerns, researchers acknowledge that AI chatbots can still play a useful role in healthcare by providing basic information, supporting administrative tasks and improving access to knowledge.
However, they caution that users must understand the limitations of the technology.
The BBC report concludes that while AI tools are becoming more sophisticated, they remain highly dependent on input quality and context. Until these systems can reliably interpret nuance and intent, experts say they should be used carefully and always alongside professional medical advice.
