Toward Trustworthy AI in Pharma
Managing hallucination and bias in pharmacovigilance
John Praveen, Associate Vice President, Pharmacovigilance & Medical Writing Account Delivery & PV Shared Services Offering (SSO) Lead, Accenture Solutions Pvt Ltd.
The integration of generative AI (GenAI) in pharmacovigilance streamlines processes such as adverse event detection and data analysis. However, it introduces the risks of hallucination—where AI produces incorrect or fabricated information—and bias, which may lead to inequitable drug safety outcomes due to unrepresentative data. These issues threaten both patient safety and regulatory compliance. To address these challenges, organisations must prioritise high-quality, diverse datasets, continuous model validation, and strong human oversight. Ethical frameworks and transparency are also crucial to ensure that GenAI tools in pharmacovigilance are reliable, fair, and accountable, ultimately protecting patient welfare and upholding industry standards.

Hallucination in pharmacovigilance AI
Hallucination in the context of GenAI is not a metaphor — it describes a process where an AI system generates statements or conclusions untethered from reality. This is not the same as a typo or a miscalculation; these outputs can be articulate, convincing, and structured in ways that make them appear reliable. Yet, beneath the surface, they lack any grounding in factual evidence.
In pharmacovigilance, this might take the form of a safety report that confidently asserts a correlation between a particular drug and an adverse reaction for which there is no documented evidence, or a literature summary that cites studies that do not exist. It may come through in regulatory interpretations, where an AI tool paraphrases guidelines but subtly alters their intent, risking non-compliance.
Several factors conspire to produce such errors. If the model's training data is incomplete or of questionable quality, it will try to "fill in the blanks" based on probability rather than fact. Models not specifically tuned for pharmacovigilance may misinterpret technical language and deliver superficially plausible but incorrect answers. Even well-trained models can stumble when given ambiguous or overly broad prompts, producing speculative content to satisfy the query.
The danger is that hallucinations in PV are not always immediately obvious, especially when outputs are integrated directly into workflows where volume and speed are priorities. When hallucinated findings enter safety databases or regulatory reports unchecked, they can prompt unnecessary product recalls, delay recognition of genuine risks, or mislead public health decisions.
Bias in pharmacovigilance AI
While hallucination is an error of fabrication, bias is an error of imbalance. It emerges when AI outputs systematically favour or disfavour certain outcomes, populations, or interpretations due to skews present in the underlying data or the way the algorithms process it.
In pharmacovigilance, bias may manifest in many subtle ways: a model trained predominantly on data from high-income countries may miss key safety signals that are more common in lower‑income regions. Historical under-reporting of adverse effects in women or certain ethnic groups may lead AI to undervalue signals from these populations. Sparse representation of rare disease cases in the training set might mean that their safety signals are consistently overlooked. These distortions can creep in unnoticed because they are rooted in the realities of how drug safety data is collected — and in the inherent asymmetries of the global health system. If left unaddressed, bias jeopardises the core PV mandate of equitable protection for all patients and can lead to regulatory breaches where outputs are demonstrably discriminatory or incomplete.
In both cases — hallucination and bias — the core issue is the same: trust. AI cannot improve pharmacovigilance if those who rely on it question its reliability or fairness.

Managing the risks: An integrated framework
Mitigating hallucination and bias in PV is not about a single technological feature or one‑time review. It requires layered safeguards — technical, operational, and governance-related — that work in concert.
1. Human-in-the-Loop oversight
Perhaps the strongest defence is the concept of Human-in-the-Loop (HITL) oversight. In this model, AI output is never the final word; it is subject to validation, interpretation, and, where necessary, correction by trained pharmacovigilance professionals. HITL ensures that generated case summaries, safety assessments, and prioritisation lists are filtered through human expertise before being acted upon. Importantly, oversight should be risk‑tiered: routine reviews for common scenarios, but deeper investigation for high‑impact cases, rare event patterns, and signals affecting vulnerable populations.
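To make the risk-tiering concrete, the minimal Python sketch below routes an AI-generated case assessment to a review tier. The field names (seriousness, model confidence, rare-event and vulnerable-population flags) and the thresholds are illustrative assumptions that each organisation's governance board would set, not a standard schema.

```python
# A minimal sketch of risk-tiered HITL routing. Fields and thresholds
# are illustrative assumptions, not a standard PV schema.
from dataclasses import dataclass

@dataclass
class CaseAssessment:
    case_id: str
    seriousness: str             # e.g. "serious" or "non-serious"
    model_confidence: float      # 0.0-1.0 certainty reported by the model
    rare_event: bool             # adverse event below a frequency threshold
    vulnerable_population: bool  # e.g. paediatric, pregnant, elderly

def review_tier(case: CaseAssessment) -> str:
    """Route an AI-generated assessment to a human review tier."""
    if case.seriousness == "serious" or case.vulnerable_population:
        return "senior-reviewer"   # deep investigation before sign-off
    if case.rare_event or case.model_confidence < 0.80:
        return "standard-review"   # full human validation
    return "spot-check"            # routine sampling of low-risk output

if __name__ == "__main__":
    case = CaseAssessment("PV-001", "serious", 0.92, False, True)
    print(review_tier(case))  # -> senior-reviewer
```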
2. Purposeful data and model training
Bias often starts with skewed data. Avoiding it means feeding AI systems with datasets that are diverse, representative, and aligned with the realities of global medicine use. This involves intentionally sourcing adverse event data from underrepresented geographies and population groups, ensuring rare conditions are captured, and maintaining balance in gender, age, and ethnicity representation. Beyond the dataset, the model itself should be fine-tuned on pharmacovigilance‑specific sources — coding dictionaries, regulatory texts, historical PSURs — to understand the nuances of PV workflows and language. Continuous data refreshes guard against model drift as new products enter the market and safety knowledge evolves.
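As a simple illustration of such a balance check, the sketch below computes the share of adverse event reports per demographic stratum and flags thin strata before fine-tuning. The column names and the five-percent floor are assumptions for illustration only.

```python
# A minimal sketch of a pre-training representation check, assuming a
# pandas DataFrame of adverse event reports with illustrative column
# names (region, sex). The 5% floor is an assumed threshold to adapt.
import pandas as pd

MIN_SHARE = 0.05  # flag any stratum carrying <5% of reports

def representation_report(df: pd.DataFrame, columns: list[str]) -> dict:
    """Return strata whose share of reports falls below the floor."""
    flagged = {}
    for col in columns:
        shares = df[col].value_counts(normalize=True)
        thin = shares[shares < MIN_SHARE]
        if not thin.empty:
            flagged[col] = thin.to_dict()
    return flagged

reports = pd.DataFrame({
    "region": ["EU"] * 60 + ["US"] * 35 + ["Sub-Saharan Africa"] * 5,
    "sex": ["F"] * 48 + ["M"] * 50 + ["unknown"] * 2,
})
print(representation_report(reports, ["region", "sex"]))
# {'sex': {'unknown': 0.02}} -> strata to enrich before fine-tuning
```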
3. Explainable AI
In a high‑stakes, regulated field, explainable AI (XAI) is non-negotiable. PV teams need to know not only what an AI concluded, but why and based on what evidence. Outputs should be accompanied by references to source documents and display confidence scores that signal how certain the AI is about its findings. The ability to trace an output back through its reasoning chain is vital for regulatory audits and for fostering trust among PV professionals who must ultimately sign off on what the AI delivers.
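One lightweight way to operationalise this, sketched below under assumed field names, is to wrap every generated conclusion in an envelope that carries its confidence score and source references, and to block any output that cannot cite at least one source. The grounding rule and the 0.5 threshold are assumptions, not a regulatory requirement.

```python
# A minimal sketch of an "explainable output" envelope, assuming the
# generation pipeline can return cited source identifiers alongside
# its text. Field names and the acceptance rule are assumptions.
from dataclasses import dataclass, field

@dataclass
class ExplainedOutput:
    conclusion: str
    confidence: float                                 # 0.0-1.0 certainty
    sources: list[str] = field(default_factory=list)  # e.g. document IDs

def accept_for_review(out: ExplainedOutput) -> bool:
    """Only pass outputs that are traceable to at least one source."""
    return bool(out.sources) and out.confidence >= 0.5

summary = ExplainedOutput(
    conclusion="No new signal for drug X and hepatotoxicity in Q2 data.",
    confidence=0.87,
    sources=["PSUR-2024-Q2-section-5.3", "CASE-48121"],
)
assert accept_for_review(summary)  # an uncited claim would be blocked
```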
4. Continuous monitoring and governance
AI systems change over time, particularly in dynamic operational contexts. Governance frameworks should mandate regular audits to measure hallucination rates, bias levels, and recall/precision metrics for safety signal detection. Oversight must be multidisciplinary, involving PV experts, data scientists, ethicists, and compliance officers. Each identified incident — whether a hallucination or a biased output — should be logged and investigated, and should inform retraining cycles, creating a feedback loop that continually strengthens the system.
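A minimal sketch of such an audit computation appears below, assuming reviewers have labelled a sample of AI outputs for true signals and for unsupported (hallucinated) claims. The label names are illustrative assumptions.

```python
# A minimal sketch of a periodic audit computation over a sample of
# reviewer-labelled AI outputs. Label names are assumptions.
def audit_metrics(labels: list[dict]) -> dict:
    tp = sum(1 for r in labels if r["ai_flagged"] and r["true_signal"])
    fp = sum(1 for r in labels if r["ai_flagged"] and not r["true_signal"])
    fn = sum(1 for r in labels if not r["ai_flagged"] and r["true_signal"])
    hallucinated = sum(1 for r in labels if r["unsupported_claim"])
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "hallucination_rate": hallucinated / len(labels),
    }

sample = [
    {"ai_flagged": True,  "true_signal": True,  "unsupported_claim": False},
    {"ai_flagged": True,  "true_signal": False, "unsupported_claim": True},
    {"ai_flagged": False, "true_signal": True,  "unsupported_claim": False},
    {"ai_flagged": False, "true_signal": False, "unsupported_claim": False},
]
print(audit_metrics(sample))
# {'precision': 0.5, 'recall': 0.5, 'hallucination_rate': 0.25}
```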
5. Staying regulatory-aligned
Global health authorities are beginning to address the use of AI in regulated activities. The European Union AI Act, evolving FDA and EMA guidance, and WHO principles all reinforce the need for transparency, accountability, and risk‑based controls. PV teams should map their AI systems to these frameworks early and maintain audit‑ready documentation. Proactive engagement not only forestalls compliance risks but helps shape sensible guidelines for this emerging field.
6. Iterative deployment and continuous learning
AI integration should be incremental. Piloting a model in a defined, lower-risk PV task — such as duplicate detection or literature screening — allows teams to evaluate performance using concrete metrics like precision, recall, and hallucination rate before extending its scope. Each deployment stage should be informed by real‑world user feedback, refining both the AI models and the workflows they support.
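The stage-gate idea can be expressed as a simple go/no-go check, as in the sketch below, which reuses audit-style metrics. The threshold values are placeholders a governance board would set, not recommended figures.

```python
# A minimal sketch of a stage-gate check for a pilot. Thresholds are
# illustrative assumptions, not recommended values.
PILOT_GATES = {"precision": 0.90, "recall": 0.95, "hallucination_rate": 0.01}

def passes_gate(metrics: dict) -> bool:
    """Expand the pilot's scope only if every threshold is met."""
    return (
        metrics["precision"] >= PILOT_GATES["precision"]
        and metrics["recall"] >= PILOT_GATES["recall"]
        and metrics["hallucination_rate"] <= PILOT_GATES["hallucination_rate"]
    )

pilot = {"precision": 0.93, "recall": 0.96, "hallucination_rate": 0.004}
print(passes_gate(pilot))  # True -> proceed to the next deployment stage
```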
Blurb: Every byte of data fed into an Artificial Intelligence (AI) system shapes its decisions. The integrity, clarity, and inclusiveness of that data embody the principles we choose to embed in the intelligence we create.
Why this matters
The implications of hallucination and bias go beyond technical performance. A hallucinated alert might spark unwarranted investigations and resource drain, while a biased detection model might leave an entire patient group unprotected. Both can erode confidence among PV professionals, regulators, and the public — not just in a particular AI tool, but in the broader project of using advanced analytics for drug safety.
Ultimately, the patient whose adverse event goes unnoticed because an algorithm was blind to their demographic profile is not saved by the promise of AI. The measure of GenAI’s success in PV is not just efficiency, but the breadth, accuracy, and fairness of its protection.
Conclusion
Generative AI offers pharmacovigilance capabilities once thought impossible — instant analysis of vast data pools, early recognition of emerging risks, seamless synthesis of complex reports. But these capabilities are valuable only when tempered with safeguards that prevent critical errors. Human oversight, purposeful and inclusive training data, explainable AI, continuous governance, regulatory compliance, and controlled, iterative deployment are the pillars of safe and effective use.
With these principles in place, GenAI need not be a disruptive risk; it can be a transformative ally — one that enhances the reach, speed, and precision of pharmacovigilance while keeping the discipline’s ultimate goal firmly in focus: protecting patients, everywhere, without bias and without compromise.