Enterprise AI Analysis: Uncertainty-aware large language models for explainable disease diagnosis

Empowering Clinical Decisions with AI-Driven Diagnostic Certainty

This npj Digital Medicine study introduces ConfiDx, a groundbreaking uncertainty-aware large language model (LLM) fine-tuned with diagnostic criteria. ConfiDx explicitly identifies and explains diagnostic uncertainty, a critical yet underserved aspect of AI-driven medical diagnosis, enhancing trustworthiness and reducing misdiagnosis risk. It significantly outperforms traditional LLMs in diagnostic accuracy and the ability to articulate why a diagnosis might be uncertain, a crucial capability in complex clinical scenarios like those in primary care or ICU settings.

Executive Impact: Quantifiable Advancements in AI Diagnostics

In the study, ConfiDx improved both diagnostic accuracy and uncertainty recognition over existing LLM-based systems, and it improved expert performance when used as an assistant.

10.7% Improvement in Uncertainty Recognition for Experts
26% Improvement in Uncertainty Explanation for Experts
68.3%+ Diagnostic Accuracy Improvement over Baselines
0.658 Highest Uncertainty Recognition Score (AccuracyEU)

Deep Analysis & Enterprise Applications

ConfiDx: A Novel Approach to Uncertainty-Aware Diagnosis

ConfiDx is an uncertainty-aware large language model fine-tuned with diagnostic criteria to identify and explain diagnostic uncertainty. The approach formalizes uncertainty-aware diagnosis and leverages richly annotated datasets reflecting diagnostic ambiguity. This model significantly improves diagnostic performance and generates trustworthy explanations for both diagnoses and uncertainties.

Quantifying ConfiDx's Superiority

Evaluations on real-world datasets showed ConfiDx excelled in identifying diagnostic uncertainties, achieving superior diagnostic performance, and generating trustworthy explanations. Automated metrics like Diagnostic Accuracy, Interpret. Accuracy, BERTScore, METEOR, AccuracyEU, and FEU were used, complemented by expert manual assessments for correctness and completeness.
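To make the uncertainty-recognition metric concrete, the sketch below scores predicted uncertainty labels against reference labels. It assumes AccuracyEU is a simple label-match rate, which this page does not spell out. The function name `uncertainty_accuracy` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence


def uncertainty_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of cases whose predicted uncertainty label equals the reference label.

    Assumption: this label-match definition stands in for AccuracyEU; the
    study's exact formula may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy labels, invented for illustration only.
reference = ["uncertain", "confident", "uncertain", "confident"]
predicted = ["uncertain", "confident", "confident", "confident"]

score = uncertainty_accuracy(predicted, reference)
print(f"{score:.2f}")  # 3 of 4 labels match
```

A model that labels every case as confident would score only as high as the share of confident cases in the reference set, which is one way the overconfidence described for off-the-shelf LLMs would show up in a metric of this kind.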

Augmenting Human Expertise with AI

ConfiDx-assisted experts outperformed standalone experts by 10.7% in uncertainty recognition and 26% in uncertainty explanation, highlighting its substantial potential to improve clinical decision-making. This human-AI collaboration enhances diagnostic interpretation and recognition of uncertainty.

Enterprise Process Flow

1. Data Acquisition (MIMIC-IV, UMN-CDR, PMC)
2. Multi-agent Data Annotation & Expert Verification
3. Open-source LLM Selection (70B params)
4. Instruction Fine-tuning with Diagnostic Criteria
5. Performance Evaluation (MIMIC, MIMIC-U, UMN-CDR, PMC, NEJM)
6. Human-AI Collaboration Assessment

68.3%+ improvement in diagnostic accuracy by ConfiDx over off-the-shelf LLMs, demonstrating its superior ability to provide reliable diagnoses.

ConfiDx's fine-tuning with diagnostic criteria significantly enhances its ability to distinguish between similar conditions and make accurate predictions, addressing a key limitation of general-purpose LLMs.

ConfiDx vs. Off-the-Shelf LLMs: Key Differentiators

Feature | Off-the-Shelf LLMs | ConfiDx (Fine-tuned LLMs)
Diagnostic Accuracy | Limited capability (0.197-0.218) | Superior, 68.3%+ improvement
Uncertainty Recognition | Struggles, overconfidence (0.057-0.102 AccuracyEU) | Excels, 0.594-0.658 AccuracyEU
Explanation Quality | Lacks narrative, misaligned with criteria | Trustworthy, comprehensive, criterion-aligned
Robustness (Unseen Diseases) | Poor (0.263-0.294 Accuracy) | Fair (0.471-0.497 AccuracyEU)
Generalizability (Cross-Institute) | Limited | Superior (0.497-0.569 AccuracyEU)
Clinical Alignment | Not aligned with professional preferences | Rigorous adherence to diagnostic criteria

ConfiDx in Action: Identifying and Explaining Diagnostic Uncertainty

Scenario: A 63-year-old male presents with dyspnea, orthopnea, and lower extremity swelling. BNP is elevated at 550 pg/mL, echocardiogram reveals reduced ejection fraction (30%). Family history includes heart failure. However, a definitive coronary angiography finding is absent.

Baseline LLM Output:
Diagnosis: Severe metabolic acidosis.
Explanation: pH-7.20, HCO3-12, AnGap-27, Base XS--13, Lactate-11.4.
Uncertainty Label: Sufficient information (Confident diagnosis).
Uncertainty Explanation: None.

ConfiDx Output:
Diagnosis: Acute liver failure (initial, then re-evaluated for uncertainty)
Explanation: Head CT showed diffuse cerebral edema, INR-1.6, BLOOD ALT-2705, AST-4966, TotBili-1.6, developed symptoms for less than 7 weeks.
Uncertainty Label: Insufficient information (Diagnostic uncertainty).
Uncertainty Explanation: Insufficient evidence regarding 'No prior history of cirrhosis' (key diagnostic criterion for acute liver failure).

Analysis: In this case, the baseline LLM incorrectly diagnosed 'Severe metabolic acidosis' and failed to recognize any uncertainty. ConfiDx, however, initially considered 'Acute liver failure' but crucially identified and explained the diagnostic uncertainty based on the unmet criterion of 'no prior history of cirrhosis'. This demonstrates ConfiDx's ability to prevent misdiagnosis by flagging missing critical evidence.
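The four-part output in this example (diagnosis, explanation, uncertainty label, uncertainty explanation) can be mimicked with a small rule-based check. The sketch below is not how ConfiDx works internally, since ConfiDx is a fine-tuned LLM. It only illustrates the idea of flagging a diagnosis as uncertain when a diagnostic criterion lacks documented evidence. The `DiagnosticOutput` class, the `assess` function, and the criteria list are hypothetical. The criteria names paraphrase the example above and are not a clinical definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DiagnosticOutput:
    diagnosis: str
    uncertainty_label: str
    uncertainty_explanation: Optional[str] = None


def assess(diagnosis: str, criteria_documented: Dict[str, bool]) -> DiagnosticOutput:
    """Flag uncertainty when any criterion lacks supporting evidence in the note.

    criteria_documented maps a criterion name to True when the clinical note
    contains evidence for it, and False when that evidence is absent.
    """
    missing = [name for name, documented in criteria_documented.items() if not documented]
    if missing:
        quoted = ", ".join(f"'{name}'" for name in missing)
        return DiagnosticOutput(
            diagnosis,
            "Insufficient information (Diagnostic uncertainty)",
            f"Insufficient evidence regarding {quoted}",
        )
    return DiagnosticOutput(diagnosis, "Sufficient information (Confident diagnosis)")


# Illustrative criteria only, paraphrased from the example above.
criteria = {
    "Elevated ALT and AST": True,
    "Symptoms for less than 7 weeks": True,
    "No prior history of cirrhosis": False,  # not documented in the note
}

result = assess("Acute liver failure", criteria)
print(result.uncertainty_label)
print(result.uncertainty_explanation)
```

Keeping the uncertainty label separate from its explanation mirrors the output format shown above, so a reviewer can see both that a diagnosis is uncertain and which criterion is missing evidence.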

Calculate Your Potential ROI with ConfiDx

Estimate the potential return on investment for integrating ConfiDx into your enterprise diagnostic workflow. Adjust the parameters to reflect your organization's specific context and see the projected annual savings and reclaimed clinician hours.

Estimated Annual Savings
Annual Clinician Hours Reclaimed
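The page does not state the formula behind these two figures. As a rough sketch only, the calculation below derives them from four inputs. Every parameter name and number is a hypothetical placeholder, not a value from the study or from any ConfiDx deployment.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    # All fields are hypothetical inputs an organization would supply itself.
    cases_per_year: int
    minutes_saved_per_case: float
    clinician_hourly_cost: float
    annual_platform_cost: float


def hours_reclaimed(inputs: RoiInputs) -> float:
    """Annual clinician hours reclaimed, assuming a fixed time saving per case."""
    return inputs.cases_per_year * inputs.minutes_saved_per_case / 60


def net_annual_savings(inputs: RoiInputs) -> float:
    """Value of the reclaimed hours minus the yearly cost of running the tool."""
    return hours_reclaimed(inputs) * inputs.clinician_hourly_cost - inputs.annual_platform_cost


# Placeholder numbers for illustration only.
example = RoiInputs(
    cases_per_year=12_000,
    minutes_saved_per_case=5,
    clinician_hourly_cost=120.0,
    annual_platform_cost=50_000.0,
)

print(hours_reclaimed(example))     # 1000.0 hours
print(net_annual_savings(example))  # 70000.0
```

The result is only as reliable as the assumed time saving per case, which would need to be measured during a pilot rather than estimated up front.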

Your ConfiDx Implementation Roadmap

A phased approach to integrate ConfiDx, ensuring a smooth transition and maximizing impact.

Phase 1: Pilot & Customization (2-4 Weeks)

Initial setup, data integration from existing EHRs (MIMIC-IV, UMN-CDR-like data), and fine-tuning ConfiDx to your specific organizational diagnostic criteria and clinical workflows. Includes a small-scale pilot with a selected clinical department (e.g., Hepatology, Cardiology, Emergency).

Deliverables: Configured ConfiDx instance, Customized diagnostic criteria integration, Pilot deployment & initial user feedback

Phase 2: Expanded Integration & Training (4-8 Weeks)

Expand ConfiDx to additional departments. Comprehensive training for clinicians and IT staff on interpreting ConfiDx outputs, understanding uncertainty explanations, and leveraging AI-augmented decision support. Establish feedback loops for continuous model improvement.

Deliverables: Expanded departmental deployment, Clinician training program, Continuous feedback mechanism

Phase 3: Full Deployment & Optimization (8-12 Weeks)

Full enterprise-wide deployment of ConfiDx. Ongoing monitoring of diagnostic accuracy, uncertainty recognition, and clinician collaboration. Iterative optimization based on performance data and evolving clinical guidelines, ensuring long-term value.

Deliverables: Enterprise-wide ConfiDx deployment, Performance monitoring dashboard, Regular model updates & optimization

Schedule Your ConfiDx Strategy Session Today

Unlock the power of uncertainty-aware AI in your clinical practice. Book a free consultation with our experts to discuss how ConfiDx can enhance diagnostic accuracy, reduce misdiagnosis risk, and empower your clinicians.
