Uncertainty-aware large language models for explainable disease diagnosis

Empowering Clinical Decisions with AI-Driven Diagnostic Certainty

This npj Digital Medicine study introduces ConfiDx, an uncertainty-aware large language model (LLM) fine-tuned with diagnostic criteria. ConfiDx explicitly identifies and explains diagnostic uncertainty, a critical yet underserved aspect of AI-driven medical diagnosis, which improves trustworthiness and reduces the risk of misdiagnosis. It outperforms off-the-shelf LLMs both in diagnostic accuracy and in articulating why a diagnosis may be uncertain, a crucial capability in complex clinical settings such as primary care or the ICU.

Executive Impact: Quantifiable Advancements in AI Diagnostics

ConfiDx delivers measurable gains in diagnostic accuracy and uncertainty recognition over existing LLM-based systems.

10.7% Improvement in Uncertainty Recognition for Experts

26% Improvement in Uncertainty Explanation for Experts

68.3%+ Diagnostic Accuracy Improvement over Baselines

0.658 Highest Uncertainty Recognition Score (AccuracyEU)

Deep Analysis & Enterprise Applications

The sections below present the specific findings from the research as enterprise-focused modules.

ConfiDx: A Novel Approach to Uncertainty-Aware Diagnosis

ConfiDx is an uncertainty-aware large language model fine-tuned with diagnostic criteria to identify and explain diagnostic uncertainty. The approach formalizes uncertainty-aware diagnosis and leverages richly annotated datasets reflecting diagnostic ambiguity. This model significantly improves diagnostic performance and generates trustworthy explanations for both diagnoses and uncertainties.

Quantifying ConfiDx's Superiority

Evaluations on real-world datasets showed that ConfiDx excelled at identifying diagnostic uncertainty, achieved superior diagnostic performance, and generated trustworthy explanations. Automated metrics (Diagnostic Accuracy, Interpretation Accuracy, BERTScore, METEOR, AccuracyEU, and FEU) were complemented by expert manual assessments of correctness and completeness.
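
As a rough illustration of how a label-based score such as AccuracyEU might be computed, the sketch below assumes it is plain accuracy over uncertainty labels. That is an assumption: the study's exact metric definitions are not given on this page, and the function name and toy labels are hypothetical.

```python
from typing import Sequence


def label_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of cases where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical toy labels, not data from the study.
gold_labels = ["insufficient", "sufficient", "insufficient", "sufficient"]
predicted_labels = ["insufficient", "sufficient", "sufficient", "sufficient"]

score = label_accuracy(predicted_labels, gold_labels)
print(f"Label accuracy: {score:.2f}")
```

Text-overlap metrics such as BERTScore and METEOR score the wording of explanations rather than labels, so they would need their own tooling.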

Augmenting Human Expertise with AI

ConfiDx-assisted experts outperformed standalone experts by 10.7% in uncertainty recognition and 26% in uncertainty explanation, highlighting its substantial potential to improve clinical decision-making. This human-AI collaboration enhances diagnostic interpretation and recognition of uncertainty.

Enterprise Process Flow

Data Acquisition (MIMIC-IV, UMN-CDR, PMC) → Multi-agent Data Annotation & Expert Verification → Open-source LLM Selection (70B params) → Instruction Fine-tuning with Diagnostic Criteria → Performance Evaluation (MIMIC, MIMIC-U, UMN-CDR, PMC, NEJM) → Human-AI Collaboration Assessment

68.3%+ Improvement in Diagnostic Accuracy by ConfiDx over off-the-shelf LLMs, demonstrating its superior ability to provide reliable diagnoses.

ConfiDx's fine-tuning with diagnostic criteria significantly enhances its ability to distinguish between similar conditions and make accurate predictions, addressing a key limitation of general-purpose LLMs.

ConfiDx vs. Off-the-Shelf LLMs: Key Differentiators

Feature	Off-the-Shelf LLMs	ConfiDx (Fine-tuned LLMs)
Diagnostic Accuracy	Limited capability (0.197-0.218)	Superior, 68.3%+ improvement
Uncertainty Recognition	Struggles, overconfidence (0.057-0.102 AccuracyEU)	Excels, 0.594-0.658 AccuracyEU
Explanation Quality	Lacks narrative, misaligned with criteria	Trustworthy, comprehensive, criterion-aligned
Robustness (Unseen Diseases)	Poor (0.263-0.294 Accuracy)	Fair (0.471-0.497 AccuracyEU)
Generalizability (Cross-Institute)	Limited	Superior (0.497-0.569 AccuracyEU)
Clinical Alignment	Not aligned with professional preferences	Rigorous adherence to diagnostic criteria

ConfiDx in Action: Identifying and Explaining Diagnostic Uncertainty

Scenario: A 63-year-old male presents with dyspnea, orthopnea, and lower extremity swelling. BNP is elevated at 550 pg/mL, echocardiogram reveals reduced ejection fraction (30%). Family history includes heart failure. However, a definitive coronary angiography finding is absent.

Baseline LLM Output:
Diagnosis: Severe metabolic acidosis.
Explanation: pH 7.20, HCO3 12, anion gap 27, base excess -13, lactate 11.4.
Uncertainty Label: Sufficient information (Confident diagnosis).
Uncertainty Explanation: None.

ConfiDx Output:
Diagnosis: Acute liver failure (initial, then re-evaluated for uncertainty)
Explanation: Head CT showed diffuse cerebral edema; INR 1.6, blood ALT 2705, AST 4966, total bilirubin 1.6; symptoms developed over less than 7 weeks.
Uncertainty Label: Insufficient information (Diagnostic uncertainty).
Uncertainty Explanation: Insufficient evidence regarding 'No prior history of cirrhosis' (key diagnostic criterion for acute liver failure).

Analysis: In this case, the baseline LLM incorrectly diagnosed 'Severe metabolic acidosis' and failed to recognize any uncertainty. ConfiDx, however, initially considered 'Acute liver failure' but crucially identified and explained the diagnostic uncertainty based on the unmet criterion of 'no prior history of cirrhosis'. This demonstrates ConfiDx's ability to prevent misdiagnosis by flagging missing critical evidence.
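
The four-part output shown above (diagnosis, explanation, uncertainty label, uncertainty explanation) can be pictured with a minimal rule-based sketch. This is not how ConfiDx works internally, since ConfiDx is a fine-tuned LLM. The `DiagnosticReport` class, the `assess` function, and the first two criterion names are hypothetical and are not clinical guidance. Only the "No prior history of cirrhosis" criterion and the label wording come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SUFFICIENT = "Sufficient information (Confident diagnosis)"
INSUFFICIENT = "Insufficient information (Diagnostic uncertainty)"


@dataclass
class DiagnosticReport:
    diagnosis: str
    explanation: List[str]
    uncertainty_label: str
    uncertainty_explanation: List[str] = field(default_factory=list)


def assess(diagnosis: str,
           criteria: List[str],
           evidence: Dict[str, Optional[str]]) -> DiagnosticReport:
    """Flag uncertainty when any criterion has no supporting evidence."""
    supported = [f"{c}: {evidence[c]}" for c in criteria if evidence.get(c)]
    missing = [c for c in criteria if not evidence.get(c)]
    if missing:
        reasons = [f"Insufficient evidence regarding '{c}'" for c in missing]
        return DiagnosticReport(diagnosis, supported, INSUFFICIENT, reasons)
    return DiagnosticReport(diagnosis, supported, SUFFICIENT)


# Illustrative criteria list (assumed for this sketch).
criteria = [
    "Elevated liver enzymes",
    "Symptom duration under 7 weeks",
    "No prior history of cirrhosis",
]
# Evidence values are taken from the example; None marks a missing item.
evidence = {
    "Elevated liver enzymes": "ALT 2705, AST 4966",
    "Symptom duration under 7 weeks": "symptoms for less than 7 weeks",
    "No prior history of cirrhosis": None,
}

report = assess("Acute liver failure", criteria, evidence)
print(report.uncertainty_label)
print(report.uncertainty_explanation)
```

Keeping the uncertainty explanation as a list of unmet criteria mirrors the example output, where the missing evidence is named explicitly rather than summarized as a score.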

Calculate Your Potential ROI with ConfiDx

Estimate the potential return on investment for integrating ConfiDx into your enterprise diagnostic workflow. Adjust the parameters to reflect your organization's specific context and see the projected annual savings and reclaimed clinician hours.

Inputs: Industry; Number of Employees Relying on Diagnostics; Average Weekly Hours Spent on Diagnosis/Review per Employee; Average Hourly Rate of Diagnostic Professionals ($)

Outputs: Estimated Annual Savings; Annual Clinician Hours Reclaimed
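
The calculator's formula is not shown on this page, so the following is only a minimal sketch of one plausible calculation. The `time_saved_fraction` parameter (the share of diagnostic time reclaimed) and the 52-week year are assumptions, the function name is hypothetical, and the Industry input is ignored here.

```python
from typing import Tuple


def estimate_roi(num_employees: int,
                 weekly_hours: float,
                 hourly_rate: float,
                 time_saved_fraction: float,
                 weeks_per_year: int = 52) -> Tuple[float, float]:
    """Return (annual hours reclaimed, estimated annual savings in dollars)."""
    if not 0.0 <= time_saved_fraction <= 1.0:
        raise ValueError("time_saved_fraction must be between 0 and 1")
    # Total annual diagnostic hours, scaled by the assumed share saved.
    hours_reclaimed = (num_employees * weekly_hours
                       * weeks_per_year * time_saved_fraction)
    savings = hours_reclaimed * hourly_rate
    return hours_reclaimed, savings


# Hypothetical inputs chosen only to exercise the function.
hours, savings = estimate_roi(num_employees=50,
                              weekly_hours=10.0,
                              hourly_rate=100.0,
                              time_saved_fraction=0.25)
print(f"Annual clinician hours reclaimed: {hours:,.0f}")
print(f"Estimated annual savings: ${savings:,.0f}")
```

The result scales linearly with `time_saved_fraction`, so any projection is only as reliable as that assumed value.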

Your ConfiDx Implementation Roadmap

A phased approach to integrate ConfiDx, ensuring a smooth transition and maximizing impact.

Phase 1: Pilot & Customization (2-4 Weeks)

Initial setup, integration of data from existing EHRs (in formats similar to MIMIC-IV or UMN-CDR), and fine-tuning of ConfiDx to your organization's diagnostic criteria and clinical workflows. Includes a small-scale pilot with a selected clinical department (e.g., Hepatology, Cardiology, Emergency).

Deliverables: Configured ConfiDx instance, Customized diagnostic criteria integration, Pilot deployment & initial user feedback

Phase 2: Expanded Integration & Training (4-8 Weeks)

Expand ConfiDx to additional departments. Comprehensive training for clinicians and IT staff on interpreting ConfiDx outputs, understanding uncertainty explanations, and leveraging AI-augmented decision support. Establish feedback loops for continuous model improvement.

Deliverables: Expanded departmental deployment, Clinician training program, Continuous feedback mechanism

Phase 3: Full Deployment & Optimization (8-12 Weeks)

Full enterprise-wide deployment of ConfiDx. Ongoing monitoring of diagnostic accuracy, uncertainty recognition, and clinician collaboration. Iterative optimization based on performance data and evolving clinical guidelines, ensuring long-term value.

Deliverables: Enterprise-wide ConfiDx deployment, Performance monitoring dashboard, Regular model updates & optimization

Begin Your AI Transformation

Schedule Your ConfiDx Strategy Session Today

Unlock the power of uncertainty-aware AI in your clinical practice. Book a free consultation with our experts to discuss how ConfiDx can enhance diagnostic accuracy, reduce misdiagnosis risk, and empower your clinicians.

Book Your Free Consultation
