
Enterprise AI Analysis

LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems

This paper introduces LEC, a novel framework for False Discovery Rate (FDR) control in selective prediction and routing systems based on linear expectation constraints. It enables rigorous statistical guarantees for point predictions from large language models (LLMs). LEC reinterprets selective prediction as a constrained decision problem, deriving finite-sample conditions for calibrated thresholds. The framework extends to multi-model routing, significantly improving acceptance coverage while maintaining statistical reliability. Empirical evaluations demonstrate tighter FDR control and higher sample retention compared to prior methods, highlighting its practical effectiveness for risk-aware AI systems.

Executive Impact Summary

Leveraging LEC for risk-controlled AI deployments translates directly into significant operational benefits and enhanced reliability.

Headline gains: higher sample acceptance, target risk level maintained, and more accepted correct samples.

Deep Analysis & Enterprise Applications

Each topic below expands on specific findings from the research, framed as enterprise-focused takeaways.

Introduction

Overview of the problem and LEC's solution.

This section sets the stage by highlighting the challenge of unreliable Large Language Model (LLM) outputs and the limitations of existing uncertainty quantification methods. It introduces LEC as a principled framework for False Discovery Rate (FDR) control, ensuring statistical guarantees for accepted predictions. The core problem addressed is preventing erroneous LLM outputs from being accepted without robust risk management.

Methods

Detailed explanation of LEC's statistical framework.

LEC reinterprets selective prediction as a constrained decision problem. It enforces a Linear Expectation Constraint over selection and error indicators, enabling the derivation of a finite-sample sufficient condition. Evaluated on a held-out calibration set, this condition yields an FDR-constrained, coverage-maximizing threshold. The methodology extends to two-model routing systems, where thresholds are jointly calibrated under a unified FDR guarantee.
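As a rough sketch of what this calibration step can look like in code, the snippet below scans candidate thresholds on a held-out calibration set and keeps the most permissive one for which an empirical version of the linear constraint still holds. The Hoeffding-style slack term is a stand-in assumption, not the exact finite-sample condition derived in the paper, and `uncertainty`/`correct` are hypothetical arrays of per-sample uncertainty scores and correctness labels.

```python
import numpy as np

def calibrate_threshold(uncertainty, correct, alpha, delta=0.05):
    """Return the largest (most permissive) threshold t such that the empirical
    linear constraint E[(e_i - alpha) * s_i] <= 0 holds on the calibration set,
    with a crude finite-sample slack.

    NOTE: the slack term is an illustrative placeholder, not the paper's bound.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n = len(uncertainty)
    slack = np.sqrt(np.log(1.0 / delta) / (2.0 * n))   # Hoeffding-style term
    best = None
    for t in np.sort(np.unique(uncertainty)):
        selected = uncertainty <= t                     # selection indicators s_i
        errors = (~correct) & selected                  # accepted-and-wrong indicators e_i * s_i
        # Empirical constraint: mean(e_i * s_i) - alpha * mean(s_i) + slack <= 0
        if errors.mean() - alpha * selected.mean() + slack <= 0.0:
            best = t                                    # larger t => more coverage
    return best
```

At deployment, a new prediction would then be accepted only when its uncertainty score is at or below the returned threshold.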

Experiments

Empirical validation on QA datasets and LLMs.

Empirical evaluations on both closed-ended and open-ended question-answering (QA) datasets demonstrate LEC's effectiveness. Results show that LEC consistently achieves tighter FDR control and substantially improves sample retention compared to prior methods. The experiments also confirm LEC's statistical validity and higher power across a range of LLMs, uncertainty quantification (UQ) methods, and black-box settings.

Routing Extension

How LEC adapts to multi-model systems.

The framework extends seamlessly to multi-model routing, enabling a hierarchical decision-making process. If a primary model's uncertainty is too high, the query is routed to a stronger model. This strategy maintains system-level FDR control while accepting more correct samples than individual models, significantly improving overall system efficiency and reliability in complex agentic workflows.
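A minimal sketch of this routing policy, assuming each model exposes a hypothetical `generate(prompt)` call that returns a prediction together with a scalar uncertainty score, and that `t_primary` and `t_fallback` have already been jointly calibrated for the system-level target α (the joint calibration itself is not shown):

```python
def route_and_predict(prompt, primary, fallback, t_primary, t_fallback):
    """Hierarchical two-model routing: accept the primary model's answer when
    its uncertainty is low enough, otherwise delegate to the stronger fallback
    model; abstain only if both models are too uncertain."""
    prediction, uncertainty = primary.generate(prompt)
    if uncertainty <= t_primary:
        return {"model": "primary", "prediction": prediction}

    prediction, uncertainty = fallback.generate(prompt)
    if uncertainty <= t_fallback:
        return {"model": "fallback", "prediction": prediction}

    return {"model": None, "prediction": None}  # abstain / escalate to a human
```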

Conclusion

Summary of contributions and future work.

This section reiterates LEC's role as a general foundation for risk-controlled deployment of LLMs. It summarizes the key findings: strict FDR control, tighter calibration, and improved power. Future work includes extending the framework to task-specific risk metrics and richer routing architectures, further supporting reliable decision-making in advanced AI systems.

97.85% LEC power at risk level α = 0.4 (Vicuna-7B-V1.5)

Enterprise Process Flow: Single-Model Selective Prediction

Input Prompt
Model Generates Prediction & Uncertainty
Compare Uncertainty to Calibrated Threshold
Accept Prediction (uncertainty at or below threshold)
Delegate/Abstain (uncertainty above threshold)
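In code, the flow above reduces to a short acceptance check; `model.generate` is a stand-in for whatever inference and uncertainty-quantification stack is in use, and `threshold` is the value produced by the calibration step.

```python
def selective_predict(prompt, model, threshold):
    """Single-model selective prediction: accept the answer only when its
    uncertainty score is at or below the calibrated threshold."""
    prediction, uncertainty = model.generate(prompt)  # hypothetical interface
    if uncertainty <= threshold:
        return {"decision": "accept", "prediction": prediction}
    return {"decision": "abstain", "prediction": None}  # delegate or escalate
```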

LEC vs. Baseline Methods (FDR Control & Power)

FDR Control
  LEC: strictly controls FDR below α.
  COIN-CP: PAC-style control (less strict).
  COIN-HFD: PAC-style control (more conservative).

Sample Retention (Power)
  LEC: higher power and acceptance rates; +8% over COIN-CP on LLaMA-3.2-3B.
  COIN-CP: lower power; conservative acceptance.
  COIN-HFD: lowest power (most abstention); most conservative acceptance.

Calibration Tightness
  LEC: tighter calibration via optimal thresholding.
  COIN-CP: overly conservative upper confidence bounds.
  COIN-HFD: most conservative upper confidence bounds.

Real-world Impact: Multi-Model Routing for Enhanced Reliability

LEC's multi-model routing capability allows organizations to create robust AI decision systems. By dynamically delegating uncertain predictions from a primary LLM to a more capable secondary model, the system can achieve lower overall risk levels while significantly increasing the total number of accepted correct samples. This intelligent routing ensures that valuable insights are not lost due to conservative abstention by individual models, leading to more efficient and trustworthy AI deployments in critical applications like medical diagnostics or financial analysis. For instance, in the paper's evaluations, a two-model system accepted over 100 more correct samples at a 0.15 risk level than the individual baseline model, while maintaining strict FDR control.

Calculate Your Potential ROI with LEC

Estimate the tangible savings and reclaimed productivity by implementing LEC's risk-controlled AI systems in your enterprise.

The calculator reports two figures: estimated annual savings and productive hours reclaimed annually.

Your Path to Trustworthy AI

A typical implementation journey for integrating LEC into your enterprise AI systems.

Discovery & Assessment

Identify critical AI applications, assess current uncertainty methods, and define target risk levels for selective prediction and routing. Establish baseline performance metrics.

Data Preparation & Calibration

Prepare dedicated calibration datasets for each LLM. Collect uncertainty scores and ground-truth labels to train and calibrate LEC thresholds for desired FDR control.
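As one illustrative target format (field names are ours, not prescribed by the paper), each calibration example can be reduced to four fields:

```python
from dataclasses import dataclass

@dataclass
class CalibrationRecord:
    prompt: str          # calibration query
    prediction: str      # model output for the query
    uncertainty: float   # UQ score attached to the prediction
    correct: bool        # ground-truth judgement of the prediction

# One list of CalibrationRecord per model; the uncertainty/correct columns
# feed threshold calibration (per model, or jointly for routing).
```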

Model Integration & Routing Logic

Integrate LEC into your existing LLM inference pipelines. Configure single-model selective prediction and multi-model routing policies based on calibrated thresholds.

Validation & Deployment

Rigorously validate the system's FDR control and coverage on unseen test data. Deploy the LEC-enhanced AI system with confidence, ensuring statistical guarantees in production.

Continuous Monitoring & Refinement

Implement ongoing monitoring of system performance, FDR, and acceptance rates. Periodically recalibrate thresholds as models evolve or data distributions shift to maintain optimal reliability.
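One simple way to operationalize this monitoring, assuming a fraction of accepted predictions is later labeled (e.g., by spot-checking): track the empirical FDR over a rolling window and flag recalibration when it drifts toward the target α. The 0.8 margin below is an arbitrary illustrative choice.

```python
from collections import deque

class FDRMonitor:
    """Rolling estimate of the false-discovery rate among accepted samples."""

    def __init__(self, alpha, window=1000, margin=0.8):
        self.alpha = alpha
        self.margin = margin                  # recalibrate when FDR > margin * alpha
        self.labels = deque(maxlen=window)    # True = accepted sample was wrong

    def record(self, accepted, correct):
        if accepted:
            self.labels.append(not correct)

    def empirical_fdr(self):
        return sum(self.labels) / len(self.labels) if self.labels else 0.0

    def needs_recalibration(self):
        return self.empirical_fdr() > self.margin * self.alpha
```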

Ready to Build Trustworthy AI?

Schedule a complimentary consultation to explore how LEC can elevate the reliability and efficiency of your enterprise AI applications.
