
Enterprise AI Analysis

A Systematic Literature Review on Large Language Models for Automated Program Repair

This comprehensive review explores the applications of Large Language Models (LLMs) in Automated Program Repair (APR), analyzing 189 papers from 2020-2025. It identifies key trends, popular LLMs, utilization strategies like fine-tuning and zero-shot prompting, and repair scenarios including semantic bugs and security vulnerabilities. The review also addresses integration factors like datasets and input forms, and highlights challenges such as data leakage and computational cost, offering guidelines for future research. Overall, LLMs are significantly advancing APR, moving towards more scalable, flexible, and autonomous repair workflows.

Executive Impact: Key Findings at a Glance

Our analysis reveals critical trends and opportunities for leveraging AI in software engineering.

189 Papers Analyzed (2020-2025)
Unique LLMs Surveyed
Programming Languages Covered
42.93% Semantic Bug Focus

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

LLM Architectures
Adaptation Strategies

Encoder-only LLMs

These models utilize only the encoder stack of the Transformer architecture, pre-trained with Masked Language Modeling (MLM). They are suited for code understanding tasks like code search, but require a separate decoder for patch generation. Examples include CodeBERT and GraphCodeBERT. [54, 71]

Encoder-decoder LLMs

These models use both encoder and decoder stacks, making them suitable for sequence-to-sequence generation tasks. They treat program repair as a neural machine translation task. Examples include CodeT5 and PLBART. [238, 2]

Decoder-only LLMs

The most popular category, pre-trained with causal language modeling (CLM) on large-scale unlabeled code corpora to predict the next token autoregressively. They are highly effective for zero-shot and few-shot repair. Examples include GPT-3.5, GPT-4, and CodeLLaMA. [179, 180, 198]
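The two pre-training objectives behind these architectures differ only in how training targets are built from raw code. A minimal sketch, using plain token lists and an illustrative mask token rather than a real tokenizer:

```python
import random

MASK = "<mask>"

def mlm_example(tokens, mask_prob=0.15, rng=None):
    """Masked language modeling (encoder-only): hide random tokens; the model predicts them."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)      # model must recover the original token
        else:
            inputs.append(tok)
            targets.append(None)     # position not scored in the loss
    return inputs, targets

def clm_example(tokens):
    """Causal language modeling (decoder-only): predict the next token at every position."""
    return tokens[:-1], tokens[1:]   # input sequence, targets shifted by one

code = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
clm_in, clm_out = clm_example(code)
# clm_in[0] == "def" and clm_out[0] == "add": next-token prediction
```

Encoder-decoder models combine both ideas, e.g. by masking whole spans in the encoder input and generating them with the decoder.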

Fine-tuning

Further training LLMs on smaller, task-specific datasets to adapt their weights for program repair. Effective for early LLMs such as T5 and CodeT5, significantly improving performance on target tasks without training from scratch. [15, 59]
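In practice, fine-tuning for repair reduces to supervised training on buggy-to-fixed pairs. A minimal sketch of formatting such pairs as seq2seq samples; the "fix &lt;lang&gt;:" task prefix mirrors the T5-style convention but the exact template is an illustrative assumption, not from the review:

```python
def to_seq2seq_sample(buggy_code: str, fixed_code: str, language: str = "java") -> dict:
    """Format one bug-fix pair as a seq2seq training sample.

    The task prefix ("fix java: ...") is illustrative; any consistent
    template works, as long as training and inference use the same one.
    """
    return {
        "input": f"fix {language}: {buggy_code.strip()}",
        "target": fixed_code.strip(),
    }

pairs = [
    # classic integer-overflow bug in binary search, and its fix
    ("int mid = (lo + hi) / 2;", "int mid = lo + (hi - lo) / 2;"),
]
dataset = [to_seq2seq_sample(b, f) for b, f in pairs]
```

The resulting list of input/target dicts can then be fed to any standard seq2seq training loop.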

Few-shot Prompting

Utilizing LLMs to learn new tasks from a limited number of examples within the input prompt itself. This allows LLMs to generalize from very limited data, often by providing similar repair demonstrations or project-specific coding styles. [170, 254]

Zero-shot Prompting

Requires LLMs to perform program repair without any explicit examples, relying on pre-existing knowledge from massive pre-training. This can be cloze-style repair (filling masked code) or conversational-based repair (dialogue with LLMs). [255, 188]
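Cloze-style repair can be sketched as masking the suspected buggy line and asking the model to fill the hole. A minimal sketch; the infill marker token is illustrative (real models each define their own):

```python
INFILL = "<infill>"

def cloze_prompt(lines, suspicious_line):
    """Replace the suspected buggy line with an infill marker (cloze-style repair).

    `lines` is the function body split into lines; `suspicious_line` is the
    0-based index reported by fault localization.
    """
    masked = [INFILL if i == suspicious_line else line for i, line in enumerate(lines)]
    return "\n".join(masked)
```

The model's completion for the marker becomes the candidate patch, which is then validated against the test suite.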

Agent-based Repair

Autonomous systems in which LLMs perceive the environment, reason, and take actions with external tools (compilers, debuggers) to achieve repair goals. This allows dynamic adaptation and iterative refinement, moving towards closed-loop repair workflows. [19, 121]
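The closed loop can be sketched as a propose-validate-feedback cycle. In this minimal sketch, `propose_patch` stands in for an LLM call and `run_tests` for the external tools (compiler, test runner); both names are hypothetical stand-ins, not a real API:

```python
def repair_agent(buggy_code, propose_patch, run_tests, max_iters=5):
    """Iterative repair loop: propose a patch, validate it, feed failures back.

    propose_patch(code, feedback) -> candidate code   (stand-in for an LLM call)
    run_tests(code) -> (passed, failure_message)      (stand-in for compiler/tests)
    """
    code, feedback = buggy_code, ""
    for _ in range(max_iters):
        code = propose_patch(code, feedback)
        passed, feedback = run_tests(code)
        if passed:
            return code                # plausible patch found
    return None                        # budget exhausted
```

Real agents add richer actions (navigating files, reading stack traces, querying documentation), but the control flow is the same loop.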

Enterprise Process Flow: Automated Program Repair Workflow

Buggy Program
Fault Localization
Patch Generation
Patch Validation
Correct Program
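The fault-localization step in this workflow is often spectrum-based: rank code elements by how strongly they correlate with failing tests. A minimal sketch using the classic Ochiai formula; the coverage data structure here is an illustrative assumption:

```python
from math import sqrt

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness score for one code element.

    failed_cov / passed_cov: number of failing / passing tests that execute it;
    total_failed: total number of failing tests in the suite.
    """
    if failed_cov == 0:
        return 0.0
    return failed_cov / sqrt(total_failed * (failed_cov + passed_cov))

def rank_lines(coverage, total_failed):
    """coverage: {line_no: (failed_cov, passed_cov)} -> line numbers, most suspicious first."""
    return sorted(coverage, key=lambda ln: ochiai(*coverage[ln], total_failed), reverse=True)
```

The top-ranked lines are then handed to patch generation, and every candidate patch must pass validation before the program is considered repaired.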

Key Insight: Dominant Programming Language

Java accounts for 35.45% of LLM-based APR studies, over a third of the field, underscoring its importance.

Comparison: LLM-based APR vs. Traditional APR

Scope
  • LLM-based APR: supports a broader range of programming languages, including rare ones (Verilog, Rust).
  • Traditional APR: typically limited to five common languages (Java, JavaScript, Python, C, C++).
Knowledge Source
  • LLM-based APR: general programming knowledge from diverse, massive datasets, plus robust natural-language understanding.
  • Traditional APR: relies on pre-defined repair templates or patterns learned from extensive but limited code corpora.
Learning Samples
  • LLM-based APR: effective in few-shot or zero-shot settings with limited learning samples.
  • Traditional APR: requires extensive repair corpora to train DL models.
Performance
  • LLM-based APR: demonstrates significantly better performance and growing research attention.
  • Traditional APR: achieved remarkable results by automatically learning hidden bug-fixing patterns.

Case Study: Repository-level Repair with SWE-bench

Context: Repository-level issues are real-world problems in large-scale repositories, requiring long-context understanding, multi-file edits, and robust validation.

Challenge: Traditional APR often relies on predefined conditions (known bug locations, single-function repair), which fall short for complex, real-world issues.

LLM Solution: LLM-based agents, like those evaluated on SWE-bench, can navigate repositories, localize faults from informal reports, edit multiple files, and validate patches. They use agent-computer interfaces and AST representations for holistic understanding.

Impact: This approach extends APR to realistic GitHub issues, enabling autonomous reasoning and end-to-end repair across full repositories, demonstrating a significant leap in tackling complex software engineering challenges.

Key Insight: Semantic Bug Dominance

42.93% of research volume on LLM-based APR is dedicated to semantic bugs.

Advanced ROI Calculator

Estimate your potential annual savings and reclaimed developer hours with AI-powered program repair.

Estimated Annual Savings
Developer Hours Reclaimed
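The calculator's outputs follow from a back-of-envelope formula. A minimal sketch; every parameter value below is an illustrative assumption, not a benchmark from the review:

```python
def apr_roi(bugs_per_year, hours_per_fix, auto_fix_rate, hourly_cost, tooling_cost):
    """Estimate annual savings and reclaimed developer hours from automated repair.

    auto_fix_rate: fraction of bugs the APR system resolves without manual effort.
    All inputs are illustrative assumptions supplied by the user.
    """
    hours_reclaimed = bugs_per_year * hours_per_fix * auto_fix_rate
    savings = hours_reclaimed * hourly_cost - tooling_cost
    return savings, hours_reclaimed

savings, hours = apr_roi(bugs_per_year=1200, hours_per_fix=3.5,
                         auto_fix_rate=0.25, hourly_cost=85, tooling_cost=20000)
```

Tuning `auto_fix_rate` to your own pilot results (Phase 1 below) is what makes the estimate meaningful.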

Your AI Program Repair Roadmap

A phased approach to integrating LLM-based Automated Program Repair into your enterprise.

Phase 1: Pilot Program & Data Preparation

Initiate a pilot project with a small team. Identify a specific bug type or repair scenario. Curate and clean a relevant dataset for fine-tuning or prompt engineering.

Phase 2: LLM Customization & Integration

Select appropriate LLM architectures (e.g., Decoder-only for zero-shot). Implement fine-tuning or design sophisticated prompting strategies. Integrate LLMs with existing debugging tools and test suites.

Phase 3: Iterative Testing & Feedback Loop

Deploy the LLM-powered APR system in a controlled environment. Collect feedback on patch correctness and quality. Refine LLM models or prompting based on real-world performance and developer input.

Phase 4: Scaled Deployment & Monitoring

Expand the deployment to broader development teams. Establish continuous monitoring for performance, cost-efficiency, and user satisfaction. Incorporate new bug types and programming languages as capabilities evolve.

Ready to Transform Your Software Development Workflow?

Unlock the full potential of AI-driven program repair for your enterprise.

Ready to Get Started?

Book Your Free Consultation.
