Software Engineering & AI

PerfBench: Can Agents Resolve Real-World Performance Bugs?

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, which makes them particularly challenging to detect and fix. This paper introduces PerfBench, a benchmark of 81 real-world performance bug-fixing tasks. Its results show that state-of-the-art coding agents struggle significantly on these tasks, but improve with performance-aware tooling and instructions.

Quantifiable Impact & Key Findings

Our research uncovers critical insights into AI agent capabilities for performance optimization, highlighting both challenges and significant potential for improvement.

~3% Baseline Agent Success Rate

~20% Perf-Agent Success Rate

81 Real-World Perf Benchmarks

Deep Analysis & Enterprise Applications

Background and Related Work

Performance bugs are a unique class of software defects that impact efficiency without causing functional failures. They tend to be harder to detect and fix than functional bugs. Recent advances in Software Engineering agents have shown promise in automated bug fixing, but existing benchmarks primarily focus on functional correctness, leaving a significant gap in understanding how well these agents handle non-functional bugs such as performance or security issues.

PerfBench Construction

PerfBench is a benchmark specifically designed to evaluate software engineering agents on performance bug fixing tasks in .NET applications. It comprises 81 carefully curated and manually verified tasks from popular open-source .NET repositories on GitHub, each representing a real performance issue fixed by developers. The benchmark features a novel evaluation harness for agent-generated performance benchmarks and validates fixes by comparing execution metrics.

Experimental Setup

The evaluation harness automates the entire testing process using agent-generated BenchmarkDotNet benchmarks. It runs the agent-written benchmarks before and after the agent's changes, along with the existing unit tests. Metrics include Success Rate, Performance Improvement (%, KB, ms), Token Usage, Steps Taken, and Dollar Cost. We evaluated OpenHands agents in baseline and performance-aware configurations with GPT-4.1 and Claude Sonnet 4.
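The before/after comparison described above can be sketched roughly as follows. This is a minimal illustration and not the PerfBench harness itself: the `BenchmarkRun` type, the `fix_is_successful` function, and the 10% improvement threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """Hypothetical summary of one benchmark execution (not a PerfBench type)."""
    mean_time_ms: float   # mean execution time in milliseconds
    allocated_kb: float   # memory allocated per operation in kilobytes


def improvement_pct(before: float, after: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent."""
    if before < 0 or after < 0:
        raise ValueError("metrics must be non-negative")
    if before == 0:
        return 0.0  # nothing to improve on a zero baseline
    return (before - after) / before * 100.0


def fix_is_successful(
    before: BenchmarkRun,
    after: BenchmarkRun,
    unit_tests_pass: bool,
    min_improvement_pct: float = 10.0,  # assumed threshold, for illustration only
) -> bool:
    """Accept a fix only if unit tests still pass and a metric improved enough."""
    if not unit_tests_pass:
        return False
    time_gain = improvement_pct(before.mean_time_ms, after.mean_time_ms)
    memory_gain = improvement_pct(before.allocated_kb, after.allocated_kb)
    return max(time_gain, memory_gain) >= min_improvement_pct


if __name__ == "__main__":
    before = BenchmarkRun(mean_time_ms=120.0, allocated_kb=2048.0)
    after = BenchmarkRun(mean_time_ms=60.0, allocated_kb=2048.0)
    print(fix_is_successful(before, after, unit_tests_pass=True))
```

The functional check comes first in this sketch so that a faster but broken patch is never counted as a success, which mirrors the requirement that fixes preserve functional correctness.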

Benchmark Results

Our evaluation reveals that current state-of-the-art coding agents struggle with performance optimization tasks. The baseline OpenHands agent achieves only a ~3% success rate. However, by developing OpenHands-Perf-Agent, which incorporates performance-aware tooling and instructions, we achieved a ~20% success rate, demonstrating a substantial improvement and the potential for targeted approaches.

Enterprise AI Agent Workflow for Performance Optimization

Identify Performance Issue → Generate Benchmarks & Diagnostics → Apply Code Fixes → Validate Performance Improvement → Ensure Functional Correctness

~3% Baseline Success Rate for Performance Bugs

Agent Performance Comparison on PerfBench

Feature	Baseline OpenHands (GPT-4.1)	OpenHands-Perf-Agent (GPT-4.1)
Success Rate	1.2% (1/81)	14.8% (12/81)
Avg Steps	47.2	84.3
Avg Tokens	1.3M	1.9M
Key Strengths	✓ General functional bug fixing	✓ Performance-aware instructions ✓ Benchmarking tooling integration ✓ Improved success rate on PerfBench
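As a quick check of the arithmetic behind the table, the short sketch below recomputes the success rates from the resolved-task counts and derives the relative step and token overhead of the performance-aware agent. The helper name `success_rate_pct` is hypothetical; the input numbers are taken from the table above.

```python
TOTAL_TASKS = 81  # number of PerfBench tasks


def success_rate_pct(resolved: int, total: int = TOTAL_TASKS) -> float:
    """Success rate as a percentage, rounded to one decimal place."""
    return round(resolved / total * 100, 1)


baseline_rate = success_rate_pct(1)     # baseline OpenHands (GPT-4.1)
perf_agent_rate = success_rate_pct(12)  # OpenHands-Perf-Agent (GPT-4.1)

# Overhead of the performance-aware agent relative to the baseline.
step_ratio = round(84.3 / 47.2, 2)   # average steps
token_ratio = round(1.9 / 1.3, 2)    # average tokens (millions)

print(f"Baseline success rate:   {baseline_rate}%")
print(f"Perf-Agent success rate: {perf_agent_rate}%")
print(f"Step overhead:  {step_ratio}x")
print(f"Token overhead: {token_ratio}x")
```

Read this way, the twelvefold increase in resolved tasks comes at well under twice the steps and tokens per task.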

Case Study: Optimizing Memory Allocation in .NET

One critical finding from PerfBench highlights the prevalence of memory management issues, accounting for over 40% of all performance bugs. Our OpenHands-Perf-Agent demonstrated an 18.2% success rate in this category, compared to 6.0% for the baseline agent.

This improvement was achieved by explicitly guiding the agent to use BenchmarkDotNet with MemoryDiagnoser, allowing it to identify and resolve excessive allocations. For instance, in one task, the agent successfully refactored a collection initialization to prevent an OutOfMemoryException, leading to significant memory savings.

Your Enterprise AI Implementation Roadmap

A strategic, phased approach ensures seamless integration and maximum value realization for your organization.

Discovery & Strategy

Comprehensive assessment of current workflows, identification of high-impact AI opportunities, and development of a tailored AI strategy aligned with your business objectives.

Pilot & Prototyping

Rapid development and deployment of a proof-of-concept or pilot AI solution to validate feasibility, measure initial impact, and gather feedback for refinement.

Integration & Scaling

Seamless integration of AI solutions into existing enterprise systems, scaling across departments, and ensuring robust performance and security at scale.

Optimization & Governance

Continuous monitoring, performance optimization, model fine-tuning, and establishing AI governance frameworks for sustained value and responsible AI practices.
