AI IN REQUIREMENTS ENGINEERING
Revolutionizing Requirements Elicitation with LLM-Powered Voice Agents
This report explores the development and pre-testing of voice-based agentic workflows using OpenAI's GPT-4o-mini and Google's Gemma3:27b for requirement elicitation in software projects. We evaluate their performance, usability, and conversational flow, highlighting key differences and future directions for human-AI collaboration.
Executive Impact
Early prototypes demonstrate the significant potential of LLM-based voice agents to enhance requirements elicitation efficiency and user experience.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
End-to-End Voice Agent Processing Flow
| Component | OpenAI Agent | Gemma Agent |
|---|---|---|
| STT | Cloud (Whisper API) | Local (faster-whisper) |
| LLM | Cloud hosted | Local model |
| TTS | Cloud based | Local engine |
| Privacy | External processing | Fully local |
| Latency | Network dependent | Hardware dependent |
| Setup | Lightweight | Resource intensive |
The OpenAI-based voice agent, utilizing GPT-4o-mini, demonstrated strong performance in requirement elicitation. It achieved an average requirement coverage of 77.5% across five participants and two case studies, indicating its effectiveness in identifying relevant functional features and constraints. Usability ratings averaged 4.0 out of 5, with participants noting a natural conversational flow, better contextual understanding, and improved responsiveness. The agent's questioning and clarification style was smooth, often leading the elicitation process effectively.
Challenges included limited customization of TTS functionality (e.g., speaking rate, pitch) and opaque debugging within the SDK. Despite these, its ability to maintain context and produce coherent follow-up questions was a significant advantage.
| Participant | Gold Requirements | Captured | Coverage (%) |
|---|---|---|---|
| LK1 | 8 | 6 | 75.0 |
| LK4 | 8 | 7 | 87.5 |
| LK5 | 8 | 5 | 62.5 |
| LK7 | 8 | 5 | 62.5 |
| LK10 | 8 | 8 | 100.0 |
| Average | - | - | 77.5 |
The Gemma-based agent, powered by Gemma3:27b, showed lower performance compared to the OpenAI agent, with an average requirement coverage of 35.0%. Usability ratings averaged 3.3 out of 5, reflecting mixed conversational performance. Participants reported difficulties such as repeated or redundant questions and a loss of conversational context. The agent sometimes forgot previous responses and struggled with consistent memory handling.
Challenges included initial playback issues with pyttsx3 (resolved by macOS system voice), and delays between text display and audio response. While it could ask clarifying questions, its conversational flow was often perceived as rigid and repetitive.
| Participant | Gold Requirements | Captured | Coverage (%) |
|---|---|---|---|
| LK2 | 8 | 2 | 25.0 |
| LK3 | 8 | 3 | 37.5 |
| LK6 | 8 | 3 | 37.5 |
| LK8 | 8 | 3 | 37.5 |
| LK9 | 8 | 3 | 37.5 |
| Average | - | - | 35.0 |
Future work will focus on integrating more capable open-source models like Qwen and refining prompt engineering for improved performance. The ultimate objective is to deploy this voice agent as the first in a multi-agent requirements engineering workflow, collaborating with human engineers.
We plan to explore multimodal inputs (images, text files) to enrich interactions and employ advanced prompt engineering to mitigate hallucinations. A wider sample size for user studies and additional metrics like task completion time, error rate, and hallucination frequency will provide more robust empirical feedback.
Case Study Example: LaundryMart App
The paper includes two case studies. Case study 1 involves developing a mobile application for a chain of self-service laundromats. Users should be able to locate branches, view machine availability, reserve machines, create accounts, load credit, start machines via QR code, receive completion notifications, report issues, and access a loyalty system. Managers need a dashboard for monitoring usage, service requests, and promotions.
- Locate nearby branches and view real-time washer/dryer availability.
- Reserve machines in advance.
- Create accounts, load credit, and start machines via QR code.
- Receive estimated wash/dry completion times with push notifications.
- Report machine issues with options for photos/comments.
- Loyalty system for discounts/free cycles.
- Admin panel for promotional notifications and usage history.
- Manager dashboard for monitoring usage patterns and service requests.
Calculate Your Potential AI ROI
Estimate the efficiency gains and cost savings your enterprise could realize by implementing AI-powered solutions.
Implementation Roadmap
Our phased approach ensures a smooth transition and continuous improvement, delivering tangible value at each step.
Phase 1: Advanced Model Integration
Integrate more capable open-source LLMs like Qwen and refine prompt engineering for improved performance and reduced hallucinations.
Phase 2: Multimodal Input Expansion
Extend agent capabilities to process multimodal inputs such as images and text files, enabling richer, context-aware interactions.
Phase 3: Large-Scale Empirical Validation
Conduct wider user studies with diverse participants, evaluating effectiveness, user satisfaction, task completion time, error rate, and hallucination frequency.
Phase 4: Multi-Agent Workflow Integration
Deploy the voice agent as the initial component in an AI-driven multi-agent requirements engineering workflow, collaborating with human engineers.
Ready to Transform Your Requirements Engineering?
Unlock the power of conversational AI for your enterprise. Our team of experts is ready to help you design and implement intelligent solutions that streamline your workflows and enhance user experiences. Don't miss out on the future of requirements engineering. Reach out today!