Enterprise AI Analysis: ChatGPT as a tool for reviewing multiple-choice questions in the health sector

This study evaluates the capacity of ChatGPT-4 to enhance the quality of multiple-choice questions (MCQs) in medical education, comparing AI-revised versions with faculty-authored originals against 38 rigorous criteria. Findings reveal AI's proficiency in structural clarity but limitations in higher-order cognitive assessment, underscoring the critical role of human expertise and prompt engineering.

Authors: Tatiane Iembo, Helena Landim Gonçalves Cristóvão, Patrícia Carla Zanelatto Gonçalves, Wagner Ricardo Montor, Patrícia Silva Fucuta, Toufic Anbar Neto, Júlio César André & Milton Arruda Martins

Source: Scientific Reports (2026), Published Online: 13 May 2026

Executive Impact: AI in Medical Education Assessment

Addressing persistent challenges in MCQ quality, this analysis demonstrates how AI can augment human efforts in assessment development, particularly in high-stakes environments like the Progress Test. Understanding AI's specific strengths and weaknesses is key to its effective deployment.

Headline metrics: median criteria met (AI-reviewed vs. faculty-authored MCQs) and p-values for the comparison (external evaluators; study authors).

Deep Analysis & Enterprise Applications

Increase in Median Criteria Met (Study Authors)

The study authors, after standardization, found a statistically significant increase in the number of criteria met by ChatGPT-4-reviewed MCQs, indicating AI's capacity to refine questions. External evaluators, without standardization, did not find a significant difference.

Comparison by aspect: Faculty-Authored MCQs vs. ChatGPT-4 Reviewed MCQs

Structural Clarity
  • Faculty-authored: Often had ambiguous stems, flawed distractors, and unnecessary data.
  • ChatGPT-4 reviewed: Revisions were clearer and more objective, and adhered more closely to basic item-writing principles.
Higher-Order Thinking
  • Faculty-authored: Items often assessed memorization; writing items that accurately measure complex cognitive skills was a challenge.
  • ChatGPT-4 reviewed: Struggled to add clinical reasoning and higher-order thinking where the original lacked them, particularly with non-optimized prompts.
Prompt Engineering
  • Faculty-authored: Quality depended heavily on the individual faculty member's item-writing expertise and adherence to guidelines.
  • ChatGPT-4 reviewed: Quality depended heavily on effective prompt engineering; simple, non-optimized prompts limited the AI's potential.
Review Consistency
  • Faculty-authored: Uncalibrated external evaluators varied in their individual perceptions of quality.
  • ChatGPT-4 reviewed: Standardization meetings were crucial for the study authors to interpret the criteria consistently and to detect AI-introduced changes.

Enterprise Process Flow

36 MCQs originally created by medical faculty
MCQs translated to English by a bilingual medical professional
ChatGPT-4 reviewed MCQs with a single, comprehensive prompt (38 criteria)
10 external health education specialists assessed both versions (blinded)
4 study authors re-evaluated both versions (blinded & standardized)
Statistical analysis: Wilcoxon signed-rank test and non-metric multidimensional scaling (NMDS)
Comparison of criteria met & quality assessment
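
To make the statistical step concrete, the paired comparison can be sketched as follows. This is a minimal, standard-library illustration of a two-sided Wilcoxon signed-rank test using the normal approximation without continuity correction; it is not the study's analysis code, and the function name and the scores in the usage note are hypothetical.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    `before` and `after` are paired scores, e.g. the number of criteria
    met by the original and the revised version of each question.
    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences share their average rank.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0  # no non-zero differences: nothing to test

    # Rank the absolute differences, giving ties their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t  # feeds the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return w_plus, w_minus, p_value
```

With hypothetical per-question counts such as `before = [30, 28, 31, 27, 29, 32, 30, 26]` and `after = [32, 28, 33, 30, 28, 35, 31, 29]`, calling `wilcoxon_signed_rank(before, after)` returns the two rank sums and an approximate p-value. For small samples or when exact p-values matter, an established statistics package is the safer choice.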

ChatGPT-4's Role in MCQ Review Process

ChatGPT-4 was accessed via its website and individual questions were submitted with a single, comprehensive prompt. This prompt instructed the model to act as a "medical undergraduate professor" and review/reformulate MCQs based on all 38 specified construction criteria (e.g., good English, independent alternatives, higher-level reasoning, clear phrasing, no unnecessary data, etc.). The intention was to simulate a common faculty request without specialized prompt engineering training, ensuring a standardized approach to AI review.
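
As a rough illustration of what a single, comprehensive review prompt might look like when assembled programmatically, the sketch below builds one prompt from a role and a list of criteria. The wording of the prompt, the function name `build_review_prompt`, and the sample question are assumptions made for illustration; the study's actual prompt and its full list of 38 criteria are not reproduced here.

```python
def build_review_prompt(question_text, criteria,
                        role="medical undergraduate professor"):
    """Assemble one review prompt from a role, criteria, and a question.

    The phrasing is hypothetical; only the role and the example criteria
    names come from the description of the study.
    """
    numbered = "\n".join(
        f"{i}. {criterion}" for i, criterion in enumerate(criteria, start=1)
    )
    return (
        f"Act as a {role}. Review the multiple-choice question below and "
        f"reformulate it so that it meets all {len(criteria)} criteria "
        f"listed.\n\n"
        f"Criteria:\n{numbered}\n\n"
        f"Question:\n{question_text.strip()}\n"
    )


# A few of the criteria named in the text; the full list has 38 entries.
EXAMPLE_CRITERIA = [
    "Good English",
    "Independent alternatives",
    "Requires higher-level reasoning",
    "Clear phrasing",
    "No unnecessary data",
]

if __name__ == "__main__":
    # Placeholder item, not taken from the study's question set.
    sample = "Which vitamin deficiency causes scurvy? A) A  B) B12  C) C  D) D"
    print(build_review_prompt(sample, EXAMPLE_CRITERIA))
```

Keeping the criteria in a list makes it straightforward to compare a single all-in-one prompt, as used in the study, with more targeted variants that pass only a subset of criteria.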

Strategic Integration of AI in Medical Education Assessment

The findings emphasize that AI, particularly ChatGPT-4, serves as a powerful complementary tool, not a replacement for human expertise. It excels in refining structural clarity and basic item-writing principles, freeing up faculty time. However, human oversight remains critical for assessing higher-order cognitive skills and nuanced problem-solving. Effective prompt engineering and continuous faculty training are essential to maximize AI's potential and ensure pedagogical depth in assessments like the Progress Test.

AI’s ability to conduct complex analysis in a shorter time frame can significantly boost efficiency for educators, allowing them to focus on more intricate aspects of assessment development and student engagement.

Acknowledged Limitations & Future Directions

Limitations: The study's cross-sectional design cannot establish causality or demonstrate improvement over time. The small sample (36 MCQs) limits generalizability. Evaluator homogeneity and the deliberate use of a non-optimized prompt may have led the study to underestimate AI's potential, especially for higher-order cognitive skills. The specific context of the Progress Test consortium may also limit direct generalizability.

Future Research: Future studies should include a more diverse range of questions and evaluators. Exploring the use of ChatGPT-4 to create questions from structured instructions and optimized prompts could give a fuller picture of its capabilities. Longitudinal studies assessing the effect of AI-assisted revision on student performance and learning outcomes are also warranted.

Your AI Implementation Roadmap

A structured approach ensures successful integration and maximum return on your AI investment. Here’s a typical journey.

Phase 1: Discovery & Strategy

Conduct a comprehensive audit of existing workflows, identify high-impact AI opportunities, and define clear objectives and KPIs. Develop a tailored AI strategy aligned with enterprise goals.

Phase 2: Pilot & Proof-of-Concept

Implement AI solutions in a controlled environment, validate effectiveness against defined metrics, and gather feedback for iterative refinement. Demonstrate tangible value with a successful pilot project.

Phase 3: Integration & Scaling

Seamlessly integrate AI tools into your existing technology stack. Develop training programs for your team to maximize adoption and ensure smooth operational scaling across departments.

Phase 4: Optimization & Future-Proofing

Continuously monitor AI performance, fine-tune models, and explore advanced capabilities to sustain competitive advantage. Establish governance and ethical AI frameworks for long-term success.
