AI-POWERED PERCUSSION TUTORING
Poster: A Multi-Modal Autonomous Tutoring System for Indian Percussion
Mridangam and tabla are key percussion instruments in Indian classical music. This poster presents our broad goal of a fully autonomous percussion tutoring system that fuses multiple sensing modalities: camera, audio, motion, and haptic (force or pressure). We present our proposed design and preliminary explorations for the mridangam, and discuss how the approach can translate to tabla analysis.
Authors: VIGNESH A. M. RAJA, PRATEEK PRASANNA, ANU BOURGEOIS, ASHWIN ASHOK
Published: 02 March 2026
DOI: 10.1145/3789514.3796243
Introduction and System Architecture
Mridangam and tabla are unique percussion instruments that feature complex finger placements and damping techniques executed at low (2-4 strokes/sec) to high (8-16 strokes/sec) speeds. Unlike isolated Western drumming strokes, mridangam and tabla strokes involve simultaneous multi-finger configurations in which visually similar hand positions produce acoustically distinct sounds depending on subtle pressure differences. Our proposed autonomous tutoring system (Figure 1) comprises three tightly coupled subsystems: (1) audio onset detection, which uses spectral analysis via Librosa [2] for beat onset time-stamping and beat classification; (2) spatial segmentation, which employs SAM2 [3] for zero-shot segmentation of the drum's inner and outer membranes, with RANSAC-based circle fitting to maintain accuracy under occlusion; and (3) hand pose estimation, which uses MediaPipe [1] to extract finger landmarks and project them onto the segmented zones. A rule-based classifier then maps finger configurations to individual strokes.
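The audio onset detection stage can be illustrated with a simplified spectral-flux detector. This is a sketch, not the system's actual Librosa-based implementation; the frame size, hop length, and peak-picking threshold below are illustrative choices.

```python
import numpy as np

def spectral_flux_onsets(y, sr, frame=1024, hop=512, delta=1.5):
    """Simplified spectral-flux onset detector: a stand-in for the
    Librosa-based beat onset time-stamping stage."""
    # Short-time magnitude spectra over Hann-windowed frames
    n_frames = 1 + (len(y) - frame) // hop
    window = np.hanning(frame)
    mags = np.array([np.abs(np.fft.rfft(window * y[i*hop:i*hop+frame]))
                     for i in range(n_frames)])
    # Spectral flux: positive magnitude differences, summed per frame
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux = flux / (flux.max() + 1e-12)
    # Peak picking: local maxima above mean + delta * std
    thresh = flux.mean() + delta * flux.std()
    onsets = [i for i in range(1, len(flux) - 1)
              if flux[i] > thresh
              and flux[i] >= flux[i-1] and flux[i] > flux[i+1]]
    # Convert frame indices to onset times in seconds
    return np.array([(i + 1) * hop / sr for i in onsets])
```

At mridangam stroke rates of up to 16 strokes/sec, the hop length bounds the timing resolution (here roughly 23 ms at 22.05 kHz), which is why a production system would tune these parameters rather than use the defaults above.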
Challenges and Cultural Impact
Digitizing mridangam performance presents unique challenges: capturing Micro-Movement Nuances like sliding motions (Gumki) that standard vision models miss, handling High-Speed Occlusion where hands block the drum face, and resolving Acoustic Ambiguity where identical regions produce different timbres. Solving these enables a broader impact: Cultural Preservation. By digitizing the pedagogical process itself, we ensure that the intricate mechanics of oral traditions can be passed to future generations even if human masters become inaccessible. Furthermore, the framework's modular design allows it to be translated to other instruments like the Tabla, creating a scalable platform for archiving global percussion heritage.
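The RANSAC-based circle fitting described in the architecture above is what keeps rim localization stable under the high-speed occlusion challenge: even when hands block part of the drum face, a circle can be recovered from the visible boundary. A minimal sketch follows; the inlier tolerance and iteration count are illustrative assumptions, not the system's actual parameters.

```python
import numpy as np

def fit_circle_3pts(p1, p2, p3):
    """Exact circle through three points (circumcenter formula)."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax*(by - cy) + bx*(cy - ay) + cx*(ay - by))
    if abs(d) < 1e-12:
        return None  # nearly collinear sample, no valid circle
    ux = ((ax*ax + ay*ay)*(by - cy) + (bx*bx + by*by)*(cy - ay)
          + (cx*cx + cy*cy)*(ay - by)) / d
    uy = ((ax*ax + ay*ay)*(cx - bx) + (bx*bx + by*by)*(ax - cx)
          + (cx*cx + cy*cy)*(bx - ax)) / d
    return ux, uy, np.hypot(ax - ux, ay - uy)

def ransac_circle(points, n_iter=200, tol=2.0, seed=0):
    """RANSAC: fit circles to random 3-point samples and keep the model
    with the most inliers (boundary points within tol pixels of the rim)."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        i, j, k = rng.choice(len(pts), size=3, replace=False)
        model = fit_circle_3pts(pts[i], pts[j], pts[k])
        if model is None:
            continue
        cx, cy, r = model
        dist = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
        inliers = int((dist < tol).sum())
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best
```

Because each candidate model needs only three boundary points, the fit survives even when half the rim is occluded, as long as enough unoccluded boundary pixels remain.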
Preliminary Evaluation
We evaluated the feasibility of our approach on mridangam strokes using a self-created dataset containing all seven fundamental strokes (70-75 samples each). Our system achieved a 92% audio onset F1-score (a measure of how accurately beat onsets are detected). Vision-based precision for the seven fundamental strokes averaged 80%, successfully discriminating visually similar configurations.
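The onset F1-score above is conventionally computed by matching predicted onsets to reference onsets within a small tolerance window; the poster does not state its matching tolerance, so the 50 ms window below is an assumption.

```python
def onset_f1(pred, truth, tol=0.05):
    """Onset F1: each prediction may match at most one reference onset
    within +/- tol seconds; F1 is the harmonic mean of precision/recall."""
    matched = set()
    tp = 0
    for p in sorted(pred):
        # Greedily match to the nearest still-unmatched reference onset
        best, best_d = None, tol
        for j, t in enumerate(truth):
            d = abs(p - t)
            if j not in matched and d <= best_d:
                best, best_d = j, d
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, predictions at 0.10, 0.52, and 1.00 s against references at 0.10, 0.50, and 1.50 s yield two matches, so precision and recall are both 2/3 and F1 is 2/3.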
Acknowledgments
This work has been supported by the U.S. Army Research Laboratory (ARL W911NF-23-2-0224). The content reflects the authors' views, not official ARL/U.S. Government policy. The U.S. Government retains reproduction rights.
References
- [1] Camillo Lugaresi et al. 2019. MediaPipe: A Framework for Building Perception Pipelines. In Proc. CVPR Workshops.
- [2] Brian McFee et al. 2015. librosa: Audio and Music Signal Analysis in Python. In Proc. SciPy, Vol. 8. 18-25.
- [3] Nikhila Ravi et al. 2024. SAM 2: Segment Anything in Images and Videos. arXiv:2408.00714 (2024).