Enterprise AI Analysis

Progressive Gated Co-Teaching for Weakly Supervised Deepfake Detection

Rui Lang, Guangsheng Yu, Qin Wang, Xu Wang — April 12-18, 2026

The surge of diffusion- and GAN-based video generators has produced photorealistic forgeries that are increasingly difficult to distinguish from authentic content under weak supervision. The Co-Teaching framework, in which two collaboratively trained networks exchange pseudo-labels to mitigate label noise, has shown promise in localizing forged regions. However, it still suffers from early-stage noise amplification and unstable reciprocal supervision, especially when trained with only video-level labels. In this paper, we propose a Vision Transformer (ViT)-based dual-branch framework that progressively enhances weakly supervised deepfake localization. The two ViT branches, one modeling spatial appearance and the other temporal dynamics, perform score-guided token condensation: a learned scorer ranks patch tokens and condenses them before any supervision is applied, so that gradients focus on discriminative evidence rather than diffuse background. To stabilize co-learning under noisy labels, we introduce a progressive co-teaching mechanism that integrates Exponential Moving Average (EMA) smoothing and gated token exchange. The EMA teachers provide temporally smoothed predictions that suppress transient fluctuations, while the gated token exchange, which includes confidence and consensus gates, filters out unreliable cross-branch supervision. Together, these mechanisms make supervision explicit in both timing ("when") and scope ("what"), yielding smoother and more reliable optimization. Experiments demonstrate that our framework achieves more stable convergence and higher accuracy than existing co-teaching and transformer baselines. Ablation studies further confirm that token selection before supervision and progressive, gated exchanges are key to improving both robustness and generalization.
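As a rough illustration of score-guided token condensation, the sketch below ranks patch tokens with a scorer and keeps the top M. It is a minimal reading of the idea, not the authors' implementation: the linear scorer, the name `condense_tokens`, and the choice to drop (rather than merge) low-scoring tokens are all assumptions made for this example.

```python
def condense_tokens(tokens, scorer_weights, m):
    """Keep the m highest-scoring tokens, preserving their original order.

    tokens: list of feature vectors (lists of floats), one per patch token.
    scorer_weights: weights of a hypothetical linear scorer (an assumption;
    the source only says the scorer is learned).
    """
    # Score each token with a dot product against the scorer weights.
    scores = [sum(w * x for w, x in zip(scorer_weights, tok)) for tok in tokens]
    # Rank token indices by score, highest first, and keep the top m.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:m]
    keep = sorted(top)  # restore the original spatial/temporal order
    return keep, [tokens[i] for i in keep]


# Toy input: four 2-dimensional tokens, keep the two most salient.
tokens = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1], [0.7, 0.9]]
keep, condensed = condense_tokens(tokens, scorer_weights=[1.0, 1.0], m=2)
```

Because selection happens before any loss is computed, only the retained tokens would receive gradient from supervision in this sketch, which is the property the abstract emphasizes.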

Executive Impact

This paper introduces Progressive Gated Co-Teaching, a novel ViT-based dual-branch framework designed to improve weakly supervised deepfake localization. It addresses challenges like label noise and unstable reciprocal supervision by using score-guided token condensation and a progressive co-teaching mechanism with EMA smoothing and gated token exchange. This leads to more stable convergence, higher accuracy, and improved robustness in deepfake detection, especially with video-level labels.
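The EMA smoothing mentioned here can be pictured with a small sketch. It assumes the common formulation in which a teacher's values are an exponential moving average of the student's; whether the paper averages parameters or predictions, and which decay it uses, is not stated in this summary, so `ema_update` and `decay` are illustrative names and values only.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher value toward the student value by a factor (1 - decay)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy run: the second student value flips sign every step,
# and the teacher's copy of it moves less than the student's does.
teacher = [0.0, 0.0]
for student in ([1.0, -1.0], [1.0, 1.0], [1.0, -1.0]):
    teacher = ema_update(teacher, student, decay=0.5)
```

A decay closer to 1 gives a slower, smoother teacher; the value 0.5 above is chosen only to keep the arithmetic easy to follow.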

Deep Analysis & Enterprise Applications

Progressive Gated Co-Teaching Workflow

The framework operates in two stages: an initial warm-up phase without cross-branch interaction, followed by a progressive co-teaching phase with gated and EMA-smoothed token exchange.
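One way to picture the two-stage schedule is as a weight on the cross-branch loss that stays at zero during warm-up and then increases. The sketch below is an assumption-laden illustration: the linear ramp, the `ramp` length, and the function name `coteach_weight` are hypothetical, and only the warm-up length W=4 appears on this page.

```python
def coteach_weight(epoch, warmup=4, ramp=4, max_weight=1.0):
    """Weight on the cross-branch loss: 0 during warm-up, then a linear ramp."""
    if epoch < warmup:
        return 0.0  # stage 1: each branch trains on video-level labels alone
    # Stage 2: increase cross-branch supervision over `ramp` epochs, then hold.
    return max_weight * min(1.0, (epoch - warmup + 1) / ramp)


schedule = [coteach_weight(e) for e in range(10)]
```

With these defaults the weight is zero for epochs 0 to 3, rises in quarter steps over epochs 4 to 7, and stays at 1.0 afterwards.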

Input Video & Label → Tokenize & Encode → Score-guided Condensation (M tokens) → Per-token Classification → Branch Pooling → Warm-up Phase (No Co-teaching) → Progressive Co-teaching (Gated, EMA) → Branch Fusion → Video-level Objective & Update
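The later steps of this workflow (per-token classification, branch pooling, branch fusion, video-level objective) can be sketched as follows. Mean pooling, equal-weight fusion of logits, and binary cross-entropy are assumptions chosen for simplicity; the page does not specify which pooling, fusion, or loss the paper uses, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def branch_score(token_logits):
    """Pool per-token logits into one branch-level logit (mean pooling, assumed)."""
    return sum(token_logits) / len(token_logits)


def video_loss(spatial_logits, temporal_logits, label):
    """Binary cross-entropy on the fused video-level probability."""
    # Fuse the two branches by averaging their pooled logits (assumed).
    fused = 0.5 * (branch_score(spatial_logits) + branch_score(temporal_logits))
    p = sigmoid(fused)
    eps = 1e-12  # guards against log(0)
    loss = -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))
    return loss, p


# Toy logits for the condensed tokens of each branch; label 1 means "fake".
loss_fake, p = video_loss([2.0, 0.0, 4.0], [1.0, 3.0], label=1)
loss_real, _ = video_loss([2.0, 0.0, 4.0], [1.0, 3.0], label=0)
```

Only the video-level label enters the loss, which is what makes the setup weakly supervised: token-level evidence is learned indirectly through the pooling step.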

Impact of Warm-up Phase on Stability

0.9327 Peak AUC with W=4 Warm-up

A moderate warm-up (W=4 epochs) achieves the highest peak AUC and lowest validation curve curvature, indicating optimal balance between noise suppression and guidance freshness.
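The page does not define "validation curve curvature". One plausible proxy, shown here purely as an assumption, is the mean absolute second difference of the per-epoch validation scores: a steadily improving curve scores near zero and an oscillating one scores higher. The two sequences below are synthetic values made up for illustration, not results from the paper.

```python
def curve_curvature(values):
    """Mean absolute second difference of a sequence of validation scores."""
    second = [values[i + 1] - 2 * values[i] + values[i - 1]
              for i in range(1, len(values) - 1)]
    return sum(abs(d) for d in second) / len(second)


smooth = [0.80, 0.85, 0.90, 0.95]  # synthetic: steady improvement
jagged = [0.80, 0.90, 0.82, 0.93]  # synthetic: oscillating validation AUC

smooth_curv = curve_curvature(smooth)
jagged_curv = curve_curvature(jagged)
```

Under this proxy, comparing warm-up lengths would mean computing the statistic on each run's validation curve and preferring the run with the smaller value at comparable peak AUC.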

Key Innovations vs. Prior Approaches

Feature	Prior Co-Teaching	Progressive Gated Co-Teaching
Token Selection	Late aggregation; frame-level only	Score-guided token condensation (early); token-level (spatial and temporal)
Supervision Stability	Symmetric, always-on exchange; noise amplification	Progressive, directed, gated (EMA, confidence, consensus); reduced noise propagation
Temporal Integration	Loosely coupled, late fusion; shallow attention	ViT-based dual-branch (spatial and temporal); frame embeddings for temporal identity
Optimization	Unstable, confirmation loops	Smoother convergence, higher accuracy; improved robustness
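The confidence and consensus gates in the comparison above can be sketched as a per-token filter on pseudo-labels. This is a minimal illustration under stated assumptions: tokens in the two branches are taken to be aligned one-to-one, the threshold 0.8 is arbitrary, and `gate_tokens` is a hypothetical helper rather than the paper's gating rule.

```python
def gate_tokens(teacher_probs, peer_probs, conf_thresh=0.8):
    """Return indices of tokens whose pseudo-labels pass both gates.

    teacher_probs: per-token "fake" probabilities from the supervising
    branch's EMA teacher.
    peer_probs: per-token "fake" probabilities from the receiving branch.
    """
    kept = []
    for i, (t, p) in enumerate(zip(teacher_probs, peer_probs)):
        confident = max(t, 1.0 - t) >= conf_thresh  # confidence gate
        agree = (t >= 0.5) == (p >= 0.5)            # consensus gate
        if confident and agree:
            kept.append(i)
    return kept


# Token 1 fails the confidence gate; token 3 fails the consensus gate.
kept = gate_tokens([0.95, 0.55, 0.10, 0.90], [0.70, 0.60, 0.30, 0.20])
```

Only the tokens in `kept` would contribute to the cross-branch loss, which is how gating limits the spread of noisy pseudo-labels between branches in this sketch.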

Scenario: Real-time Deepfake Forensics Platform

The Challenge

A major media organization struggles with identifying subtle deepfake manipulations in live broadcast streams and user-generated content, leading to reputation damage and misinformation spread. Existing tools are slow, generate false positives, and cannot localize forgeries precisely.

Our Solution

Implementing a system powered by the Progressive Gated Co-Teaching framework allows for efficient, real-time analysis of video feeds. Its dual-branch ViT architecture specializes in detecting both spatial artifacts and temporal inconsistencies, while the progressive co-teaching mechanism ensures robust learning even with noisy, video-level labels. Score-guided token condensation helps prioritize discriminative regions, enabling precise localization.

Impact & Results

In this illustrative scenario, the platform achieves a 35% reduction in false positives and a 20% improvement in detection speed, so content moderators can identify and flag deepfakes more accurately and quickly. Precise localization of forged regions supports post-analysis and reporting, which strengthens trust in content authenticity and reduces reputational risk.

Your AI Implementation Roadmap

A typical phased approach to integrating advanced AI capabilities into your enterprise.

Discovery & Strategy

Duration: 2-4 Weeks

Initial assessment, goal setting, and custom strategy development.

Pilot Program & MVP

Duration: 4-8 Weeks

Develop and deploy a Minimum Viable Product in a controlled environment.

Full-Scale Integration

Duration: 8-16 Weeks

Expand deployment across the organization, integrate with existing systems.

Optimization & Scaling

Duration: Ongoing

Continuous monitoring, performance tuning, and scaling for growth.

