Cyber Threat Intelligence

Inferring Causality Between Entities and Events from CTI Reports

This study introduces a novel approach to inferring causal relationships between entities and events in Cyber Threat Intelligence (CTI) reports. Existing research focuses primarily on event-to-event causality. To address this gap, the proposed BERT-based Multi-Layer Stacked Architecture (MLSA) model provides a robust and efficient framework for analyzing event-entity causal links. Experimental results show significant performance improvements over state-of-the-art LLMs such as GPT-4o, particularly in F1-score and in effect-relation inference. This capability is crucial for accelerating root cause analysis of cybersecurity incidents, enabling rapid and effective responses, and improving human-AI collaboration in Security Operations Centres (SOCs).

Executive Impact & Key Metrics

Leveraging advanced AI for CTI dramatically enhances response capabilities. Our model’s precision and efficiency translate directly into tangible operational advantages for your organization.

F1-score Improvement (MLSA vs GPT-4o)

Effect Relation F1-score Improvement (MLSA vs o3-mini): about 11%

MLSA Trainable Parameters

MLSA Micro-average F1-score (Event Extraction)

MLSA Training Throughput

MLSA Peak Training GPU Memory

Deep Analysis & Enterprise Applications

Advanced Information Extraction for CTI

Our research advances the state of the art in information extraction for Cyber Threat Intelligence (CTI) by focusing on fine-grained causal relationships between entities and events. Traditional methods often overlook these links, which slows down root cause analysis. The proposed MLSA architecture, built on BERT, efficiently identifies event triggers, event arguments, and their semantic roles, producing more actionable intelligence. This capability is essential for Security Operations Centres (SOCs) that face high volumes of threat reports, because it lets AI systems pre-process the reports and structure the critical causal data for human analysts.

Key strengths include robust performance on event and event-argument extraction, often outperforming baselines such as CASIE. For example, the method shows notable improvements in the "Databreach", "Phishing", and "DiscoverVulnerability" event categories. Precise identification of these cybersecurity elements directly supports more effective causal inference, which is a critical component of proactive defense strategies.
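Event extraction results are often summarized as a micro-averaged F1-score across event categories such as these. The sketch below shows how micro-averaging differs from macro-averaging. Micro-averaging pools true positives, false positives, and false negatives across all categories before computing a single F1. Macro-averaging computes F1 separately for each category and then takes the unweighted mean. The per-category counts are hypothetical illustrative numbers, not results from the study.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when precision and recall are both zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def micro_macro_f1(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """counts maps event category -> (tp, fp, fn); returns (micro F1, macro F1)."""
    # Micro: pool counts across categories, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts for illustration only (not the paper's results).
example = {
    "Databreach": (40, 5, 10),
    "Phishing": (30, 10, 5),
    "DiscoverVulnerability": (8, 2, 6),
}
micro, macro = micro_macro_f1(example)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

Micro-averaging gives more weight to frequent categories. Macro-averaging treats a rare category such as "DiscoverVulnerability" as equal to the common ones.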

Precision Causal Reasoning in Cybersecurity

Causal reasoning is at the heart of understanding cybersecurity incidents. Our model excels at distinguishing between 'Causal' entities (those that trigger events) and 'Effect' entities (those that are affected by events). This is a significant leap beyond traditional event-to-event causality, providing a more granular and precise understanding of threat dynamics. The MLSA framework integrates contextual and role-based information to enhance causal inference, classifying relationships as Causal, Effect, or No Relation with high accuracy.
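To make the Causal/Effect distinction concrete, the sketch below turns classified event-entity pairs into a small directed graph. A Causal entity gets an edge into the event it triggers, and an Effect entity gets an edge out of the event that affects it. Walking the edges backwards from an affected entity then lists candidate root causes. The triples, entity names, and graph layout are hypothetical illustrations, not the paper's data format.

```python
from collections import defaultdict

# Each triple: (entity, event_trigger, label), label in {"Causal", "Effect", "No Relation"}.
# Hypothetical example data, not taken from the paper's dataset.
pairs = [
    ("phishing email", "compromised", "Causal"),
    ("employee credentials", "compromised", "Effect"),
    ("employee credentials", "breach", "Causal"),
    ("patient records", "breach", "Effect"),
    ("Google", "breach", "No Relation"),
]


def build_graph(triples):
    """Return a reverse adjacency map: node -> set of its direct predecessors."""
    preds = defaultdict(set)
    for entity, event, label in triples:
        if label == "Causal":      # entity --causes--> event
            preds[event].add(entity)
        elif label == "Effect":    # event --affects--> entity
            preds[entity].add(event)
        # "No Relation" pairs add no edge.
    return preds


def root_causes(preds, node):
    """Follow predecessor edges back from node to nodes that have no predecessors."""
    roots, stack, seen = set(), [node], {node}
    while stack:
        current = stack.pop()
        parents = preds.get(current, set())
        if not parents and current != node:
            roots.add(current)
        for parent in parents:
            if parent not in seen:  # guards against cycles
                seen.add(parent)
                stack.append(parent)
    return roots


graph = build_graph(pairs)
print(root_causes(graph, "patient records"))  # {'phishing email'}
```

A misclassified role flips the direction of an edge. That is why role errors can reverse the inferred causal chain.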

While large language models (LLMs) such as GPT-4o demonstrated general causal reasoning ability, they often lacked the precision required for entity-specific causal links in complex CTI contexts. Inference-optimized LLMs (o1, o3-mini) showed better alignment but still had limitations compared with our specialized MLSA. This highlights the value of domain-specific, structured approaches for high-stakes applications such as cybersecurity, where misclassifications can have severe consequences.

Leveraging Neural Networks for Enhanced CTI

The core of our causal inference model is a BERT-based Multi-Layer Stacked Architecture (MLSA). This neural network design leverages BERT's ability to generate contextualized token embeddings, capturing deep semantic relationships within text. Further enhanced by a multi-head attention layer, the MLSA refines token-level features and models variable-length phrases more effectively, which is critical for accurate span representation.
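The multi-head attention step can be sketched as standard scaled dot-product self-attention over the token embeddings that BERT produces. The snippet below uses random NumPy arrays in place of real BERT outputs and randomly initialized projection matrices. The dimensions (a 768-wide hidden state and 12 heads, as in BERT-base) are assumptions for illustration, not the paper's reported configuration.

```python
import numpy as np


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model) token embeddings; returns (refined x, attention weights)."""
    seq_len, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    context = weights @ v                                 # (heads, seq, d_head)
    # Concatenate the heads back to (seq_len, d_model) and apply the output projection.
    concat = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 16, 768, 12
bert_like_output = rng.standard_normal((seq_len, d_model))  # stand-in for BERT embeddings
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
refined, attn = multi_head_self_attention(bert_like_output, w_q, w_k, w_v, w_o, num_heads)
print(refined.shape, attn.shape)  # (16, 768) (12, 16, 16)
```

Each head computes its own weighting over the whole sentence. This lets the refined vector for a token such as "records" pick up context from a modifier such as "patient" a few positions away.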

This architecture is designed for joint learning of event extraction, named entity recognition, and causal relationship inference, improving overall accuracy by facilitating the discovery of latent correlations among subtasks. The use of specialized features such as span-level POS tagging, span length, and CLS token representations further optimizes the model for identifying event triggers and arguments. Its compact size and efficient resource utilization make it highly suitable for integration into existing Security Operations Centre (SOC) workflows, offering a robust and scalable solution for CTI analysis.
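Joint learning of several subtasks is commonly implemented by summing the per-task losses, so that one backward pass updates the shared encoder for all of them. The sketch below shows that idea with plain cross-entropy over predicted probabilities. The page does not state how MLSA weights its subtasks, so the task names, example probabilities, and equal default weights are all hypothetical.

```python
import math
from typing import Optional


def cross_entropy(probs: list[float], gold: int) -> float:
    """Negative log-likelihood of the gold class under a predicted distribution."""
    return -math.log(max(probs[gold], 1e-12))  # clamp to avoid log(0)


def joint_loss(task_outputs: dict[str, tuple[list[float], int]],
               weights: Optional[dict[str, float]] = None) -> float:
    """Weighted sum of per-task cross-entropy losses; unlisted tasks get weight 1.0."""
    weights = weights or {}
    return sum(weights.get(task, 1.0) * cross_entropy(probs, gold)
               for task, (probs, gold) in task_outputs.items())


# Hypothetical predictions for one training example in each subtask.
outputs = {
    "event_extraction": ([0.7, 0.2, 0.1], 0),   # e.g. trigger type
    "ner": ([0.1, 0.8, 0.1], 1),                # e.g. entity type
    "causal_inference": ([0.6, 0.3, 0.1], 0),   # Causal / Effect / No Relation
}
print(round(joint_loss(outputs), 4))
```

In a real training loop, the same weighted sum would be computed on framework tensors so that gradients from every subtask flow back into the shared BERT encoder.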

Average F1-score Improvement of MLSA over GPT-4o on Causal Inference

Enterprise Process Flow

BERT Layer (Contextual Embeddings) → Multi-Head Attention (Feature Refinement) → Span-level Classification (Event/Argument) → Causal Relationship Inference (Causal/Effect/No Relation)
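For the span-level classification stage, each candidate span needs a fixed-size representation. The page names span-level POS tagging, span length, and the CLS token representation as features. The sketch below shows one hypothetical way to combine them: mean-pooled token vectors, normalized POS-tag counts over the span, a clipped span length, and the [CLS] vector. The reduced tagset, the length cap, and the toy 2-dimensional vectors (standing in for BERT embeddings) are assumptions, not MLSA's exact design.

```python
POS_TAGS = ["NOUN", "PROPN", "VERB", "ADJ", "OTHER"]  # hypothetical reduced tagset
MAX_LEN = 8  # hypothetical cap used to scale the length feature


def span_features(token_vecs, pos_tags, cls_vec, start, end):
    """Feature vector for the span of tokens [start, end), end exclusive."""
    if end <= start:
        raise ValueError("span must contain at least one token")
    span = token_vecs[start:end]
    length = end - start
    # 1) Mean-pool the contextual token vectors inside the span.
    pooled = [sum(dim) / length for dim in zip(*span)]
    # 2) Span-level POS: normalized counts of each tag within the span.
    pos_hist = [0.0] * len(POS_TAGS)
    for tag in pos_tags[start:end]:
        idx = POS_TAGS.index(tag) if tag in POS_TAGS else POS_TAGS.index("OTHER")
        pos_hist[idx] += 1 / length
    # 3) Span length, clipped and scaled to [0, 1].
    length_feat = [min(length, MAX_LEN) / MAX_LEN]
    # 4) Sentence-level context from the [CLS] representation.
    return pooled + pos_hist + length_feat + list(cls_vec)


tokens = ["attackers", "stole", "patient", "records"]
vecs = [[0.1, 0.3], [0.5, -0.2], [0.4, 0.0], [0.2, 0.6]]  # toy stand-ins for BERT outputs
tags = ["NOUN", "VERB", "NOUN", "NOUN"]
cls = [0.05, 0.9]
feat = span_features(vecs, tags, cls, 2, 4)  # the span "patient records"
print(len(feat), feat)
```

A classifier over such vectors would then label each span as an event trigger, an argument, or neither.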

MLSA vs. o3-mini Causal Inference Performance

Feature/Aspect	MLSA (BERT-based)	o3-mini (LLM-based)
Performance (Causal F1-score)	80.0% (robust on structured CTI data; Table 13)	81.2% (strong, but sometimes requires more tuning; Table 13)
Performance (Effect F1-score)	84.7% (high accuracy in identifying affected entities; Table 13)	75.8% (generally good, but MLSA shows a relative improvement of about 11%, i.e. 8.9 F1 points; Table 13 and Abstract)
Computational Efficiency	More efficient for domain-specific tasks; smaller model size (~110M parameters, Table 5); faster training (6.59 min for 800 reports, Table 11)	Requires substantial computational resources; larger parameter counts; higher API costs ($8.29 for 200 reports, Table 12)
Generalization Capability	Strong performance on structured threat data; domain-specific optimization	Excels at open-domain reasoning; adapts flexibly to various contexts; requires iterative prompt engineering
Training Data Requirement	Requires a high-quality annotated dataset; leverages BERT's pre-training	Uses few-shot learning with demonstrations; can perform with limited training data
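Several of the comparisons above reduce to simple ratios. The snippet below recomputes them from the figures quoted in the table (Tables 11, 12, and 13 of the study). It adds no new data, only arithmetic.

```python
# Figures quoted in the comparison table.
mlsa_effect_f1, o3_effect_f1 = 84.7, 75.8
mlsa_causal_f1, o3_causal_f1 = 80.0, 81.2
api_cost_usd, api_reports = 8.29, 200        # o3-mini API cost (Table 12)
train_minutes, train_reports = 6.59, 800     # MLSA training time (Table 11)

effect_gain_points = mlsa_effect_f1 - o3_effect_f1          # absolute F1 points
effect_gain_relative = effect_gain_points / o3_effect_f1    # relative to o3-mini
causal_gap_points = mlsa_causal_f1 - o3_causal_f1           # negative: o3-mini is ahead
cost_per_report = api_cost_usd / api_reports
train_seconds_per_report = train_minutes * 60 / train_reports

print(f"Effect F1: +{effect_gain_points:.1f} points ({effect_gain_relative:.1%} relative)")
print(f"Causal F1: {causal_gap_points:+.1f} points")
print(f"o3-mini API cost per report: ${cost_per_report:.3f}")
print(f"MLSA training time per report: {train_seconds_per_report:.2f} s")
```

The effect-relation gain works out to 8.9 F1 points, or about 11.7% relative, which is consistent with the roughly 11% figure. On the causal relation, o3-mini is ahead by 1.2 points. The cost figures measure different things: the $8.29 is for processing 200 reports through the API, while the 6.59 minutes is for training on 800 reports. These per-report numbers therefore indicate scale only and are not a direct cost comparison.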

Total Training Time for 800 CTI News Reports (MLSA): 6.59 minutes (Table 11)

Challenges in Causal Inference: Error Pattern Analysis

Despite MLSA's strong performance, error analysis revealed several recurring patterns. These highlight open challenges in causal inference from CTI reports and point to directions for future improvement:

Case 1: Entity Segmentation Error

The model sometimes incorrectly segmented entities, such as splitting "patient records" into ["patient"] and ["records"]. While both fragments were correctly labeled for their causal role (Effect), this segmentation error can hinder the precision of downstream causal inference by misrepresenting the complete entity.
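Segmentation errors are costly when entity extraction is scored by exact span match, where a prediction counts only if both boundaries and the label agree with the gold annotation. The page does not spell out the exact metric, so this scoring scheme is an assumption. Under it, the sketch below shows how splitting "patient records" turns one correct entity into two false positives and one false negative, even though both fragments carry the correct Effect label.

```python
def exact_match_prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Precision, recall, F1 where each item is (start, end, label) and must match exactly."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Token offsets for "... stole patient records": "patient" = token 2, "records" = token 3.
gold = {(2, 4, "Effect")}                      # one span covering both tokens
correct = {(2, 4, "Effect")}
split = {(2, 3, "Effect"), (3, 4, "Effect")}   # the segmentation error from Case 1

print(exact_match_prf(gold, correct))  # (1.0, 1.0, 1.0)
print(exact_match_prf(gold, split))    # (0.0, 0.0, 0.0)
```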

Case 2: Role Misclassification

Entities were occasionally misclassified in terms of their semantic roles. For instance, "Google" might be incorrectly labeled as a "Vulnerable_System" instead of a "Trusted-Entity". This directly leads to causal inference failures, sometimes even reversing the inferred causal direction, emphasizing the importance of accurate role recognition in complex CTI contexts.

Case 3: Unseen Malware Name (OOV) and NER Failure

When the model encounters previously unseen malware names (e.g., "GoldenEye" or "WannaCry"), BERT's WordPiece tokenizer breaks these out-of-vocabulary words into subword fragments, which can lead to missed entity tags. Without proper entity recognition, no causal relationship can be inferred for these terms. This illustrates a core limitation of BERT-based models in domains with rapidly evolving terminology.
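BERT's WordPiece tokenizer segments each word by greedy longest-match-first lookup against a fixed vocabulary, marking word-internal pieces with "##". The sketch below implements that algorithm over a tiny hypothetical vocabulary. It shows how a new malware name is fragmented into pieces that the NER layer must then reassemble into one entity. Real BERT vocabularies have about 30,000 entries, and the actual split of a name like "GoldenEye" depends on that vocabulary and on casing.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


# Tiny hypothetical vocabulary (lowercased, as in an uncased model).
vocab = {"golden", "##eye", "wan", "##na", "##cry", "ransom", "##ware"}

for name in ["goldeneye", "wannacry", "ransomware"]:
    print(name, "->", wordpiece(name, vocab))
# goldeneye -> ['golden', '##eye']
# wannacry -> ['wan', '##na', '##cry']
# ransomware -> ['ransom', '##ware']
```

Fragments such as "wan" and "##na" carry little meaning on their own, so the tagger has weak evidence that the word names a piece of malware.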

Case 4: Failure to Extract Long Entities

The model struggles with excessively long entity spans. These entities, often containing multiple modifiers or embedded sub-entities, cause semantic dispersion and reduce extraction accuracy. Overly long spans can exceed the model's effective attention range, leading to partial recognition or unintended segmentation, which ultimately impairs causal inference.
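Span-based extractors often enumerate candidate spans only up to a fixed maximum width to keep the number of candidates manageable, and this is one concrete way long entities can be missed. The page does not say whether MLSA uses such a cap. The sketch below, with a hypothetical `max_width` parameter and example sentence, only illustrates the effect: an entity longer than the cap never appears among the candidates, so no downstream classifier can recover it.

```python
def enumerate_spans(num_tokens: int, max_width: int) -> list[tuple[int, int]]:
    """All (start, end) spans, end exclusive, whose width is at most max_width."""
    return [(start, end)
            for start in range(num_tokens)
            for end in range(start + 1, min(start + max_width, num_tokens) + 1)]


tokens = "the Windows SMB remote code execution vulnerability was exploited".split()
long_entity = (1, 7)  # "Windows SMB remote code execution vulnerability" (6 tokens)

for max_width in (4, 8):
    candidates = enumerate_spans(len(tokens), max_width)
    print(max_width, len(candidates), long_entity in candidates)
# 4 30 False
# 8 44 True
```

Raising the cap recovers the long entity but increases the number of candidates roughly in proportion to sentence length times width. This is the trade-off span-based designs have to balance.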

These cases underscore the need for continuous refinement in entity boundary detection, role assignment, and robust handling of dynamic vocabularies to improve the reliability and completeness of causal inference in real-world CTI scenarios.

Calculate Your Potential AI ROI

See how leveraging AI for causal inference in CTI can translate into significant operational savings and reclaimed analyst hours for your enterprise.

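The calculator takes four inputs: your industry, the number of analysts, the average weekly hours each analyst spends on CTI analysis, and the average hourly cost per analyst ($). It estimates two outputs: annual savings potential ($) and annual analyst hours reclaimed.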
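The page does not publish the calculator's formula. One simple way to estimate the two outputs from the numeric inputs is sketched below, assuming a linear model. The `automation_share` parameter (the fraction of CTI analysis time the tool would save) is hypothetical and would need to be estimated for your own SOC. The 52-week year is also an assumption, and the industry input is ignored.

```python
def roi_estimate(analysts: int, weekly_hours: float, hourly_cost: float,
                 automation_share: float, weeks_per_year: int = 52) -> tuple[float, float]:
    """Return (annual_hours_reclaimed, annual_savings_usd) under a simple linear model."""
    if not 0.0 <= automation_share <= 1.0:
        raise ValueError("automation_share must be between 0 and 1")
    hours = analysts * weekly_hours * weeks_per_year * automation_share
    return hours, hours * hourly_cost


# Hypothetical inputs: 10 analysts, 20 h/week on CTI, $60/h, 25% of that time saved.
hours, savings = roi_estimate(10, 20, 60, 0.25)
print(f"{hours:.0f} hours reclaimed, ${savings:,.0f} saved per year")
# 2600 hours reclaimed, $156,000 saved per year
```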

Your AI Implementation Roadmap

A phased approach ensures seamless integration and maximum impact. We guide you from foundational setup to advanced operational intelligence.

Phase 01: Discovery & Strategy

Goal: Define specific CTI causal inference needs and align with business objectives.
Activities: Initial consultation, current CTI workflow analysis, objective setting, and MLSA/GPT model selection based on enterprise requirements.

Phase 02: Data Preparation & Model Training

Goal: Prepare custom datasets and train the MLSA model for optimal performance.
Activities: Data annotation (leveraging our custom dataset and GPT-assisted methods), fine-tuning BERT-based MLSA for specific CTI reports, and prompt engineering for LLMs if a hybrid approach is chosen.

Phase 03: Integration & Validation

Goal: Integrate the causal inference solution into existing SOC/SIEM/SOAR platforms.
Activities: API integration, system testing, performance validation against real-time CTI feeds, and iterative refinements based on initial operational feedback.

Phase 04: Advanced Operational Intelligence

Goal: Extend capabilities and enhance human-AI collaboration for proactive defense.
Activities: Development of interactive dashboards for causal graph visualization, continuous model monitoring and retraining, and advanced analyst training for leveraging AI-powered insights.

Ready to Revolutionize Your CTI?

Empower your Security Operations Centre with precise, AI-driven causal inference. Gain deeper insights, accelerate response times, and stay ahead of evolving cyber threats.

Contact Us

1 888 985 3025

Solutions@OwnYourAi.com
