The dataset was systematically developed to capture memory-level behavioral dynamics of malware and benign processes through interval-based snapshot analysis. Unlike prior datasets that predominantly rely on static binaries or network-level observations, this dataset focuses on runtime memory behavior and process persistence, enabling a deeper understanding of how malicious activities evolve over time. It integrates diverse malware families and benign software to support realistic, unbiased modeling of system-level threats in dynamic execution environments.
Data sources: Two captured and labeled sources (memory snapshot data and process-level behavioral logs)
Testbed: Controlled execution environment with interval-based memory dumping across multiple time windows
Attack Profile: Eight malware categories (Backdoor, Hoax, HackTool, Trojan, Worm, Virus, Rootkit, and Exploit), alongside benign software samples
Data size: 40 TB of memory snapshots and associated behavioral records across multiple execution intervals
Data records: 2000 malware samples and 250 benign samples with varying persistence patterns across snapshots
Data capturing: Interval-based memory snapshot collection capturing transient and persistent process behaviors
Extracted Features: Memory and process-level features capturing temporal persistence, behavioral transitions, and execution patterns
This dataset introduces a temporal memory-based analysis framework, where malware and benign processes are observed across multiple time intervals to capture both transient and persistent behaviors. A novel representation of process persistence patterns (single, multiple, and timeout-based appearances across snapshots) enables fine-grained modeling of execution dynamics. By combining memory snapshots with behavioral logs, the dataset supports multi-perspective analysis of system-level activities, going beyond traditional static or network-based approaches. This design enables the development of advanced AI and LLM-based detection systems that leverage temporal evolution, contextual behavior, and cross-snapshot correlations to identify sophisticated malware that evades conventional detection mechanisms.
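The persistence patterns described above can be illustrated with a minimal Python sketch. It is not the dataset's own labeling procedure. The function name `persistence_patterns`, the input layout (one set of process names per snapshot), and in particular the reading of "timeout" as "seen in more than one snapshot and still present in the final snapshot" are assumptions made only for this example; the paper should be consulted for the actual definitions.

```python
from collections import defaultdict


def persistence_patterns(snapshots):
    """Label each process by how it appears across ordered snapshots.

    snapshots: list of sets of process names, one set per capture interval
    (a hypothetical layout chosen for this sketch).
    """
    # Record the snapshot indices in which each process was observed.
    appearances = defaultdict(list)
    for index, processes in enumerate(snapshots):
        for name in processes:
            appearances[name].append(index)

    final_index = len(snapshots) - 1
    labels = {}
    for name, indices in appearances.items():
        if len(indices) == 1:
            # Observed in exactly one snapshot.
            labels[name] = "single"
        elif indices[-1] == final_index:
            # ASSUMPTION: "timeout" means the process was still present
            # when the observation window ended.
            labels[name] = "timeout"
        else:
            # Observed in several snapshots but gone before the last one.
            labels[name] = "multiple"
    return labels


# Three capture intervals with made-up process names.
snapshots = [{"a.exe", "b.exe"}, {"b.exe", "c.exe"}, {"c.exe"}]
print(sorted(persistence_patterns(snapshots).items()))
# [('a.exe', 'single'), ('b.exe', 'multiple'), ('c.exe', 'timeout')]
```

Keeping the per-process list of snapshot indices, rather than only a count, leaves room for other cross-snapshot features such as gaps between appearances.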
The full research paper outlining the details of the dataset and its underlying principles:
"", Yasin Dehfouli and Arash Habibi Lashkari, Journal of Information Security and Applications, Volume 94, November 2025,
Download Dataset:
