
With the continued growth of information digitization, the Portable Document Format (PDF) has become one of the most widely used file formats for document exchange across applications and platforms. Consequently, PDF files have become an attractive vehicle for attackers to deliver malicious code to users. Although machine learning classifiers have proven effective at detecting malicious PDF files, they require tedious feature engineering and have other limitations. Deep learning models, in turn, face resistance largely because of their lack of interpretability. To address these challenges, this study proposes using the TabNet model for malicious PDF detection, offering both global and local interpretability while maintaining competitive detection performance. The Optuna optimization framework is employed to further enhance the model's capabilities. The proposed approach is evaluated on the real-world Evasive-PDFMal2022 dataset and demonstrates state-of-the-art performance compared to baseline methods.
