Boosting Fake News Detection in Arabic Dialects with Consistency-Aware LLM Merging Techniques

Authors

Abdelouahab Hocini and Kamel Smaili, University of Lorraine, France

Abstract

This work explores the use of Large Language Models (LLMs) for fake news detection in multilingual and multi-script contexts, focusing on Arabic dialects. To address the scarcity of digital data for many Arabic dialects, we start from LLMs pretrained on a diverse corpus that includes Modern Standard Arabic (MSA) and fine-tune them on dialect-specific data. We examine AraBERT, DarijaBERT, and mBERT on North African Arabic dialects, accounting for code-switching and writing styles such as Arabizi. We evaluate these models on the BOUTEF dataset, which covers fake news, fake comments, and denial categories. Our approach fine-tunes on both Arabic- and Latin-script text, with a focus on cross-script generalization. We further improve accuracy with an ensemble strategy that merges the predictions of AraBERT and DarijaBERT. In addition, we introduce a new custom loss function, named CALLM, that enforces consistency between the models, boosting classification performance. CALLM achieves significant improvements in F1-score (+12.88) and accuracy (+2.47) over the best single model (MarBERT).
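The abstract names two techniques without giving their formulas: merging the predictions of two fine-tuned models, and a consistency-enforcing loss (CALLM). The sketch below illustrates one plausible shape for each, assuming the merge is a simple average of the two models' class distributions and the consistency term is a symmetric KL divergence added to the cross-entropy of both models; the paper's actual CALLM formulation and merging rule are not specified in this abstract, so every function here is a hypothetical illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class label."""
    return -math.log(softmax(logits)[label])

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def ensemble_predict(logits_a, logits_b):
    """Hypothetical merge: average the two models' class probabilities
    (e.g. AraBERT and DarijaBERT) and pick the argmax."""
    p = [(a + b) / 2 for a, b in zip(softmax(logits_a), softmax(logits_b))]
    return max(range(len(p)), key=p.__getitem__)

def consistency_aware_loss(logits_a, logits_b, label, lam=0.5):
    """Hypothetical consistency-aware objective: cross-entropy on each
    model plus a symmetric-KL penalty on their disagreement, weighted
    by lam. Not the paper's actual CALLM definition."""
    ce = cross_entropy(logits_a, label) + cross_entropy(logits_b, label)
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    consistency = 0.5 * (kl(p_a, p_b) + kl(p_b, p_a))
    return ce + lam * consistency
```

When the two models agree exactly, the consistency term vanishes and the loss reduces to the sum of the two cross-entropies; the more their predicted distributions diverge, the larger the added penalty.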


Volume 15, Number 18