Stylometric Robustness with DeBERTa: Identifying Authorial Shifts in Adversarial and Collaborative Texts
Riya Sanjesh, Research Scholar, School of Computer Science and Engineering, Presidency University, Bangalore, Karnataka, India. riya.sanjesh@presidencyuniversity.in. ORCID: 0009-0008-6617-689X
Pamela Vinitha Eric, Professor, School of Computer Science and Engineering, Presidency University, Bangalore, Karnataka, India. pamelavinitha.eric@presidencyuniversity.in. ORCID: 0000-0002-4840-6179
Collaborative and adversarial writing poses a significant problem for authorship analysis: traditional stylometric methods struggle with short text fragments, deliberate stylistic obfuscation, and overlapping topical content. Lexical-statistical, syntactic, and surface-level methods are especially susceptible to paraphrasing and to the confounding of topic with style, which restricts their use in forensic, educational, and cybersecurity contexts. This paper proposes a DeBERTa-based paragraph-level style-change detector, framed as a binary Natural Language Inference (NLI) problem. The model assesses pairs of consecutive paragraphs to decide whether an authorial transition occurs between them, using DeBERTa's disentangled attention mechanism to separate structural stylistic features of the text from semantic information. Robustness is further improved with aggressive data-augmentation methods, namely back-translation, synonym replacement, and sentence shuffling, which mimic adversarial rewriting. Experiments on the PAN 2023 Multi-Author Writing Style Analysis data show that the proposed DeBERTa-v3 model achieves higher accuracy, F1-score, and ROC-AUC in challenging multi-author and topic-stable conditions than classical stylometric classifiers and strong transformer baselines such as RoBERTa. DeBERTa-v3 obtained 81.3% accuracy and an F1-score of 80.6, well above SVM (69.5%, 66.6) and Logistic Regression (68.0%, 65.4), and an improvement over RoBERTa-base (75.2%, 74.6). These findings indicate the value of disentangled attention for capturing fine-grained stylistic variation and underscore the practical usefulness of the framework for forensic linguistics, collaborative writing systems, academic integrity detection, and other lightweight applications in real-world document monitoring systems.
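As an illustration of the pairwise formulation described in the abstract, the following minimal Python sketch builds labelled pairs of consecutive paragraphs and applies a simple sentence-shuffling augmentation. It is not the authors' implementation: the function names (make_pairs, shuffle_sentences, augment_pairs), the per-paragraph author-ID input, and the regex-based sentence splitter are all assumptions made for this example, and back-translation and synonym replacement are omitted because they require external models or lexicons.

```python
import random
import re


def make_pairs(paragraphs, author_ids):
    """Pair each paragraph with its successor.

    The label is 1 if the author changes between the two paragraphs, else 0.
    `author_ids` is a hypothetical per-paragraph annotation for this sketch.
    """
    if len(paragraphs) != len(author_ids):
        raise ValueError("paragraphs and author_ids must have the same length")
    return [
        (paragraphs[i], paragraphs[i + 1], int(author_ids[i] != author_ids[i + 1]))
        for i in range(len(paragraphs) - 1)
    ]


def shuffle_sentences(paragraph, rng):
    """Sentence-shuffling augmentation: reorder sentences, keep their wording.

    Sentences are split naively on ., ! or ? followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)


def augment_pairs(pairs, seed=0):
    """Return the original pairs plus one shuffled copy of each, same labels."""
    rng = random.Random(seed)
    augmented = list(pairs)
    for first, second, label in pairs:
        augmented.append(
            (shuffle_sentences(first, rng), shuffle_sentences(second, rng), label)
        )
    return augmented


if __name__ == "__main__":
    doc = ["A one. A two.", "B one. B two.", "C one."]
    authors = [1, 1, 2]  # toy annotation: the third paragraph has a new author
    for first, second, label in augment_pairs(make_pairs(doc, authors)):
        print(label, "|", first, "||", second)
```

Each resulting (first, second, label) triple could then be encoded as a two-segment input for a sequence-pair classifier; the shuffled copies keep the original label because reordering sentences does not change who wrote them.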