
On the Generalization and Adaptation Ability of Machine-Generated Text Detectors in Academic Writing

Abstract

The rising popularity of large language models (LLMs) has raised concerns about potential abuse and harmful content. As a result, developing a highly generalizable and adaptable machine-generated text (MGT) detection system has become an urgent priority. Given that LLMs are most commonly misused in academic writing, this work investigates the generalization and adaptation capabilities of MGT detectors in three key aspects specific to academic writing: First, we construct MGT-Academic, a large-scale dataset comprising over 336M tokens and 749K samples. MGT-Academic focuses on academic writing, featuring human-written texts (HWTs) and MGTs across STEM, Humanities, and Social Sciences, paired with an extensible code framework for efficient benchmarking. Second, we benchmark the performance of various detectors for binary classification and text attribution tasks in both in-domain and cross-domain settings. This benchmark reveals the often-overlooked challenges of text attribution tasks. Third, we introduce a novel text attribution task in which models must adapt to new classes over time, with little or no access to prior training data, spanning both few-shot and many-shot scenarios. We implement a range of adaptation techniques to enhance performance across these settings. Our findings provide new insights into the generalization ability of MGT detectors and lay the foundation for building robust, adaptive detection systems. The code framework is available at https://github.com/Y-L-LIU/MGTBench-2.0.

Conference paper

31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD)

Publication date

2025-08-03

Last modified

2025-08-22