Researchers claim new AI tool solves problem of false positives in student writing.

January 24, 2024
A group of researchers, primarily from the University of Maryland, claim to have developed a highly accurate tool for detecting text generated by artificial intelligence (AI) applications. They report that the tool, named “Binoculars,” outperforms existing AI detectors such as GPTZero and Ghostbuster. The researchers tested Binoculars on several datasets, including news writing, creative writing, and student essays. They found that it detected over 90% of the AI-generated samples while keeping the false positive rate, the share of human-written texts wrongly flagged as AI-generated, at 0.01%, or one in 10,000. If these results hold up, the tool could address concerns about students using AI to cheat on academic work while also reducing false accusations.
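The headline figures combine two rates measured at the same decision threshold. The following is a minimal Python sketch using made-up scores, not the researchers' data. It assumes a detector gives each text a score and flags the text as AI-generated when the score falls below a chosen threshold. The detection rate is the share of AI-written texts that get flagged, and the false positive rate is the share of human-written texts that get flagged. The function name `rates_at_threshold` is hypothetical.

```python
from typing import Sequence, Tuple


def rates_at_threshold(human_scores: Sequence[float],
                       machine_scores: Sequence[float],
                       threshold: float) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for a hypothetical detector.

    Assumption for this sketch: a text is flagged as AI-generated when its
    score is strictly below `threshold`.
    """
    # Share of AI-written texts that the detector correctly flags.
    detection_rate = sum(s < threshold for s in machine_scores) / len(machine_scores)
    # Share of human-written texts that the detector wrongly flags.
    false_positive_rate = sum(s < threshold for s in human_scores) / len(human_scores)
    return detection_rate, false_positive_rate


# Made-up scores: 10,000 human texts (one scores low) and 10 AI texts (nine score low).
human = [1.0] * 9_999 + [0.5]
machine = [0.5] * 9 + [1.0]
tpr, fpr = rates_at_threshold(human, machine, threshold=0.9)
print(f"detection rate: {tpr:.0%}, false positive rate: {fpr:.2%}")
# prints: detection rate: 90%, false positive rate: 0.01%
```

Lowering the threshold flags fewer texts of either kind. A detection rate is therefore only meaningful when it is reported alongside the false positive rate at which it was measured.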

Generative AI tools like ChatGPT have grown popular, raising academic integrity concerns because students may use AI to complete assignments and pass the work off as their own. However, existing AI detectors have often produced false positives, leading to cheating accusations against innocent students, and some schools and universities have disabled AI detection tools in response. The Binoculars researchers claim their tool achieves a lower false positive rate, which would make it more reliable at identifying AI-generated text.

The researchers envision turning Binoculars into a usable product that can be licensed for various applications. They stress the importance of scientific research on language model detection and point to the progress made over the past six months in building effective detectors for different purposes. The researchers are affiliated with the University of Maryland, Carnegie Mellon University, New York University, and the Tübingen AI Center, and their research was funded by Capital One, the Amazon Research Awards program, and Open Philanthropy.

The researchers tested Binoculars with various open-source AI models to confirm its effectiveness. The tool distinguished human-written text from ChatGPT-generated text better than commercial detection systems that had been fine-tuned specifically to detect ChatGPT output. Binoculars operates in a zero-shot setting, meaning it needs no training examples from the models whose text it detects. It “views” each text through two large language models (LLMs), an observer and a performer. It then compares how surprising the text is to a model (its perplexity) with how surprising one model's next-token predictions are to the other (a cross-perplexity). The researchers say this comparison lets Binoculars accurately determine whether a text was written by a machine or a human.
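To make the two-model comparison concrete, here is a minimal Python sketch of a Binoculars-style score. It uses toy next-token probability tables instead of real LLM outputs. The sketch assumes the score is a ratio: one model's log-perplexity on the text, divided by a log cross-perplexity between the two models. A lower ratio points toward machine-generated text. The names `model_a` and `model_b`, the helper functions, and the toy numbers are all hypothetical. The sketch also does not claim which LLM plays the observer role and which plays the performer role in each term of the published method.

```python
import math
from typing import Sequence

# Toy stand-in for an LLM's output: one probability distribution over a tiny
# vocabulary for each position in the text. Probabilities are assumed to be
# strictly positive, as softmax outputs are.
Dist = Sequence[float]


def log_perplexity(model: Sequence[Dist], tokens: Sequence[int]) -> float:
    """Mean negative log-probability the model gave to the tokens actually in the text."""
    return -sum(math.log(dist[tok]) for dist, tok in zip(model, tokens)) / len(tokens)


def log_cross_perplexity(model_a: Sequence[Dist], model_b: Sequence[Dist]) -> float:
    """Mean cross-entropy of model_a's predictions, scored with model_b's log-probabilities."""
    total = 0.0
    for p, q in zip(model_a, model_b):
        total -= sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(model_a)


def binoculars_style_score(model_a: Sequence[Dist], model_b: Sequence[Dist],
                           tokens: Sequence[int]) -> float:
    """Perplexity relative to cross-perplexity; lower values point toward machine text."""
    return log_perplexity(model_a, tokens) / log_cross_perplexity(model_a, model_b)


# One-token toy "text": model_a strongly expects token 0, model_b is unsure.
model_a = [[0.9, 0.1]]
model_b = [[0.5, 0.5]]
print(binoculars_style_score(model_a, model_b, tokens=[0]))  # predictable token: low score
print(binoculars_style_score(model_a, model_b, tokens=[1]))  # surprising token: high score
```

In the first call, the token is exactly what `model_a` expected, so the text's perplexity is small compared with the cross-perplexity, and the score drops. A real detector would compare such a score against a tuned threshold, as in the false-positive discussion above. The apparent role of the cross-perplexity term is to account for how predictable a given passage is in the first place, so that naturally predictable human prose is not mistaken for machine output.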

The researchers highlight Binoculars' potential for maintaining platform integrity, especially in social media moderation. They acknowledge conflicting views on using language model detectors in schools but stress the importance of such tools in preventing social engineering campaigns, election manipulation, and spam on major websites. Binoculars showed high accuracy across domains including Reddit, WikiHow, Wikipedia, and arXiv, and it also performed reliably on academic essays written by non-native English speakers.

