Did I write the following text, or did a bot?
As artificial intelligence begins to take over the internet, that question is one of the most important the tech industry will need to answer.
ChatGPT, GPT-4, Google Bard and other new AI services can produce persuasive, useful writing. But as with every technology, they can be used for good and for ill: they make writing software code faster and easier, yet they can just as easily reproduce errors and falsehoods. A reliable way to identify AI-generated text therefore seems essential.
OpenAI, creator of ChatGPT and GPT-4, recognized this a while ago. In January, it presented a "classifier to distinguish between text written by a human and text written by AIs from a variety of providers."
The company warned that it is impossible to reliably detect all AI-written text. Still, OpenAI said that good classifiers matter for addressing a number of problematic situations, including false claims that AI-generated text was written by a human, automated misinformation campaigns, and the use of AI tools to cheat on assignments.
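For intuition, here is a minimal sketch of what a binary text classifier looks like in code. This is a hypothetical bag-of-words pipeline with placeholder training data, not OpenAI's actual approach, which fine-tuned a language model on large sets of paired human-written and AI-written text.

```python
# Toy "human vs. AI" text classifier, sketched with scikit-learn.
# Hypothetical illustration only: the texts and labels below are
# placeholders, and a production classifier would be trained on
# many thousands of labeled samples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Honestly, I rewrote that paragraph three times and still hate it.",
    "As an AI language model, I am happy to assist with your request.",
]
labels = [0, 1]  # 0 = human-written, 1 = AI-written

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    LogisticRegression(),
)
classifier.fit(texts, labels)

# Estimated probabilities [human, AI] for a new text
print(classifier.predict_proba(["Did I write the following text, or did a bot?"]))
```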
Less than seven months later, the project was cancelled.
"As of July 20, 2023, the AI classifier is no longer available due to its low accuracy rate," OpenAI wrote in a recent publication. "We are currently researching more efficient provenance techniques for text."
If OpenAI can't detect AI-generated text, how can anyone else?
If we can't tell the difference between AI-generated and human-written text, the world of online information will become far more problematic. There are already websites that churn out automated content using the new AI models. Some of them earn ad revenue by publishing falsehoods such as "Biden is dead. Acting President Kamala Harris to speak at 9 a.m., according to Bloomberg."
Some researchers also worry that if tech companies now inadvertently use AI-generated data to train new models, those models will turn out much worse: fed on automated content, they will undergo what is known as AI "model collapse."
Researchers have already studied what happens when text produced by a GPT-type AI model (such as GPT-4) forms the bulk of the training dataset for subsequent models.
"We find that using model-generated content in training causes irreversible defects in the resulting new models," they concluded in a their recent research paper. One of the researchers, Ilia Shumailov, put it best on Twitter.
After showing what could go wrong, the researchers closed with an appeal and an interesting prediction.
"It must be seriously considered if we are to preserve the benefits of learning from large-scale data pulled from the Web," they wrote. "Indeed, the value of data collected from genuine human interactions with systems will be increasingly more valuable than AI-generated content in data crawled on the Internet."
None of this can be addressed unless we are able to recognize whether a text was written by an AI or by a human.