Did I write the following text, or did a bot?
As artificial intelligence begins to take over the internet, that question is one of the most important the tech industry will need to answer.
ChatGPT, GPT-4, Google Bard and other new AI services can produce persuasive, useful writing. But as with every technology, they can be used for good and for ill: they make writing software code faster and easier, yet they can just as easily reproduce errors and falsehoods. A reliable way to identify AI-generated text therefore seems essential.
OpenAI, creator of ChatGPT and GPT-4, recognized this a while ago. In January, it presented a "classifier to distinguish between text written by a human and text written by AIs from a variety of providers."
The company warned that it is impossible to reliably detect all AI-written text. Still, OpenAI said that good classifiers matter for addressing a number of problematic situations, including false claims that AI-generated text was written by a human, automated misinformation campaigns, and the use of AI tools to cheat on assignments.
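For intuition, here is a minimal sketch of what a binary text classifier looks like in code. This is a hypothetical bag-of-words pipeline with placeholder training data, not OpenAI's actual approach, which fine-tuned a language model on large sets of paired human-written and AI-written text.

```python
# Toy "human vs. AI" text classifier, sketched with scikit-learn.
# Hypothetical illustration only: the texts and labels below are
# placeholders, and a production classifier would be trained on
# many thousands of labeled samples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Honestly, I rewrote that paragraph three times and still hate it.",
    "As an AI language model, I am happy to assist with your request.",
]
labels = [0, 1]  # 0 = human-written, 1 = AI-written

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    LogisticRegression(),
)
classifier.fit(texts, labels)

# Estimated probabilities [human, AI] for a new text
print(classifier.predict_proba(["Did I write the following text, or did a bot?"]))
```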
Less than seven months later, the project was cancelled.
"As of July 20, 2023, the AI classifier is no longer available due to its low accuracy rate," OpenAI wrote in a recent publication. "We are currently researching more efficient provenance techniques for text."
If OpenAI can't detect AI-generated text, how can anyone else?
If we can't tell the difference between AI-generated and human-written text, the world of online information will become far more problematic. There are already websites that churn out automated content using the new AI models. Some of them earn ad revenue by publishing falsehoods such as "Biden is dead. Acting President Kamala Harris to speak at 9 a.m., according to Bloomberg."
Some researchers also worry that if tech companies now inadvertently use AI-generated data to train new models, those models will turn out much worse: fed on automated content, they will undergo what is known as AI "model collapse."
Researchers have already studied what happens when text produced by a GPT-type AI model (such as GPT-4) forms the bulk of the training dataset for subsequent models.
"We find that using model-generated content in training causes irreversible defects in the resulting new models," they concluded in a their recent research paper. One of the researchers, Ilia Shumailov, put it best on Twitter.
After showing what could go wrong, the researchers closed with an appeal and an interesting prediction.
"It must be seriously considered if we are to preserve the benefits of learning from large-scale data pulled from the Web," they wrote. "Indeed, the value of data collected from genuine human interactions with systems will be increasingly more valuable than AI-generated content in data crawled on the Internet."
None of this can be addressed unless we are able to recognize whether a text was written by an AI or by a human.