LLMs: AI is being trained on AI-generated content

A very interesting article from VentureBeat predicts an ominous future for the large language models (LLMs) of artificial intelligence:

As those who follow the growing industry and its underlying research know, the data used to train large language models (LLMs) and the other models behind products like ChatGPT, Stable Diffusion, and Midjourney comes originally from human sources – books, articles, photos, and so on – created without the help of artificial intelligence.

Now, as more and more people use AI to produce and publish content, an obvious question arises:

What will happen as AI-generated content proliferates online and AI models begin to be trained on it, rather than on human-generated content?

A team of researchers from the UK and Canada looked at this very problem and recently published a paper on the open-access preprint server arXiv.

What they found is worrying for current AI and its future:

"We find that using model-generated content in training causes irreversible defects in the resulting models," the researchers write. Looking specifically at the probability distributions of text-to-text and image-to-image generative AI models, they concluded that "learning from data generated by other models causes model collapse – a degenerative process in which, over time, models forget the truth. This process is inevitable, even for cases with near-ideal conditions for long-term learning."

Ilia Shumailov, one of the paper's authors, said in an email to VentureBeat: "We were surprised to notice how quickly model collapse can occur: Models can quickly forget most of the initial data they learned from in the first place."

In other words: as an AI model in training is exposed to more AI-generated data, it performs worse over time, producing more errors in the answers and content it generates.
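The effect can be illustrated with a toy simulation. The sketch below is not the researchers' experiment: it assumes a deliberately simple "model" (a normal distribution described only by its mean and standard deviation), and the function name simulate_collapse and its parameters are hypothetical. Each generation is "trained" only on samples produced by the previous generation, and the spread of the fitted distribution tends to shrink toward zero, a crude picture of a model forgetting the variety in the original data.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=20, seed=0):
    """Repeatedly fit a normal distribution to samples from the previous fit.

    Returns the fitted standard deviation for every generation,
    starting with the original ("human") distribution at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for human-made data
    history = [std]
    for _ in range(generations):
        # The current model generates the training set for the next model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is "trained" by estimating mean and spread
        # from that generated data alone (no fresh human data is added).
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 100, 500):
        print(f"generation {gen:3d}: std = {history[gen]:.3e}")
```

Because every generation sees only a small, finite sample of the previous one, estimation errors compound instead of averaging out. Real language and image models are far more complex, so this only hints at the mechanism described in the article.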

As another of the paper's authors, Ross Anderson, professor of security engineering at the University of Cambridge and the University of Edinburgh, wrote in a blog post discussing the work:

"Just as we have filled the oceans with plastic trash and the atmosphere with carbon dioxide, we will fill the Internet with blah blah. This will make it harder to train newer models on human-generated data, giving the advantage to those who have already done so or who control access to human data."


Written by giorgos
