Pirate libraries and AI training

Was AI training based on shadowy pirate libraries?


In an article, The Times reports that so-called shadow libraries, sites where millions of books are stored illegally and in many cases without the rights holders' permission, have been used as training data for AI models.

Several file-sharing sites on the internet host an enormous number of books, magazines and other printed material that you would normally have to pay for.

Free libraries such as Library Genesis, Z-Library or the Library offer far more material than you could ever find the time to read. At the same time, you can also upload your own material.


Of course, don't expect your provider's DNS servers to resolve these addresses. They are blocked, and you will need to switch to Cloudflare or Google DNS.

This is a huge resource, and AI developers did not leave it untapped. AI companies have acknowledged that they relied on shadow libraries in their research.

OpenAI's GPT-1 was trained on BookCorpus, which contains over 7,000 unpublished titles scraped from the self-publishing platform Smashwords.

To train GPT-3, OpenAI said that about 16 percent of its data came from two "Internet-based books corpora" it called "Books1" and "Books2".


According to a lawsuit filed against OpenAI by Sarah Silverman and two other authors, Books2 is likely a "blatantly illegal" shadow library.

Efforts to shut these sites down have failed. Last year, the FBI, with the help of the Editors' Guild, charged two people accused of running Z-Library with copyright infringement, fraud and money laundering.

Subsequently, however, some of these sites moved to the Dark Web and to torrent sites, which makes them harder to track down. And because many of these sites operate anonymously and outside the United States, punishing their operators is genuinely difficult.

However, after all this controversy, tech companies are becoming increasingly strict about the data they use to train their systems.

iGuRu.gr The Best Technology Site in Greece



Written by Dimitris

Dimitris hates Mondays.....
