DarkBERT: the AI model trained on the dark web

After the success of OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, researchers have created a new AI model with much darker motivations.


While the large language models (LLMs) powering ChatGPT and Google Bard were trained on data from the open web, DarkBERT was trained exclusively on data from the dark web. Yes, you read that right: this AI model was trained using data from hackers, cybercriminals and other crooks.

A group of South Korean researchers released a paper (PDF) detailing how they built DarkBERT using data from the Tor network, which is used to access the dark web. By scouring the dark web and then filtering the raw data, they were able to create a database that they used to train DarkBERT.
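To make the "scour, then filter" step more concrete, here is a minimal Python sketch of what filtering crawled pages could look like. It is not the researchers' actual pipeline: the function names, the word-count threshold and the choice of exact-duplicate removal are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical threshold: pages with fewer words are treated as low-information.
MIN_WORDS = 5


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so near-identical copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_pages(pages, min_words=MIN_WORDS):
    """Drop very short pages and exact duplicates (after normalization)."""
    seen = set()
    kept = []
    for page in pages:
        norm = normalize(page)
        if len(norm.split()) < min_words:
            continue  # too little text to be useful (assumed rule)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already kept
        seen.add(digest)
        kept.append(page.strip())
    return kept


if __name__ == "__main__":
    crawled = [
        "Hello   world this is a test page",
        "hello world this is a test page",
        "too short",
        "  Another page with enough words here  ",
    ]
    for page in filter_pages(crawled):
        print(page)
```

Hashing the normalized text keeps memory use low, because only the digests of kept pages are stored rather than the pages themselves.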

Surprisingly, DarkBERT has already managed to outperform other large models, despite being trained on data from a very unlikely place.

Although DarkBERT is a new AI model, it is actually based on the RoBERTa architecture, an approach developed in 2019 by researchers at Facebook, according to Tom's Hardware.

In a research paper detailing the inner workings of RoBERTa, Meta AI explains that it is a “highly optimized method for pre-training natural language processing (NLP) systems” that improves on BERT, which Google released in 2018. Because Google open-sourced BERT, Facebook researchers were able to improve its performance.

Facebook's optimized method produced RoBERTa, which achieved state-of-the-art results on the General Language Understanding Evaluation (GLUE) NLP benchmark.

But now the South Korean researchers behind DarkBERT have shown that RoBERTa is capable of doing even more, as it was undertrained when it was originally released. By feeding data from the dark web to RoBERTa over the course of nearly 16 days with two datasets (one raw and one pre-processed), the researchers were able to build DarkBERT.
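The article does not say what separated the raw dataset from the pre-processed one. One plausible step, assumed here purely for illustration, is replacing sensitive identifiers such as email addresses and URLs with placeholder tokens. The regular expressions, the `[URL]` and `[EMAIL]` tokens and the function names below are hypothetical and are not taken from the paper.

```python
import re

# Deliberately simple patterns; real-world identifier matching is messier.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def preprocess(text: str) -> str:
    """Replace URLs and email addresses with placeholder tokens."""
    text = URL_RE.sub("[URL]", text)      # URLs first, so embedded '@' is covered
    text = EMAIL_RE.sub("[EMAIL]", text)
    return text


def build_variants(pages):
    """Return two corpora: the pages as collected and a pre-processed copy."""
    raw = list(pages)
    processed = [preprocess(page) for page in pages]
    return raw, processed


if __name__ == "__main__":
    raw, processed = build_variants(
        ["Contact admin@example.com or visit http://example.onion/market now"]
    )
    print(raw[0])
    print(processed[0])
```

Keeping both variants side by side would let a team train one model on each and compare the results, which matches the two-dataset setup the article describes.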

The researchers have no plans to release DarkBERT to the public, although according to Dexerto they do accept requests for academic purposes. DarkBERT is likely to be very attractive to law enforcement as well as to adversaries on the other side. It will also give researchers an opportunity to better understand the dark web as a whole.


Written by giorgos
