OpenAI has published an official response to the lawsuit filed by the New York Times, which claims that the company used its articles without permission to train its large language models (LLMs).
In a letter published by OpenAI, the company denies the New York Times' claims, noting that the newspaper's examples rely on prompts designed to extract training data from the chatbot. Regurgitation is a behavior in which AI models reproduce training data verbatim when prompted in a certain way.
“Interestingly, the regurgitations the New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don't typically behave the way the New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”
The company says it had no advance knowledge of the lawsuit and learned of it by reading the New York Times.
“We had explained to the New York Times that like any single source, their content did not materially contribute to the training of our existing models and also would not have sufficient impact for future training. Their lawsuit on Dec. 27 — which we learned about by reading the New York Times — surprised and disappointed us.”
OpenAI also reports that the Times had found cases of regurgitation while the two parties were in discussions, but did not provide examples when asked for them. The company noted that it treats regurgitation claims as a very high priority and cited its removal of the Bing integration as an example.
“Along the way, they had reported seeing some regurgitation of their content, but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We've demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature as soon as we learned it could reproduce real-time content in unintended ways.”
The company's letter also covers other points, such as the licensing agreements between OpenAI and organizations including the Associated Press, Axel Springer, the American Journalism Project and NYU.
OpenAI also addressed fair use, arguing that training AI models on content publicly available on the internet falls under fair use.