NVIDIA Used Scraped Video Game Footage to Train AI Products

Leaked documents obtained by 404 Media reveal that NVIDIA has allegedly scraped videos from the internet, including movie clips and game footage, to train its AI products. As a result, customers using such products and tools are at risk of inadvertent copyright infringement.

Like other AI tool makers, NVIDIA needs training data for its text, video, and audio generators to “learn” how to create assets. Data scraping generally refers to the practice of feeding existing video, text, and audio into training models without getting permission from the people who created them.

The technique means copyrighted material hosted on YouTube and Netflix (and owned by the companies that manage media content on those platforms) was taken without consent.

Regulators in the United States and the European Union are still considering whether data scraping practices violate copyright law. The 404 Media report highlights how tech companies are lax about copyright law when it comes to generative AI, and how other industries, like entertainment and gaming, could be affected by these practices.

Company employees have expressed concerns about this behavior in messages reviewed by the outlet. Despite these concerns, NVIDIA told 404 Media that its scraping guidelines are “fully compliant with the letter and spirit of copyright law. […] Fair use protects the ability to use a work for a transformative purpose, such as modeling.”

Game developers and their parent companies own the copyright to their games, and YouTube is an important platform for the industry. Having their footage taken without a say creates a massive breach of trust with a company that often uses games from big studios to sell its services and products.

NVIDIA's AI engineers wanted gameplay video to improve their training data

An employee who spoke to the outlet says he and others were asked to capture full-length videos that could help train the tech company’s AI model, and that gameplay footage in particular was highly coveted by engineers. Capturing that footage for the dataset involved working with NVIDIA’s GeForce Now cloud service.

In a Slack conversation, senior research scientist Jim Fan highlighted the service's streaming capabilities for capturing and storing video. All that “high-quality gameplay video,” he said, is “very useful” data to mine.

“We will work closely with [GeForceNow] and related engineering teams to set up real-time game data acquisition, expand the pipeline, and process it for training,” he explained.

Project managers, however, allegedly told employees who raised concerns that the scraping was an “executive decision” and that they did not need to worry about it. The “open legal issue” (such as a violation of YouTube’s Terms of Service) would apparently be resolved in the future.

In the 404 Media story, quotes from internal documents and the Slack channels of several AI researchers show NVIDIA actively trying to avoid bad publicity. Its vice president of research, Ming-Yu Liu, stressed that there could be no “negative sentiment” if the company didn’t publish any research based on its downloaded data.

“What we are doing here will lead to no publication,” Liu wrote. He and other staffers also created their own YouTube data scrapers and API accounts to facilitate the process.

Until regulators define what is and isn’t copyright infringement in the AI world, NVIDIA and other companies will likely operate in a legal gray area. As MIT’s Robert Mahari told 404 Media, proving data scraping can be “really hard technically.”

“The best [company] policy in terms of incentives, is to not tell people what you trained on,” he said. “So until you tell anyone, it’s going to be really hard to prove.”

You can read 404 Media's full report on NVIDIA's data scraping on the outlet's website.
