Game UI Database slowdown caused by OpenAI's incessant scraping

Not long after it was relaunched with over 55,000 screenshots and a host of new features, Game UI Database has hit a major snag.

Edd Coates, creator of the free repository, noticed that the website was “slow as hell” and, with the help of server host Jay Peet, began investigating. The solution was alarmingly simple. Blocking a single IP address allowed normal service to resume, but who owned that digital calling card? None other than OpenAI, the generative AI company behind ChatGPT and DALL-E.

Coates initially shared the news on X and criticized OpenAI's practice of scraping websites for information that can be used to train its models. The company isn't shy about it. Its website openly states that its large language models are trained on three sources, including “information that is publicly available on the Internet.” Naturally, a myriad of lawsuits and potential legal challenges involving major newspapers (including the New York Times) and even YouTube creators would argue that “publicly available” does not equate to legal (thanks, The Guardian and The Verge).

For Coates, the problem here is twofold. Not only does he disagree with OpenAI’s methods and business model, but being targeted by the company has also brought down a free resource he spent five years building.

“I first noticed the database issues a couple of weeks ago when pages were taking a lot longer to load. I knew it had nothing to do with the site itself because it had always worked without any issues (even with more active users online), so I suspected foul play but couldn't find any evidence at the time,” Coates told Game Developer.

“I was unable to post any updates to the site because the delay was interfering with my admin tools, and I was even receiving angry emails and messages from users who rely on the site as part of their daily workflow.”

He explained that the slowdown eventually caused the website to go down completely, giving users “502 Bad Gateway” errors. At that point, Coates sought help from Jay Peet, who had hosted the database on their private server for the past five years. Peet examined the site’s logs and realized that the website’s resources were being swallowed up by a single IP address belonging to OpenAI.
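The kind of log check described above can be sketched in a few lines. This is only an illustration, not Peet's actual method: it assumes access-log lines that begin with the client IP (as in the common and combined log formats), and the sample lines and addresses are hypothetical placeholders from reserved documentation ranges.

```python
from collections import Counter


def top_clients(log_lines, n=3):
    """Return the n client IPs with the most requests.

    Assumes each non-empty line starts with the client IP followed by a
    space, as in the common/combined access-log formats.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        ip = line.split(" ", 1)[0]
        counts[ip] += 1
    return counts.most_common(n)


# Hypothetical sample lines; the addresses are reserved documentation ranges.
sample = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.4 - - [01/Jan/2024:00:00:02 +0000] "GET /games HTTP/1.1" 200 2048',
    '203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512',
]

print(top_clients(sample))
```

A single address dominating the count, as in the sample, is the pattern the article describes.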

“The homepage was reloading 200 times per second, like the [OpenAI] bot was apparently struggling to find its way around the site and was stuck in a constant loop,” Coates added. “This was essentially a two-week long DDoS attack in the form of a data theft.”

Game UI Database founder questions whether OpenAI's scraping is “fair or even legal”

Coates makes no money from the Game UI Database. In fact, they run the website at a loss. However, if they had attempted to monetize the project or leveraged external tools like Amazon Web Services, OpenAI’s unwanted interest could have caused financial harm.

“If I relied on [Game UI Database] for ad revenue or membership fees, the downtime OpenAI caused would definitely have impacted my income,” he said. “They were transferring about 70GB of data off the server every ten minutes. Fortunately, I have no bandwidth costs and minimal server fees, so I can provide this resource to everyone for free (as all educational resources should be). But if I had paid AWS for storage, for example, this bandwidth would have cost me about £850 per day.”

“OpenAI isn't even transparent about where its data comes from, so I would have been solely responsible for that bill. How is that fair or even legal? And I'm certainly not the only one affected by this.”
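The bandwidth figures Coates quotes can be sanity-checked with some quick arithmetic. The sketch below uses only the two numbers from the quote (about 70GB every ten minutes, about £850 per day). The per-gigabyte rate it prints is back-calculated from those numbers and is not a quoted AWS price.

```python
GB_PER_WINDOW = 70              # from the quote: about 70GB...
WINDOW_MINUTES = 10             # ...every ten minutes
QUOTED_COST_PER_DAY_GBP = 850   # from the quote: about £850 per day


def daily_transfer_gb(gb_per_window, window_minutes):
    """Scale a per-window transfer volume up to a full 24-hour day."""
    windows_per_day = (24 * 60) / window_minutes
    return gb_per_window * windows_per_day


gb_per_day = daily_transfer_gb(GB_PER_WINDOW, WINDOW_MINUTES)
# Implied rate is derived from the quoted figures, not taken from a price list.
implied_rate_gbp = QUOTED_COST_PER_DAY_GBP / gb_per_day

print(f"{gb_per_day:.0f} GB per day")
print(f"implied rate: about £{implied_rate_gbp:.3f} per GB")
```

At that pace the server would have been sending out roughly ten terabytes a day, which works out to a little over eight pence per gigabyte for the quoted daily cost.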

Coates said the problem goes deeper than a potential loss of revenue, though. “Don’t even get me started on what they’re doing with this data,” he continued, noting that he spent years meticulously collecting and cataloging UI references to help other creatives in the game industry, only to see that work (which includes the efforts of thousands of developers) “stolen by a multibillion-dollar organization.”

Coates said the idea that OpenAI is repurposing that work to “hurt and replace the people I’m trying to help” only adds insult to injury. “It’s sick. Generative AI technology simply wouldn’t exist without the work of human creatives, and yet we’re the ones being punished here without compensation or recognition,” they added.

As for how Game UI Database managed to fend off OpenAI's advances, Coates and Peet ultimately blocked all of the company's associated IPs at the HTTP server level to prevent it from implementing a workaround.

“When I investigated this issue, it was clear that we had been scanned or crawled by something. Our usage analytics showed 10 active users, while the actual server was reporting 200 to 300 active requests,” Peet said. “I suspected that someone was trying to DDoS us or that data was being taken en masse by an automated scraper.

“Blocking crawlers wasn't too difficult, although the method of doing so could be fragile if OpenAI were to add or change any of its in-use IP addresses. After editing robots.txt (which OpenAI can simply ignore if they want), I explicitly blocked all OpenAI IPs at the HTTP server level. After switching to the new configuration, the website immediately returned to its normal response time.”
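The two layers Peet mentions, a robots.txt rule and a hard IP block, can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the site's real configuration. GPTBot is the user-agent token OpenAI documents for its crawler, but the article does not say which directives were used, and the blocked network below is a placeholder documentation range rather than a real OpenAI address. The actual block was applied in the HTTP server itself, so the `is_blocked` helper (a hypothetical name) only mirrors that logic.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: ask GPTBot to stay away, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robots.txt is advisory: a well-behaved crawler checks it, a rude one need not.
print(parser.can_fetch("GPTBot", "/"))        # False
print(parser.can_fetch("SomeOtherBot", "/"))  # True

# Placeholder range (reserved for documentation), standing in for crawler IPs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(ip):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.7"))   # True
print(is_blocked("198.51.100.4"))  # False
```

The fragility Peet describes shows up in the second half: the block only works for addresses already in the list, so any new range the crawler starts using would have to be added by hand.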

At the time of writing, Game UI Database is back firing on all cylinders and championing the UI work found in nearly 1,400 titles, including ill-fated projects like Concord. Game Developer has reached out to OpenAI for comment.
