TikTok owner ByteDance scrapes the web faster than OpenAI
As ByteDance develops artificial intelligence models to compete in China, the bot it uses to scrape data to train those models is reportedly spiking in activity.
The TikTok owner launched its own web scraper, Bytespider, in April, and it’s now scraping data multiple times faster than bots from other companies, Fortune reported, citing research from Kasada, a bot management company, and Dark Visitors, a monitor of scraper bots. Companies developing AI models, such as Google (GOOGL) and Meta (META), use scraper bots to gather data to train and improve the large language models (LLMs) and multimodal models that power the companies’ AI services.
Bytespider is scraping web data about 25 times faster than OpenAI’s web scraper, GPTBot, Sam Crowther, CEO of Kasada, told Fortune. Compared with Anthropic’s ClaudeBot, Bytespider is 3,000 times faster.
Like OpenAI’s and Anthropic’s bots, Bytespider ignores instructions in robots.txt, a plain-text file that is not legally binding and tells web scrapers which parts of a website they can and cannot access, Fortune reported. According to Kasada’s data, Bytespider’s scraping activity has spiked over the last six weeks.
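For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved scraper would check it before fetching a page, using Python’s standard `urllib.robotparser` module. The rules, the `example.com` URLs, and the `is_allowed` helper are hypothetical illustrations, not taken from any real site or from ByteDance’s code. Nothing in this check is enforced: a bot can skip it entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block Bytespider everywhere,
# and block all other bots from /private/ only.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Bytespider", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A compliant crawler would run a check like this and skip any URL for which it returns `False`. The file is only a request, which is why sites that want to stop a bot turn to bot management tools instead.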
“It’s like they’re trying desperately to catch up,” Crowther told Fortune.
ByteDance did not immediately respond to a request for comment.
The China-based company released its AI-powered chatbot, Doubao, last August, and it’s proving to be a tough competitor to homegrown rival Baidu’s (BIDU) Ernie Bot. In May, ByteDance launched a series of Doubao LLMs for enterprises, which cost less than models from the company’s Chinese competitors.
Now, ByteDance is planning to build a new AI model using chips from China’s Huawei, Reuters reported, citing three unnamed people familiar with the matter. However, a spokesperson for ByteDance previously told Quartz the company is not developing a new AI model.
The company has also designed two AI chips with Taiwan Semiconductor Manufacturing Company (TSM) that ByteDance plans to mass produce by 2026, The Information reported, citing unnamed people familiar with the matter. By producing its own chips, the company could become less dependent on Nvidia’s (NVDA) pricey graphics processing units, or GPUs, which are subject to U.S. export controls, people told The Information.