- Automattic is close to finalizing an agreement to provide data for training the AI companies’ models.
- AI data training deals have become a lucrative opportunity for websites treading water in today’s slippery online publishing landscape.
Tumblr and WordPress.com are on the verge of striking deals to sell user data to artificial intelligence companies OpenAI and Midjourney. According to reports from 404 Media, Automattic, the parent company of both platforms, is close to finalizing an agreement to provide data for training the AI companies’ models.
Murky nature of data in the deal
The exact nature of the data to be included in the deal remains murky, but an alleged internal communication from Tumblr product manager Cyle Gage has raised concerns about the extent of data being prepared for transfer. The report indicates that the data may have encompassed private or partner-related information that was not intended to be part of the deal. This allegedly included private posts on public blog posts, deleted or suspended blogs, unanswered questions, private answers, explicit content, and material from premium partner blogs.
Also read: OpenAI cures GPT-4 ‘laziness’ with new updates
Automattic’s response and commitment
When approached for comment, Automattic responded with a published statement, emphasizing that only public content hosted on WordPress.com and Tumblr from sites that haven’t opted out will be shared. The company also highlighted its commitment to respecting all opt-out settings and announced plans to launch a new opt-out tool aimed at allowing users to block third parties, including AI companies, from training on their data.
Advocating for user data removal
An alleged internal FAQ prepared by Automattic for the new opt-out tool suggests that the company will actively advocate for the removal of data upon user request. While the language used in the document to describe this process as “asking” and “advocating” may raise eyebrows, Andrew Spittle, Automattic’s AI head, has expressed confidence that the AI companies will honor these requests based on previous conversations.
Also read: OpenAI CEO Sam Altman and U.S. House Speaker navigate AI regulation challenges on capitol hill
Challenges faced by Tumblr and WordPress.com
The backdrop to these potential deals is the rapidly evolving landscape of online publishing, where websites are seeking new revenue streams to stay afloat. Tumblr, in particular, has faced its share of challenges, reportedly downsizing its staff to a bare minimum in late 2023. This context underscores the significance of such data deals for platforms like Tumblr and WordPress.com.
In a broader industry context, this news adds to the growing trend of AI companies seeking to leverage user-generated content for training their models. Recently, Google struck a deal with Reddit, and OpenAI has been actively pursuing partnerships to collect datasets for AI model training.






