• AI companies have been criticized for their rampant use of “publicly available” content to train their models, since much of what is publicly available online is still protected by copyright.
  • OpenAI has already scraped and used any and all content once publicly available on Tumblr.
  • Critics have raised concerns about the potential infringement of copyright and the need for explicit user consent in such transactions.

The recent revelation that Tumblr is on the brink of finalizing a deal to let OpenAI and image generator Midjourney use public Tumblr data for AI model training has sparked both interest and concern within the industry.

Challenges around user privacy and content ownership

According to internal documents reviewed by 404 Media, Automattic, Tumblr's parent company, is reportedly in discussions to sell public Tumblr content to these AI giants. While the specifics of the data to be sold remain undisclosed, the talks raise questions about the potential impact on user privacy and content ownership.

The ethical implications of utilizing public user-generated content for AI training have been a topic of debate. Critics have raised concerns about the potential infringement of copyright and the need for explicit user consent in such transactions. This development serves as a reminder of the complex relationship between user-generated content platforms and the use of such data by third-party entities.

Also read: Tumblr kills Post Plus, and with it all paywalled content, starting January 2024

Empowering and serving users

In response to inquiries about the potential impact of the deal on Tumblr content, Automattic remained tight-lipped, leaving users and industry observers seeking clarity on the matter. The lack of transparency regarding the nature and scope of the data sale has only heightened apprehensions among users about the privacy and security of their content.

Amidst these developments, it is crucial for users to be aware of their rights and options. Automattic has emphasized the importance of user choice and has provided guidance on opting out of sharing public Tumblr content with third parties. However, the process of opting out may require users to navigate through settings on web browsers rather than the Tumblr app, highlighting the need for clear and accessible privacy controls.

Also read: Tumblr’s Comeback, Formerly Twitter

Trend of AI companies accessing public content

Furthermore, the issue of existing data already shared with third-party partners has come into focus. Andrew Spittle, AI lead at Automattic, said the company would notify its partners and ask them to remove data in line with user preferences. This commitment to ongoing dialogue and content removal reflects the evolving landscape of data privacy and user empowerment.

This development also sheds light on the broader trend of AI companies seeking access to public content for training purposes. With OpenAI’s pursuit of licensing news stories from reputable sources and Reddit’s collaboration with Google for content monetization, the commercialization of public data sets is becoming increasingly prevalent. As technology companies explore new avenues for data utilization, the implications for user privacy and control over their content remain paramount.