
Who is selling your data to train AI?

Who is selling your data to train AI? is tracked as an internet infrastructure institution within the internet infrastructure ecosystem.



Category: Institution


Region: Global

Who is selling your data to train AI? has public-source relevance to network operations, governance, dependency mapping, or market structure.

Signal Focus: Internet infrastructure institution


Content Type: Profile


Primary Domain: Technology

Public-source signals support medium-impact monitoring for infrastructure visibility and dependency analysis.

Topic: Internet infrastructure institution

Who is selling your data to train AI? is profiled by BTW Media because published evidence links it to internet infrastructure, governance, operational dependencies, or market visibility.

Impact: Medium


Confidence: Limited (72%)

Confidence   Grade   Meaning
0.90–1.00    A       High, direct sources
0.75–0.89    A/B     Strong
0.55–0.74    B/C     Medium
0.35–0.54    C/D     Weak–medium
0.10–0.34    D       Weak signal
0.00–0.09    D       Internal monitoring

Several public sources


  • Tumblr and WordPress.com are reportedly in discussions to provide user data to AI firms like OpenAI and Midjourney.
  • The New York Times is currently suing OpenAI for allegedly using its expansive archives without permission to train chatbots.

The use of scraped data from the internet has become a contentious issue, with companies harnessing public content to train their powerful generative models. This practice has sparked legal battles, as organizations like The New York Times and Getty Images have raised concerns about the unauthorized use of their content.

Legal battles over data usage

One of the most prominent cases involves OpenAI, which is currently facing a lawsuit from The New York Times for allegedly using the newspaper’s archives without permission to train chatbots. In response, OpenAI has accused The Times of resorting to questionable tactics to prove its claims. Similarly, Getty Images has taken legal action against Stability AI, the company behind Stable Diffusion, for copyright infringement related to the use of its visual content.

The implications of AI systems leveraging the work of journalists, musicians, and photographers extend beyond legal disputes. The quest for vast amounts of training data has led to concerns about the potential exploitation of online content creators. Platforms like Tumblr and WordPress.com have reportedly been in talks to sell user data to AI companies like OpenAI and Midjourney, raising questions about data privacy and ownership.


Partnerships in data sharing

While some entities have opted for litigation, others have chosen to forge partnerships. The Associated Press has licensed a portion of its archives to OpenAI, while Shutterstock inked a six-year deal with the AI company to provide access to its extensive library of photos, videos, and music.

Reddit, known for its wealth of user-generated content, recently struck a deal with Google, granting the tech giant access to its API for AI model training. This move underscores the value of user contributions to platforms and the ethical considerations surrounding data usage.


Widespread data training practices

The practice of training AI models on public internet data extends well beyond the specific deals described above. A recent investigation by The Washington Post uncovered a trove of scraped data from various sources, including online forums, crowdfunding platforms, and social media sites. Companies like Meta, formerly Facebook, have also used public posts from their own platforms to enhance their AI capabilities.

The debate over data ownership and consent remains unresolved. Content creators, whether on niche blogs or popular social media platforms, face the prospect of their work being commodified for AI training purposes. The balance between innovation and ethical data practices is crucial in shaping the future of AI development and its impact on digital ecosystems.

At A Glance

  • Name: Who is selling your data to train AI?
  • Type: Internet infrastructure institution
  • Base: Global
  • Profile focus: Institution

What It Does

  • Public records support monitoring of its role, services, and key relationships.

Why It Matters

  • Operational criticality: Medium
  • Time horizon: Next quarter

What To Watch

  • Monitoring focuses on verified service continuity, governance changes, and relationship signals.
Now: Medium priority

Track verified source updates, role changes, and current public evidence.

Quarter: Medium policy sensitivity


Year: Next quarter outlook

Longer-term relevance depends on verified operating, policy, and relationship changes.
