Trends

Tech giants accused of using unauthorised YouTube transcripts to train AI models

Context

OUR TAKE: The development of AI technology is certainly promising, but it is built on training datasets, and the lack of transparency about those datasets is bound to cause controversy. The affected parties and the companies accused of infringement often hold conflicting views, with no definitive resolution in sight. The situation hangs over the industry like the sword of Damocles: if it is not addressed, it will inevitably hinder the continued development of AI. (Yasmine Luo, BTW reporter)

Some major tech companies are accused of using YouTube transcripts without authorisation to train their AI models.

Evidence

Pending intelligence enrichment.

Analysis

According to a new investigation by Proof News, EleutherAI, a nonprofit organisation, created a dataset containing transcripts from more than 48,000 YouTube channels. These include videos by prominent creators such as Marques Brownlee and MrBeast, as well as content from major publishers such as The New York Times, the BBC, and ABC News. The investigation found that Apple, NVIDIA, Anthropic, and other large tech companies used this dataset to train their AI models.

Neal Mohan, CEO of YouTube, has previously stated, “Companies using YouTube’s data to train AI models would violate the platform’s terms of service.”

Marques Brownlee, a prominent YouTuber, posted on social media: “Apple has sourced data for their AI from several companies. One of them scraped tons of data/transcripts from YouTube videos, including mine. Apple technically avoids ‘fault’ here because they’re not the ones scraping. But this is going to be an evolving problem for a long time.”

Apple, NVIDIA, Anthropic, and EleutherAI have not yet commented on the matter.

Key Points

  • Several tech giants allegedly used YouTube transcripts without permission to train their AI models.
  • The legality of training AI on unauthorised datasets remains unresolved, and the uncertainty could hinder future AI development.

Actions

Pending intelligence enrichment.

Author

Yasmine Luo (y.luo@btw.media)