- OpenAI used its speech recognition tool Whisper to transcribe more than a million hours of YouTube videos to train its AI models; Google also transcribed YouTube videos for the same purpose.
- OpenAI’s use of YouTube videos may violate Google’s rules, which prohibit the use of its videos for standalone applications as well as access through automated means.
Both OpenAI and Google have turned to transcribing YouTube videos to further train their AI models, which could infringe on creators’ copyrights. Along with Meta, the two tech giants cut corners to gather as much data as possible for training their AI models.
Infringement of creators’ video copyrights
OpenAI used Whisper to transcribe over a million hours of YouTube video, feeding the transcripts into GPT-4, the AI system used for the ChatGPT chatbot. Google, which owns YouTube, also transcribed videos for AI model training.
The transcription of videos by both companies may violate the copyrights of the original creators. Other uses of creative content for AI training have already led to lawsuits over copyright and licensing.
OpenAI’s use of YouTube videos may also violate Google’s rules prohibiting the use of its videos for “independent” applications and “automated means (such as bots, botnets, or scrapers)” of accessing its videos.
Allowing AI training on public data
Google spokesperson Matt Bryant told The New York Times that the company was unaware of any such usage by OpenAI. However, some Google employees were reportedly aware of OpenAI’s use of YouTube content, which may have violated YouTube’s rules, but chose not to intervene because Google was doing something similar. Google also told the newspaper that it trained AI only on videos whose creators had consented to this kind of use. In July 2023, Google modified its terms of service to permit the use of content that is freely accessible online, such as Google Docs and restaurant ratings on Google Maps, for further training of AI models.






