Apple AI models trained with YouTube content of MrBeast, MKBHD, PewDiePie and others without permission
Apple, Nvidia, Salesforce and a few other big tech companies have been accused of training their AI models on YouTube videos from famous creators. According to a report by Wired, the tech giants used subtitle files that a non-profit had downloaded from over 170,000 videos by popular creators, including MrBeast, Marques Brownlee (MKBHD), PewDiePie, John Oliver and Jimmy Kimmel, without their consent. For those who don't know, subtitle files are effectively transcripts of the video content. While many may see this as a violation of privacy and of YouTube's rules, it also raises a major concern about potential copyright violation.
How Apple, Nvidia got the data
The report claims that an investigation by Proof News revealed that several tech giants have used the subtitles of thousands of YouTube videos to train AI. YouTube has a policy that prohibits harvesting material from its platform without permission. However, the big tech players reportedly sourced the data from EleutherAI, a non-profit that says it helps small developers and academics train AI models. It appears that the data extracted by EleutherAI has also been used by companies such as Apple and Nvidia.
A research paper by EleutherAI reveals that its dataset, called the Pile, is open and accessible to anyone with enough computing power and storage space to use it. The research paper and posts from big tech companies also show how these firms, valued at hundreds of billions or even trillions of dollars, used the Pile to train AI. Documents also shed light on Apple using EleutherAI's Pile to train its high-profile model OpenELM, which debuted in April.
Is Apple responsible for the violation?
It is worth noting that YouTube's terms and conditions were broken not by Apple but by EleutherAI, which sourced the data from the Google-owned video streaming platform and distributed it to numerous developers via the Pile. This is not the first instance of data being sourced illegally to train AI systems. AI chatbots can often be spotted reproducing entire passages of text verbatim when asked about niche topics.