
How Tech Giants Used YouTube Data to Train AI: Apple and Salesforce Respond

Tech giants Apple, Nvidia, Anthropic, and Salesforce have come under scrutiny following a recent investigation conducted by Proof News and published in Wired. The report alleges that these companies used data from “thousands of YouTube videos” to train their AI models. The dataset, called “YouTube Subtitles,” consists of video transcripts from educational channels such as Khan Academy, MIT, and Harvard, as well as major news outlets such as the Wall Street Journal, NPR, and the BBC. It also includes material from popular YouTube stars like PewDiePie, Marques Brownlee, and MrBeast.

Apple and Salesforce have responded to Wired’s report. Apple stated that although its open-source language model, OpenELM, did indeed use the dataset, the model was built solely for research purposes and will not be used in any of Apple’s machine learning-powered hardware or AI services, including Apple Intelligence. Apple emphasized its commitment to benefiting the broader research community through projects like OpenELM.

Apple Intelligence is Apple’s new suite of AI features introduced at WWDC 2024, which includes tools like text summarization and entertainment-focused features like Genmoji and Image Playground. Apple clarified that it offers websites the option to opt out of having their content used for AI training and that its generative models are built using high-quality data from licensed content and publicly available data on the web.

Salesforce also provided its response to the allegations, stating that the dataset referred to in the research paper was used for academic and research purposes in 2021. According to Salesforce, the dataset was publicly available and released under a permissive license.

Nvidia has yet to comment on the matter. The company is known for incorporating AI into its gaming hardware and services.

In conclusion, while the investigation raises concerns about the use of data from YouTube videos by tech giants for training AI models, both Apple and Salesforce maintain that the dataset was used for research purposes only. It is important for companies to be transparent about their data usage and ensure that they adhere to ethical practices.
