Artists criticize Apple's lack of transparency around Apple Intelligence data


Later this year, millions of Apple devices will begin running Apple Intelligence, Cupertino's take on generative AI that, among other things, lets people generate images with text prompts. But some members of the creative community are unhappy about what they say is the company’s lack of transparency around the raw information powering the AI model that makes this possible.

“I wish Apple would have explained to the public in a more transparent way how they collected their training data,” Jon Lam, a video games artist and a creators’ rights activist based in Vancouver, told Engadget. “I think their announcement could not have come at a worse time.”

Creatives have historically been some of the most loyal customers of Apple, a company whose founder famously positioned it at the “intersection of technology and liberal arts.” But photographers, concept artists and sculptors who spoke to Engadget said that they were frustrated about Apple’s relative silence around how it gathers data for its AI models.

Generative AI is only as good as the data its models are trained on. To that end, most companies have ingested just about anything they could find on the internet, consent or compensation be damned. Nearly 6 billion images used to train multiple AI models also came from LAION-5B, a dataset of images scraped off the internet. In an interview with Forbes, David Holz, the CEO of Midjourney, said that the company’s models were trained on “just a big scrape of the internet” and that “there isn’t really a way to get a hundred million images and know where they’re coming from.”

Artists, authors and musicians have accused generative AI companies of sucking up their work for free and profiting off of it, leading to more than a dozen lawsuits in 2023 alone. Last month, major music labels including Universal and Sony sued AI music generators Suno and Udio, startups valued at hundreds of millions of dollars, for copyright infringement. Tech companies have – ironically – both defended their actions and also struck licensing deals with content providers, including news publishers.

Some creatives thought that Apple might do better. “That’s why I wanted to give them a slight benefit of the doubt,” said Lam. “I thought they would approach the ethics conversation differently.”

Instead, Apple has revealed very little about the source of training data for Apple Intelligence. In a post published on the company’s machine learning research blog, the company wrote that, just like other generative AI companies, it grabs public data from the open web using AppleBot, its purpose-made web crawler, something that its executives have also said on stage. Apple’s AI and machine learning head John Giannandrea also reportedly said that “a large amount of training data was actually created by Apple” but did not go into specifics. And Apple has also reportedly signed deals with Shutterstock and Photobucket to license training images, but hasn’t publicly confirmed those relationships. While Apple Intelligence tries to win kudos for a supposedly more privacy-focused approach using on-device processing and bespoke cloud computing, the fundamentals girding its AI model appear little different from competitors.

Apple did not respond to specific questions from Engadget.

In May, Andrew Leung, a Los Angeles-based artist who has worked on films like Black Panther, The Lion King and Mulan, called generative AI “the greatest heist in the history of human intellect” in his testimony before the California State Assembly about the effects of AI on the entertainment industry. “I want to point out that when they use the term ‘publicly available’ it just doesn’t pass muster,” Leung said in an interview. “It doesn’t automatically translate to fair use.”

It’s also problematic for companies like Apple, said Leung, to only offer an option for people to opt out once they’ve already trained AI models on data that they did not consent to. “We never asked to be a part of it.” Apple does allow websites to opt out of being scraped by AppleBot for Apple Intelligence training data – the company says it respects robots.txt, a text file that any website can host to tell crawlers to stay away – but this would be triage at best. It's not clear when AppleBot began scraping the web or how anyone could have opted out before then. And, technologically, it's an open question how or if requests to remove information from generative models can even be honored.
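For readers curious what such an opt-out looks like in practice, here is a minimal sketch using Python's standard-library robots.txt parser. It makes several assumptions: the user-agent token "Applebot-Extended" is the one Apple's Applebot documentation names for opting out of AI training, and this article does not confirm it; the site, URL and helper function are hypothetical. The sketch only shows how a well-behaved crawler would read the file. It says nothing about data collected before the rule existed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an artist's portfolio site.
# "Applebot-Extended" is assumed to be the token Apple documents for
# opting out of AI training; it is not named in the article.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/portfolio/painting.jpg"  # hypothetical URL
    for agent in ("Applebot-Extended", "SomeOtherBot"):
        print(agent, can_fetch(agent, url))
```

Compliance with robots.txt is voluntary on the crawler's side, which is why the artists quoted here describe having to trust the company to honor the request.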

This is a sentiment that even blogs aimed at Apple fanatics are echoing. “It’s disappointing to see Apple muddy an otherwise compelling set of features (some of which I really want to try) with practices that are no better than the rest of the industry,” wrote Federico Viticci, founder and editor-in-chief of Apple enthusiast blog MacStories.

Adam Beane, a Los Angeles-based sculptor who created a likeness of Steve Jobs for Esquire in 2011, has used Apple products exclusively for 25 years. But he said that the company’s unwillingness to be forthright with the source of Apple Intelligence training data has disillusioned him.

"I'm increasingly angry with Apple," he told Engadget. "You have to be informed enough and savvy enough to know how to opt out of training Apple's AI, and then you have to trust a corporation to honor your wishes. Plus, all I can see being offered as an option to opt out is further training their AI with your data."

Karla Ortiz, a San Francisco-based illustrator, is one of the plaintiffs in a 2023 lawsuit against Stability AI and DeviantArt, the companies behind image generation models Stable Diffusion and DreamUp respectively, and Midjourney. “The bottom line is, we know [that] for generative AI to function as is, [it] relies on massive overreach and violations of rights, private and intellectual,” she wrote on a viral X thread about Apple Intelligence. “This is true for all [generative] AI companies, and as Apple pushes this tech down our throats, it’s important to remember they are not an exception.”

The outrage against Apple is also a part of a larger sense of betrayal among creative professionals against tech companies whose tools they depend on to do their jobs. In April, a Bloomberg report revealed that Adobe, which makes Photoshop and multiple other apps used by artists, designers, and photographers, used questionably-sourced images to train Firefly, its own image-generation model that Adobe claimed was “ethically” trained. And earlier this month, the company was forced to update its terms of service to clarify that it wouldn’t use the content of its customers to train generative AI models after customer outrage. “The entire creative community has been betrayed by every single software company we ever trusted,” said Lam. Although it isn’t feasible for him to switch away from Apple products entirely, he’s trying to cut back: he’s planning to give up his iPhone for a Light Phone III.

“I think there is a growing feeling that Apple is becoming just like the rest of them,” said Beane. “A giant corporation that is prioritizing their bottom line over the lives of the people who use their product.”

This article originally appeared on Engadget at https://www.engadget.com/artists-criticize-apples-lack-of-transparency-around-apple-intelligence-data-131250021.html?src=rss