Do stock image creators know they're training AI to compete with them?
By all accounts, making money by creating stock images isn't easy. AI image generation is starting to make it a lot tougher. AI-generated images are already popping up in places where stock images are traditionally used. And within the past few weeks, two major suppliers of stock images, Shutterstock and Getty Images, made announcements about partnerships relating to image-generating AI.
If you’ve been living under a rock and missed the AI image-generation craze, these models, also known as text-to-image models, are algorithms that generate images from a text prompt, like, say, “an astronaut riding a horse in a photorealistic style” or “a bowl of soup that looks like a monster made out of plasticine.” Well-known examples include OpenAI's DALL-E, Stability AI's Stable Diffusion, and Google's Imagen.
Shutterstock and Getty Images announce plans to integrate AI into their offerings
Shutterstock recently announced they plan to offer access to OpenAI's DALL-E 2 model alongside their existing collection of stock images. They also revealed they previously licensed images to OpenAI for the training of DALL-E. In the same announcement, they pledged to create a “Contributor Fund” to distribute profits from Shutterstock's use of DALL-E to the artists whose work was used to train it.
In the same week, Getty Images announced a more restricted approach. They’ll be offering access to AI image editing tools made by the startup BRIA, but only for manipulating existing images, not for generating them from scratch. So you’ll be able to do things like change a background or the apparent ethnicity or age of a person in a photo. BRIA also relied on “image banks” to train its models. Getty and BRIA have not announced any plans for ongoing reimbursement to the creators of images used to train BRIA’s models.
Using stock images seems like a great alternative to web scraping
Using stock images to train models makes sense. It avoids the copyright risks associated with using web-scraped data. The images are typically high-quality, and, crucially, they have keyword tags created by the artist or photographer to describe what's pictured. The pairing of images with descriptive text is exactly what’s needed to train these text-to-image models.
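To make that last point concrete, here's a minimal, hypothetical sketch (in Python) of how licensed stock images and their contributor-written keyword tags could be assembled into the image/caption pairs these models learn from. The file name, CSV columns, and helper function are my own illustrations for this post, not anything drawn from Shutterstock's, Getty's, OpenAI's, or BRIA's actual pipelines.

```python
# Hypothetical sketch: pairing licensed stock images with captions built
# from the creator's keyword tags. All names and the CSV layout are
# illustrative assumptions, not any company's real data format.
import csv
from dataclasses import dataclass


@dataclass
class TrainingExample:
    image_path: str  # path to the licensed stock image file
    caption: str     # descriptive text derived from the creator's keyword tags


def load_examples(metadata_csv: str) -> list[TrainingExample]:
    """Turn rows of (filename, keywords) metadata into image/caption pairs."""
    examples = []
    with open(metadata_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Join the contributor-supplied keywords into a simple caption.
            caption = ", ".join(tag.strip() for tag in row["keywords"].split(";"))
            examples.append(TrainingExample(row["filename"], caption))
    return examples


if __name__ == "__main__":
    # Assumes a metadata file like: filename,keywords
    #                               beach_001.jpg,sunset;ocean;family;vacation
    for ex in load_examples("stock_metadata.csv")[:3]:
        print(ex.image_path, "->", ex.caption)
```

The point of the sketch is simply that the keyword tags creators already write for search purposes double, almost for free, as the text half of a text-to-image training pair.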
Using stock images could also be the fairest, most ethical approach to training these data-hungry models. In the best-case scenario, it would allow image creators to give informed consent for the use of their images to train models, and would provide them with fair compensation.
Are these solutions living up to their ethical claims?
Both press releases feature the language of "responsible" AI and "ethical" practices, but it's currently unclear whether the companies are living up to that potential. We don't know, for example, whether the creators of the stock images licensed by OpenAI and BRIA knew their work could be used in this way: to train AI that would potentially compete against them.
It doesn't appear that Shutterstock informed creators before licensing their work to OpenAI (disclaimer: I'm not a journalist and haven't investigated this beyond what I can find online). The details of BRIA's use of image banks also haven't been publicly disclosed. Even though the BRIA/Getty partnership seems to minimize direct competition with humans by restricting the use of AI to editing, it's not hard to imagine ways that AI-enabled editing will reduce demand for the skills of human artists, so artists might reasonably object to having their work used to train BRIA's models too. And BRIA seems to be planning to offer image generation from scratch, even if it won't be part of the Getty partnership.
AI training is a novel use for stock imagery, and one that most photographers and artists probably didn't anticipate when they offered their work for licensing through these image aggregators. This past week, when creators at DeviantArt found out their work would be opted in to AI training by default, there was a huge uproar, demonstrating that many human creators would prefer not to have their work used to train AI.
Unanswered questions
Although the Shutterstock press release touts their approach as "transparent," many aspects of the situation remain opaque. How much income will Shutterstock's Contributor Fund provide to contributors? What percentage of total revenue from the DALL-E deployment will go to the Contributor Fund? Which of OpenAI's models have been trained on Shutterstock photos? Will all of those feed into the Contributor Fund, or just the one deployed through Shutterstock? Even less is known about Getty/BRIA's approach to consent and compensation.
We can’t, and shouldn’t, try to stop the progress of AI to protect human jobs. But we should ensure humans are properly informed about, and fairly compensated for, their contributions to the tech that’s displacing them. Using licensed stock images might be the best pathway for achieving that. But with so few details available, it's hard to tell whether the companies mentioned here are living up to their claims about ethics and responsibility.
If you’re a creator of stock images and have been informed, or not informed, about the use of your work in one of these projects, please comment! Or if you work for one of the companies I mentioned and can fill in more details, please do.