Amazon have reported "hundreds of thousands" of pictures of child sexual abuse material found in shared AI training data... but are refusing to tell regulators which data sets.
If you're using generative AI tools, there's a pretty good chance you're generating imagery with child porn training data behind the scenes.
https://www.bloomberg.com/news/features/2026-01-29/amazon-found-child-sex-abuse-in-ai-training-data
-
As an aside, Microsoft had a publicly reported security incident a year or so ago where petabytes of data were left in a public Azure Storage Blob.
What they didn't say: those petabytes of data were customer photos of animals they'd classified and taken for AI work; 'twas some grads just exporting stuff. Good job everybody is preaching about Responsible AI(tm).
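An aside on the aside: that class of exposure is auditable from the tenant side. A minimal sketch, assuming the azure-storage-blob and azure-identity SDKs and a hypothetical storage account name, that flags any container with public access switched on:

# pip install azure-storage-blob azure-identity
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# hypothetical account URL; walk every container and report public ones
svc = BlobServiceClient("https://examplestorage.blob.core.windows.net",
                        credential=DefaultAzureCredential())
for container in svc.list_containers():
    if container.public_access:  # None means private
        print(f"{container.name}: PUBLIC ({container.public_access})")

Anonymous access also has to be allowed at the account level, so a clean run here is necessary but not sufficient.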
-
@GossiTheDog @scottgal they say they're not training on it; it was detected before training. But that's not the point. Amazon got the stuff from somewhere, and a decent person would report where it came from so that the rozzers can trace it back upstream. I flat out don't believe Amazon's claim not to know where it came from. They must know, because they must have got copyright clearance for making a derivative work from all that content.

@DrHyde @GossiTheDog Oh yeah, I get that, sorry. What I don't understand is the ramifications of their possession, and of the originator's presumably continued possession, of now-identified CSAM... which means they would be legally required to remove it and report the user.
NO IDEA how they wouldn't have ANY moral qualms about NOT doing that, never mind what should be OBVIOUS legal liability (but corps are 'special' etc...)!
-
@scottgal that doesn't mean using child sexual abuse material to train AI is okay.
@GossiTheDog BUT for certain types of AI it would be, obviously. THOSE need to exist, in a regulated way, and be made open source. Like current PII-scrubbing models: it's a public good, but I don't know any commercial company who COULD do it. Orthogonal, sorry, it just occurred to me... how do you get those models?
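One concrete answer to "how do you get those models": Microsoft's open-source Presidio wires NER models and pattern recognizers into exactly that kind of PII-scrubbing pipeline. A minimal sketch, assuming the presidio-analyzer and presidio-anonymizer packages (plus the spaCy model they load by default):

# pip install presidio-analyzer presidio-anonymizer
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # loads NER + regex/checksum recognizers
anonymizer = AnonymizerEngine()

text = "Contact Jane Doe at jane.doe@example.com or +1 555 0100."
findings = analyzer.analyze(text=text, language="en")  # detect PII spans
print(anonymizer.anonymize(text=text, analyzer_results=findings).text)
# roughly: "Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>."

The open-source route sidesteps the "which commercial company could you trust with it" problem.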
-
In my country, the abbreviation CP only means cerebral palsy.
In other words, the GenAI industry is completely CP-damaged.
RE: https://cyberplace.social/@GossiTheDog/115978385132170439
-
@GossiTheDog
Another headline here might be "Amazon admits in public to possessing a huge volume of child pornography".
-
What? Hand curation of trillions of images didn't work?
I'm shocked, I tells ya, shocked!
-
@GossiTheDog I would expect that they harvest open (no auth, indexable) S3 buckets for AI training.
And you probably know what you'll find there...
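The "no auth, indexable" part is literal: a world-readable bucket can be listed with no AWS credentials at all. A minimal sketch with boto3, using a hypothetical bucket name:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# unsigned client: no credentials attached to the request
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
resp = s3.list_objects_v2(Bucket="some-public-bucket", MaxKeys=10)  # hypothetical
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])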
-
@GossiTheDog@cyberplace.social Sounds like police should be arresting and charging people at Amazon, then.
-
@masek @GossiTheDog But have they plundered Amazon S3 customer data that the customers had set as private?
-
well there's your Epstein files right there!
-
@GossiTheDog I’m starting to worry that these insanely powerful black box systems have some flaws
-
@imbrium_photography I would not rule it out. But there is already plenty of "not set private but really private" data in open S3 buckets.
A colleague once found the financial data of a large part of a country in such a bucket (plus copies of their ID cards).
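If you run your own AWS account, flagging that kind of bucket is cheap. A minimal sketch with boto3 that lists your buckets and reports any without a full public-access block; a starting point, not a complete audit:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        locked = all(cfg.values())  # all four block-public settings on
    except ClientError:
        locked = False  # no public-access block configured at all
    print(name, "ok" if locked else "REVIEW: may allow public access")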
-
@GossiTheDog Cool. If you continue to buy from Amazon, read off Kindle, buy from Whole Foods, and obtain AWS certifications, among other Amazon-owned things, YOU ARE SUPPORTING PEDOPHILIA AND PEDOPHILES!
-
@GossiTheDog Sounds very illegal to me, knowing of a crime and keeping info from the law (who this concerns, not some vague ‘regulators’)
-
@GossiTheDog this sounds pretty unbelievable tbh. LAION having "thousands" was a big public thing forcing re-release of the dataset. Others just piling on after this was discovered with no detection algorithms having been used??
Amazon should really publish this information.
@troed @GossiTheDog plot twist of the year would be if the "dataset" they're talking about turned out to be "any image file uploaded to an S3 bucket between 2022 and today"
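On the "detection algorithms" point: the standard screen is hash-matching against lists of known material maintained by bodies like NCMEC and the IWF, normally with perceptual hashes (PhotoDNA, PDQ) so near-duplicates still hit. A minimal sketch of the shape of it, with SHA-256 standing in for a perceptual hash and a hypothetical hashlist.txt:

import hashlib
from pathlib import Path

# hypothetical blocklist; real pipelines use perceptual hashes so
# re-encoded or resized copies still match
KNOWN_BAD = set(Path("hashlist.txt").read_text().split())

for img in Path("dataset").rglob("*.jpg"):  # hypothetical dataset dir
    if hashlib.sha256(img.read_bytes()).hexdigest() in KNOWN_BAD:
        print("match:", img)  # quarantine and report, don't silently drop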

-
@GossiTheDog I didn't have "CSAM at scale is unavoidable" on my 2026 bingo card.
-
@imbrium_photography @masek @GossiTheDog - I like the word you've used: "plundered". Private data that was set to private.
-
@DrHyde @GossiTheDog @scottgal - Or Plundered Data.
-