Amazon has reported "hundreds of thousands" of pictures of child sexual abuse material found in shared AI training data... but is refusing to tell regulators which data sets.
If you're using generative AI tools, there's a pretty good chance you're generating imagery with child porn training data behind the scenes.
https://www.bloomberg.com/news/features/2026-01-29/amazon-found-child-sex-abuse-in-ai-training-data
-
@GossiTheDog Local models are getting good enough now (and uncensored) to make this trivial even for the inept pervert. Pandora's personal paedophilia producers' box is already open, sadly.
@scottgal that doesn't mean using child sexual abuse material images to train AI is okay.
-
@GossiTheDog this sounds pretty unbelievable tbh. LAION having "thousands" was a big public thing forcing re-release of the dataset. Others just piling on after this was discovered with no detection algorithms having been used??
Amazon should really publish this information.
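For a rough sense of what the detection step could look like: a minimal Python sketch of perceptual-hash screening against a blocklist, in the spirit of PhotoDNA-style matching. The blocklist file, threshold, and paths are hypothetical stand-ins; real hash lists (e.g. from NCMEC) are not publicly distributable.

```python
# Sketch: screening a local image dump against a perceptual-hash blocklist.
# "blocklist.txt", the threshold, and the dataset path are hypothetical.
from pathlib import Path

import imagehash           # pip install imagehash
from PIL import Image      # pip install Pillow

THRESHOLD = 8  # max Hamming distance to count as a match (assumed value)

def load_blocklist(path: str) -> list[imagehash.ImageHash]:
    """One hex-encoded pHash per line."""
    return [imagehash.hex_to_hash(line.strip())
            for line in Path(path).read_text().splitlines() if line.strip()]

def flagged(image_path: Path, blocklist: list[imagehash.ImageHash]) -> bool:
    """True if the image is within THRESHOLD bits of any blocklisted hash."""
    h = imagehash.phash(Image.open(image_path))
    return any(h - banned <= THRESHOLD for banned in blocklist)

blocklist = load_blocklist("blocklist.txt")
for img in Path("dataset/").glob("**/*.jpg"):
    if flagged(img, blocklist):
        print(f"match, quarantine and report: {img}")
```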
-
@GossiTheDog reminder that recursive pollution remains a HUGE open problem with ML models.
https://berryvilleiml.com/2026/01/10/recursive-pollution-and-model-collapse-are-not-the-same/
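A back-of-envelope illustration of recursive pollution: model output (some fraction of it wrong) gets scraped back into the training pool, so the polluted share compounds generation over generation. All the rates below are made-up illustration parameters, not measurements.

```python
# Toy simulation of recursive pollution: each generation, model-generated
# items (some wrong) are scraped back into the training pool alongside the
# fixed pool of human-authored data. All numbers are illustrative only.

HUMAN_ITEMS = 1_000_000    # human-authored training items (fixed)
SCRAPED_PER_GEN = 500_000  # model outputs scraped back in each generation
ERROR_RATE = 0.05          # chance an output is wrong even on clean data

pool_total, pool_bad = HUMAN_ITEMS, 0.0
for gen in range(1, 11):
    # A model trained on a polluted pool reproduces that pollution in its
    # outputs, and adds fresh errors of its own on top.
    bad_fraction = pool_bad / pool_total
    output_bad = SCRAPED_PER_GEN * (bad_fraction + (1 - bad_fraction) * ERROR_RATE)
    pool_bad += output_bad
    pool_total += SCRAPED_PER_GEN
    print(f"gen {gen:2d}: {pool_bad / pool_total:6.1%} of training pool polluted")
```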
-
@GossiTheDog @scottgal they say they're not training on it, it was detected before training. But that's not the point. Amazon got the stuff from somewhere, and a decent person would report where it came from so that the rozzers can trace it back upstream. I flat out don't believe Amazon's claim not to know where it came from; they must know, because they must have obtained copyright clearance for making a derivative work from all that content.
-
@GossiTheDog Can’t read the article so this is speculation: Amazon admitted having lots of CSAM but refuses to tell where they downloaded it from? I thought holding on to CSAM is a crime in itself, but as usual rules do not apply to big tech. And where did the material come from? Secret access to customer data they refuse to disclose?
-
@GossiTheDog AI = CSAM
-
@GossiTheDog wasn't that confirmed to be the case years ago when all this AI bullshit started? Like even if you just scrape the clear web you'll likely scrape some of that shit.
-
As an aside, Microsoft had a publicly reported security incident a year or so ago where petabytes of data were left in a public Azure Storage Blob.
What they didn't say: those petabytes were customer photos of animals they'd classified and taken for AI work; 'twas some grads just exporting stuff. Good job everybody is preaching about Responsible AI(tm).
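For context, a minimal sketch of how one might check whether an Azure Storage container allows anonymous listing, which is the misconfiguration behind incidents like this. The account and container names are hypothetical placeholders.

```python
# Sketch: probing an Azure Storage container for anonymous listability.
# "exampleaccount" and "examplecontainer" are hypothetical placeholders.
from azure.core.exceptions import HttpResponseError
from azure.storage.blob import ContainerClient  # pip install azure-storage-blob

client = ContainerClient(
    account_url="https://exampleaccount.blob.core.windows.net",
    container_name="examplecontainer",
    credential=None,  # no credential: only works if public access is enabled
)

try:
    for blob in client.list_blobs():
        print(f"publicly listable: {blob.name} ({blob.size} bytes)")
except HttpResponseError:
    print("container does not permit anonymous listing")
```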
-
@DrHyde @GossiTheDog Oh yeah, I get that, sorry. What I don't understand is the ramifications of their possession, and of the originator's (presumably continued) possession, of now-identified CSAM material... which means they would be legally required to remove and report the user.
NO IDEA how they wouldn't have ANY moral qualms about NOT doing that, never mind what should be OBVIOUS legal liability (but corps are 'special' etc...)!
-
@GossiTheDog BUT certain types of AI obviously would be okay. THOSE need to exist in a regulated way and be made open source. Like current PII-scrubbing models: it's a public good, but I don't know any commercial company who COULD do it. Orthogonal, sorry, but it just occurred to me... how do you get those models?
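On that orthogonal question: a toy regex-based scrub shows the basic idea, though it's nothing like a real model. Production scrubbers (e.g. Microsoft Presidio) use trained NER models precisely because regexes miss names, addresses, and anything context-dependent.

```python
# Minimal regex-based PII scrub, just to illustrate the idea; real scrubbers
# use NER models because regexes can't catch context-dependent identifiers.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IPV4":  re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scrub(text: str) -> str:
    """Replace each match with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Contact jane.doe@example.com or +44 20 7946 0958 from 10.0.0.1"))
# -> Contact [EMAIL] or [PHONE] from [IPV4]
```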
-
In my country, the abbreviation CP only means cerebral palsy.
In other words, the GenAI industry is completely CP-damaged.
RE: https://cyberplace.social/@GossiTheDog/115978385132170439
-
@GossiTheDog
Another headline here might be "Amazon admits in public to possessing a huge volume of child pornography".
-
What? Hand curation of trillions of images didn't work?
I'm shocked, I tells ya, shocked!
-
@GossiTheDog I would expect that they harvest open (no auth, indexable) S3 buckets for AI training.
And you probably know what you can find there...
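To make that concrete: a minimal boto3 sketch that lists a public bucket with no credentials at all, which is exactly what makes open buckets trivially scrapeable. The bucket name is a hypothetical placeholder.

```python
# Sketch: anonymously listing a public S3 bucket. "examplebucket" is a
# hypothetical placeholder; this only works if the owner left listing open.
import boto3                          # pip install boto3
from botocore import UNSIGNED
from botocore.config import Config
from botocore.exceptions import ClientError

# UNSIGNED sends no credentials at all, so anything readable here is
# readable by the entire internet.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

try:
    resp = s3.list_objects_v2(Bucket="examplebucket", MaxKeys=20)
    for obj in resp.get("Contents", []):
        print(f"{obj['Key']} ({obj['Size']} bytes)")
except ClientError:
    print("bucket is not publicly listable")
```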
-
@GossiTheDog@cyberplace.social Sounds like police should be arresting and charging people at Amazon, then.
-
@masek @GossiTheDog But have they plundered Amazon S3 customer data that the customers had set as private?
-
well there's your Epstein files right there!
-
@GossiTheDog I’m starting to worry that these insanely powerful black box systems have some flaws
-
@imbrium_photography I would not rule it out. But there is already plenty of "not set private but really private" data in open S3 buckets.
A colleague once found the financial data of a large part of a country in such a bucket (plus copies of ID cards).