@algernon @grumpybozo @alan @jwildeboer Did you try that trick? Which tool did you use, and did it work well?
@b_b @grumpybozo @alan @jwildeboer I've been using this trick (+ a few tweaks) for about a year now, with iocaine, with great success.
@algernon A wonderful understatement. Perfect answer.
-
Apparently many ”smart” TV manufacturers ship proxy SDKs from companies like Bright, and they turn the TVs into nodes in a botnet that is used for ”AI” data scraping, so the traffic comes from all over the place.
I’d guess not many consumers know about it, let alone have the technical know-how to prevent it.
@rytmis @agturcz @alan @jwildeboer omg this is a monetization angle for TVs that is just so obvious when you consider the race to the bottom in that industry
@Profpatsch @agturcz @alan @jwildeboer
Yep. I just read about it some weeks back and immediately tried to look for dumb TVs as an alternative. Of course, they don’t really exist as a product category any more, so the next best thing was to block those things at the router.
-
My timeline (which contains a lot of project leaders/sysadmins from big projects) is filling with posts about a new, ongoing wave of what are most likely scrapers collecting training data for „AI“ companies. They seem to be using botnets (or what some call „residential IP proxies“ to make it sound a bit more legitimate) with millions of IP addresses, making it really hard to defend against. Some have decided to take their sites down until this is over. This is now the world we live in.

@jwildeboer Just this week a repository in my Forgejo instance was under attack. In a day, I racked up over 130k distinct IPs with fail2ban and had to abandon that approach.
I now have a simple trick that cut out practically all of the traffic, but I hesitate to share it as it's not difficult to work around… I wish we didn't have to resort to such things.