Under Attack, Please Stand By.
-
My server is getting absolutely obliterated by AI scrapers today. Load is 140+ and I just manually banned 58,000 IP addresses.
Top ten user agents: ...
https://jwz.org/b/ykqJ
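For anyone wanting a similar tally from their own logs, here is a minimal sketch of counting user agents. It assumes the web server writes the common "combined" access log format, where the user agent is the last double-quoted field on each line. The function name `top_user_agents` and the sample lines are made up for illustration; they are not the actual log data from the post.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the last quoted field.
# Assumption: the user agent itself contains no double quotes.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def top_user_agents(lines, n=10):
    """Return the n most common user agents as (agent, count) pairs."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Made-up sample lines (documentation IP range, invented timestamps).
SAMPLE_LINES = [
    '192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jan/2000:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.3 - - [01/Jan/2000:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "OtherBot/2.0"',
]

if __name__ == "__main__":
    for agent, count in top_user_agents(SAMPLE_LINES):
        print(count, agent)
```

On a real server the lines would come from an open log file instead of `SAMPLE_LINES`; the path and exact format depend on the setup.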
-
Huh, why is my / disk full?
Oh:
-rw-------. 1 root root 1104886787 Aug 23 12:29 /var/log/php-fpm/error.log
That's just 6 days worth of AI-scraping bots.
-
Getting absolutely reamed by AI scrapers again today, and it seemed like my mitigations were failing. Why weren't these being blocked?
Oh. Because I had Facebook's subnets on the fail2ban whitelist, so that people sharing @dnalounge links on the Zuckerweb got link previews.
Welp. You can't whitelist "facebookexternalhit" (the link preview bot) without also whitelisting "meta-externalagent" (the AI scraper, which seems to ignore robots.txt).
I guess link previews are gonna be a casualty.
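To illustrate the bind described above, here is a rough sketch of why an address-based allowlist cannot tell the two bots apart, while a user-agent check can. Everything in it is an assumption for illustration: the subnet is a documentation range rather than a real Facebook one, the user-agent strings are abbreviated, and the function names are made up. It is not fail2ban configuration, and user agents are self-reported, so this only separates bots that identify themselves honestly.

```python
import ipaddress

# Hypothetical allowlisted subnet (a documentation range, NOT a real Facebook range).
ALLOWLIST = [ipaddress.ip_network("192.0.2.0/24")]

def allowed_by_ip(addr):
    """True if the address falls inside any allowlisted subnet."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ALLOWLIST)

def allowed_by_user_agent(user_agent):
    """Allow the link preview bot, refuse the AI scraper, by self-reported name."""
    ua = user_agent.lower()
    if "meta-externalagent" in ua:
        return False
    return "facebookexternalhit" in ua

# Two made-up requests arriving from the same address.
REQUESTS = [
    ("192.0.2.10", "facebookexternalhit/1.1"),
    ("192.0.2.10", "meta-externalagent/1.1"),
]

if __name__ == "__main__":
    for addr, ua in REQUESTS:
        print(ua, "ip:", allowed_by_ip(addr), "ua:", allowed_by_user_agent(ua))
```

Both requests pass the IP check because they share an address; only the user-agent check treats them differently.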
-
Welp. I now have proof that Facebook is using the same outbound IP addresses for both A) scraping web sites for "AI" training and B) fetching images from my site when I use the official API to post to Instagram.
This means that I can either defend myself from injuriously voracious AI scrapers, or have a functional business Instagram account, but not both.
Not just the same subnets. The same IPs.
-
@jwz plz do to fb requests what you're doing to requests with hn referrer