To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.
-
@osm_tech do you want to share this IP-List?
@michel42 We'd like to share the IP address list, but unfortunately don't think we can due to legal concerns.
-
To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.
If you need OSM data, please don't scrape the website - use the official downloads at https://planet.openstreetmap.org

#AI #Bots #Abuse
@osm_tech does coming from residential IPs mean that someone has baked a scraper into some popular tool that people don't realize is doing that?
-
@HunterZ @osm_tech this is actually quite common. Mobile advertising SDKs for games, background apps, etc. include residential scraping proxy functionality that they can sell to the highest bidder, and then when scrapers want to avoid restrictions they can pay a fraction of a penny to send their requests via your phone. Millions of people use apps with this built in and have no idea. Most websites don't want to ban the residential scrapers because it can hurt growth.
-
@osm_tech how can you even scrape a map site? Isn't OSM just a big canvas like Google Maps that is also nearly impossible to automate?
-
@utf_7 It is madness: start here: https://www.openstreetmap.org/node/1 and keep going. Once you reach https://www.openstreetmap.org/node/10000000000, start on ways and relations.
Or just download the latest weekly export from planet.openstreetmap.org
-
@osm_tech uff, I am a noob so forgive my stupid question: can't you somehow limit the requests, like 10 requests per minute or so? Normal users will not be affected and scrapers will take forever.
-
@ryanprior @HunterZ @osm_tech I had no idea this was a thing. And presumably, as requests come from you, not the advertiser, Pihole (and other network blockers) treat it as legitimate traffic?
-
@utf_7 We've had 400,000 IPs in the last 24 hours. Each IP only does a few requests. Technically we're managing, but it's no fun fighting this daily rather than building new things.
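
Why a per-IP limit does little against this pattern can be sketched in a few lines of Python. This is only an illustration, not how openstreetmap.org actually limits traffic: the `PerIpRateLimiter` class, the `simulate` and `crawl_minutes` helpers, and the `client-N` labels are hypothetical, the 10 requests per minute figure comes from the question above, and the idea that a scraper walks every node page is an assumption.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def simulate(num_ips, requests_per_ip, limiter):
    """Count requests that get through when all arrive within the same window."""
    allowed = 0
    for i in range(num_ips):
        ip = f"client-{i}"  # hypothetical label standing in for a real address
        for _ in range(requests_per_ip):
            if limiter.allow(ip, now=0.0):
                allowed += 1
    return allowed


def crawl_minutes(total_pages, num_ips, per_ip_per_minute):
    """Minutes needed to fetch `total_pages` if every IP runs at its full limit."""
    return total_pages / (num_ips * per_ip_per_minute)


if __name__ == "__main__":
    print(simulate(1, 4000, PerIpRateLimiter()))      # one noisy client
    print(simulate(1000, 4, PerIpRateLimiter()))      # same load, spread out
    print(crawl_minutes(10_000_000_000, 1, 10))       # single address
    print(crawl_minutes(10_000_000_000, 400_000, 10)) # 400,000 addresses
```

Under these assumptions the single client gets only 10 of its 4,000 requests through, while the same 4,000 requests spread over 1,000 clients all pass. Likewise, walking 10,000,000,000 node pages at 10 requests per minute would take about 1,900 years from one address but only 2,500 minutes, under two days, across 400,000 addresses.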
-
@tehstu @ryanprior @osm_tech pihole works by refusing to provide DNS resolution for domains on its blocklists, so it could block a scraper *if* its functionality depends on resolving a domain name that is blocked by pihole.
-
@tehstu @ryanprior @osm_tech oh and of course the scraper would have to respect pihole versus using its own hard coded DNS IP to resolve things.
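
The two conditions described above can be sketched as a toy model in Python. It is a simplification under stated assumptions, not Pi-hole's actual code: the domain names, addresses, `BLOCKLIST`, `UPSTREAM` table, and both resolver functions are hypothetical, and no real DNS traffic is involved.

```python
# Hypothetical blocklist, standing in for the lists a DNS sinkhole loads.
BLOCKLIST = {"proxy-sdk.example.net"}

# Hypothetical stand-in for what a public upstream resolver would answer.
UPSTREAM = {
    "proxy-sdk.example.net": "203.0.113.20",
    "www.example.org": "203.0.113.30",
}


def upstream_resolve(domain):
    """What an app sees if it queries its own hard-coded DNS server."""
    return UPSTREAM.get(domain.lower().rstrip("."))


def sinkhole_resolve(domain):
    """What an app sees if it uses the network's filtering resolver."""
    name = domain.lower().rstrip(".")
    if name in BLOCKLIST:
        return None  # no usable answer for blocked names
    return upstream_resolve(name)


if __name__ == "__main__":
    # Case 1: the SDK looks up a blocked name via the filtering resolver.
    print(sinkhole_resolve("proxy-sdk.example.net"))
    # Case 2: the SDK ignores the network's resolver and asks upstream itself.
    print(upstream_resolve("proxy-sdk.example.net"))
    # Case 3: unrelated, unblocked names resolve normally either way.
    print(sinkhole_resolve("www.example.org"))
```

In this model the block only works in case 1. An SDK that ships its own resolver address (case 2), or connects to a hard-coded IP and never does a lookup, is not affected by the blocklist at all.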
-
@osm_tech tHeN yOu jUsT neEd tO sCaLe
-
@osm_tech Question: why do people scrape a server which makes the data freely available? And, probably, better structured in the final product. I don't see the point.
-
@JonSaenzAgirre It is a good question, and we don't know the answer either. Our planet data is so much easier to process and use.
-
@utf_7 In this economy with RAM prices what they are?!?

-
@osm_tech@en.osm.town
Could something like Anubis help you guys?