FARVEL BIG TECH

nothacking@infosec.exchange

@nothacking@infosec.exchange

Posts


  • Do people actually appreciate URL shorteners?
    nothacking@infosec.exchange

    @hpod16 No, it just adds another failure point. If you have annoyingly long URLs like:

    example.com/cgi-bin/pages/2025/02/13/posts/pages.php?article=rocks&format=html

    ... you should fix that. If you absolutely must, just set up a short link on the same server:

    Redirect example.com/rocks -> example.com/cgi-bin/pages/2025/02/13/posts/pages.php?article=rocks&format=html
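    A minimal sketch of that same-server short link, assuming (purely for illustration) a small Python/Flask front end rather than the CGI setup the example URL suggests:

    ```python
    # Hypothetical Flask front end: /rocks issues a permanent redirect to the
    # existing long URL, so the "shortener" lives on the same server and adds
    # no extra failure point.
    from flask import Flask, redirect

    app = Flask(__name__)

    LONG_URL = ("/cgi-bin/pages/2025/02/13/posts/pages.php"
                "?article=rocks&format=html")

    @app.route("/rocks")
    def rocks_short_link():
        # 301 keeps the short link cacheable by clients and proxies.
        return redirect(LONG_URL, code=301)
    ```

    Because the mapping lives on the origin server, there is no extra DNS lookup, no extra round trip, and no third party that can disappear and break the link.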

    Uncategorized

  • Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
    nothacking@infosec.exchange

    @bertkoor Well, the advantage of sending junk is it makes crawlers trivially identifiable. That avoids the need for tricks like these:

    > Other user-agents (hopefully all human!) get a cookie-check. e.g. Chrome, Safari, Firefox.

    That still increases loading time. Even if the "CAPTCHA" is small, it'll still take several round trips to deliver.

    ... of course, once they've been fed poisoned URLs, then you can start blocking.
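    A rough sketch of that last step (my own illustration, not the poster's setup): any client that ever requests a poisoned URL has identified itself as a crawler, so its later requests can be refused without adding a cookie check or CAPTCHA round trip for humans. The `/maze/` prefix and the in-memory set are assumptions.

    ```python
    # Hypothetical Flask middleware: remember clients that fetch poisoned URLs
    # and refuse their subsequent requests. A real deployment would persist the
    # set and probably key on more than the source IP.
    from flask import Flask, request, abort

    app = Flask(__name__)
    flagged = set()  # IPs that have fetched a poisoned URL

    @app.before_request
    def block_poisoned_clients():
        ip = request.remote_addr
        if request.path.startswith("/maze/"):   # assumed poisoned-URL prefix
            flagged.add(ip)                      # the crawler identified itself
        elif ip in flagged:
            abort(429)                           # or keep feeding it junk instead
    ```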

    Uncategorized

  • Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
    nothacking@infosec.exchange

    @alexskunz @david_chisnall

    The thing is, you don't need a CAPTCHA. Just three if statements on the server will do it:

    1. If the user agent is chrome, but it didn't send a "Sec-Ch-Ua" header: Send garbage.
    2. If the user agent is a known scraper ("GPTBot", etc): Send garbage.
    3. If the URL is one we generated: Send garbage.
    4. Otherwise, serve the page.

    The trick is that instead of blocking them, serve them randomly generated garbage pages.

    Each of these pages includes links that will always return garbage. Once these get into the bot's crawler queue, they will be identifiable regardless of how well they hide themselves.

    I use this on my site: after a few months, it's 100% effective. Every single scraper request is being blocked. At this point, I could ratelimit the generated URLs, but I enjoy sending them unhinged junk. (... and it's actually cheaper than serving static files!)

    This won't do anything about vuln scanners and other non-crawler bots, but those are easy enough to filter out anyway. (URL starts with /wp/?)
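    A sketch of those checks as one request filter. Everything named here (the `/maze/` prefix, the HMAC-signed links, the bot list beyond GPTBot, Flask itself) is an assumption for illustration, not the poster's actual implementation:

    ```python
    import hashlib
    import hmac
    import random

    from flask import Flask, request

    app = Flask(__name__)
    SECRET = b"change-me"                                 # hypothetical signing key
    KNOWN_SCRAPERS = ("GPTBot", "CCBot", "Bytespider")    # example scraper tokens

    def poisoned_link():
        """Generate a /maze/ link that only ever leads to more garbage."""
        token = "%016x" % random.getrandbits(64)
        sig = hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:16]
        return f"/maze/{token}-{sig}"

    def is_poisoned(path):
        """Rule 3: is this a URL we generated ourselves?"""
        if not path.startswith("/maze/"):
            return False
        token, _, sig = path[len("/maze/"):].partition("-")
        good = hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:16]
        return hmac.compare_digest(sig, good)

    def garbage_page():
        """Cheap, randomly generated page whose links all point back into the maze."""
        links = " ".join(f'<a href="{poisoned_link()}">more</a>' for _ in range(20))
        return f"<html><body><p>{random.random()}</p>{links}</body></html>"

    @app.before_request
    def filter_bots():
        ua = request.headers.get("User-Agent", "")
        # 1. Claims to be Chrome but didn't send the Sec-Ch-Ua client hint.
        if "Chrome" in ua and "Sec-Ch-Ua" not in request.headers:
            return garbage_page()
        # 2. Known scraper user agent.
        if any(bot in ua for bot in KNOWN_SCRAPERS):
            return garbage_page()
        # 3. The requested URL is one we generated.
        if is_poisoned(request.path):
            return garbage_page()
        # 4. Otherwise: return None so Flask serves the real page.

    @app.route("/")
    def index():
        return "<html><body>The real page.</body></html>"
    ```

    Signing the generated tokens keeps rule 3 stateless: the server recognizes its own poisoned URLs without having to store them.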

    Uncategorized