👏 Poison 👏 your 👏 data ☠️
-
@alice
Agreed on all points except one: If you're providing incorrect data to poison the data broker's systems, please don't just type in a "random" email address unless you're confident that it's not someone's real email address. On any given day, I receive about a dozen emails from various websites where an email address was required for registration, and someone typed in my email address while providing their "fake" info. Pizza order receipts, airline flight confirmations, golf tee time registrations, etc.
The worst part is that these are misdirected, but otherwise legitimate emails, so I can't just mark them as spam, because that will poison the spam detection algorithm's dataset.
So yeah, if you're gonna type in a fake email address, please make sure that it doesn't belong to someone first, and the easiest way to do that is to use a nonexistent domain, preferably one that no one would ever register, like "${random_guid}.com"
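For anyone who wants to script the "${random_guid}.com" idea, here is a minimal Python sketch. The function name `throwaway_address` and the `user` local part are made up for illustration. The `.invalid` option is an addition, not something from the post: RFC 2606 reserves that top-level domain so it can never be registered, although some signup forms may reject it.

```python
import uuid


def throwaway_address(local_part: str = "user", tld: str = "com") -> str:
    """Build an address on a random GUID domain, e.g. user@<32 hex chars>.com."""
    # uuid4() is random, so the resulting domain is overwhelmingly unlikely
    # to be registered by anyone. That is a probability, not a guarantee.
    domain = f"{uuid.uuid4().hex}.{tld}"
    return f"{local_part}@{domain}"


# A fresh, different address every call.
print(throwaway_address())
# Same idea on the reserved .invalid TLD (assumption: the form accepts it).
print(throwaway_address(tld="invalid"))
```

Mail sent to such an address should simply bounce instead of landing in a real person's inbox.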
@JamesDBartlett3 @alice those are spam, and should be reported as such. Any system that doesn’t validate email addresses before adding them to a list will be used maliciously in attempts to overwhelm target email addresses by signing them up for every vulnerable mailer.
Also, the more complaints that buyers get as a result of buying data from brokers, the less the data is worth. I wouldn’t worry about a made-up address I use once happening to be real.
-
@alice I've been contemplating keeping a fake code repo behind an invisible link on my website. Lots of useful-looking stuff, everything compiling cleanly, but with all of it subtly broken.
-
The goal is to make corporate data less profitable.
Even stuff as simple as setting your birthdate to 1970-01-01 everywhere, adding [TEST] or [DELETED] as your name or account notes anywhere you don't need them to know your name.
Using plugins like AdNauseam to poison ad trackers (and cost them marketing dollars).
Using VPNs set to different locations.
Signing into data broker sites to "correct" outdated info (they'll often let you do that with little-to-no proof of identity, but will require your passport or state ID in order to delete your info). Bonus points if you correct it to someone else's info on their site that's similar to yours.
Only fill in required fields when you sign up for anything, and provide correct info only if it matters for you to use the service. Otherwise, provide plausible but incorrect data.
If you use LLMs anywhere, use the free tier and always vote thumbs up for bad answers and down for good ones. It wastes their resources and drives up their costs while making their training data worse.
@alice the LLM suggestion kind of sounds like what you could do with the old Google reCAPTCHA challenges, where it showed you two words you were supposed to type in.
The system really only knew one of the words, and the second one was basically put there so you could be the text recognition system for Google digitizing some media. Once you knew what to look for, you could see which word the system did not know because it was distorted in specific ways, and you could input any poison data you liked.
-
@alice
I've been using April 1st for my birthday on web forms basically since there's been a web. The year, I pick more or less at random. I get a handful of automated "birthday" emails every April Fool's day.
-
@ShadSterling @JamesDBartlett3 @alice
Remember, […]@[…].com is a correctly formatted, valid e-mail address
-
Because þ is unvoiced; it's pronounced /θ/. The initial sound of ðe word 'ðe' (usually spelled 'the') is voiced, pronounced /ð/. Ðey are different sounds which happen to be represented by the same digraph in standard English orþography because ancient Greek didn't have a voiced dental fricative.
@Infrapink @alice
historically þey were interchangeable.
modern perception shifts þem to þose roles.
anyþing is good and fair game imo.
informative comment noneþeless þo!
-
@Infrapink @q @alice AIUI the old English thorn is the direct predecessor to the modern English “th”, unrelated to the similar-looking archaic Greek letter sho
@ShadSterling @Infrapink @alice
indeed! þe typewriter is mostly to blame for its deaþ
-
@alice Use a different email address for friggin everything so aggregators can't use it as a primary key.
-
@alice very much appreciated
Would you be ok with me reposting this stuff to Bluesky and giving credit?
Not that anyone pays attention to anything I do on there anyway...
-
@woozle @alice Me too. I add a random sequence to the end, so when an address is compromised, I just keep the first part and tack on a new random bit. I had someone say "well they could have guessed that address" when I reported an issue, so yeah the chances of that are now one in several hundred million. I guess it's not a leak, just a spammer who made a really lucky guess! LOL
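A rough Python sketch of the per-service address scheme described above. Everything concrete here is an assumption, since the post doesn't give details: the `service-suffix@domain` layout, the 6-character suffix drawn from a-z and 0-9, the `example.com` placeholder domain, and the helper names `service_address`, `rotate`, and `guess_space`. It also assumes you control a domain (or mailbox) that accepts mail for arbitrary local parts.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 possible symbols


def service_address(service: str, domain: str = "example.com",
                    suffix_len: int = 6) -> str:
    """Return e.g. 'pizza-k3x9qa@example.com' with a random suffix."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    return f"{service}-{suffix}@{domain}"


def rotate(address: str, suffix_len: int = 6) -> str:
    """Keep the first part of a compromised address, swap in a new suffix."""
    local, domain = address.rsplit("@", 1)
    service = local.rsplit("-", 1)[0]  # suffix never contains '-'
    return service_address(service, domain, suffix_len)


def guess_space(suffix_len: int = 6) -> int:
    """How many suffixes a spammer would have to guess among."""
    return len(ALPHABET) ** suffix_len


old = service_address("pizza")
print(old, "->", rotate(old))
print(guess_space())  # 36**6 = 2176782336
```

With these assumed settings a blind guess has roughly a one-in-two-billion chance, so if a unique address starts getting spam, a leak or sale is by far the likelier explanation. Because each service gets its own address, aggregators also can't use it as a key to join records.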
-
@alice My favourite time-wasting sport is giving only wrong answers to Google Maps questions.
If I have been somewhere really good, like a great restaurant or cafe, I won't fuck up its data. But if I have been sat at a train station waiting for a train and Google asks me questions, then, yes, I will answer:
I *would* recommend this place for a children's birthday party.
It *does* have a volleyball court.
I *would* recommend buying tickets in advance.
-
@alice @inthehands I can totally get into this … fun shit.
-
@alice what data brokers can you log into to pollute?
-
Do you know that I live in [object Object] ?

