How to performantly bulk remove dormant users from a forum?
-
Hey everyone,
I am trying to clean out our forum from dormant and spam users. We have roughly 60000 accounts (sic!) of which about 56000 are spam accounts with no posts at all.
I have written a small Python script which reaches into our MongoDB database and identifies ‘invalid’ accounts based on a handful of criteria, such as the user having no posts, URLs in the user's profile, and more. I can quite accurately separate spam from legit accounts. The problem is that when I just delete these documents and their directly related documents (e.g., for `user:100` also `user:100:emails`, `user:100:settings`, …) in the Mongo database, I end up with a NodeBB instance that is functional at first glance. But secondary data has not been updated, as NodeBB does not seem to be very atomic. For example, the users list on the dummy forum now has countless empty pages, as the users are gone but something that feeds that user list has not been updated. I already rebuilt the forum, but this did not change anything.
I also had a look at the Write API. I did not (yet) get the bulk user account deletion to work, but when I use the endpoint `/api/v3/users/{uid}`, my script ends up like this:
Processing users: 1%| 320/56329 [11:13<32:23:18, 2.08s/user]
I.e., it takes NodeBB about 2 seconds to delete a single user account, which adds up to more than a day of processing time in total. I cannot be the first one with this problem, right? I did not find any solutions to this problem. I also found `/nodebb/src/api/users.js:processDeletion` and the lower-level `nodebb/src/user/delete.js:User.deleteAccount`, but there is no clear path for me as to which database documents I have to delete and update.
Cheers,
zipit
-
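For reference, the per-user deletion through the Write API described in the post could look roughly like the sketch below. It is not the original script: `BASE_URL`, `TOKEN`, and both helper names are placeholders, and it assumes an API token that is sent as a bearer token and is allowed to delete users. Only the `DELETE /api/v3/users/{uid}` endpoint comes from the post itself.

```python
import urllib.error
import urllib.request

BASE_URL = "https://forum.example.org"  # placeholder, not a real instance
TOKEN = "replace-me"                    # placeholder API token (assumption)


def build_delete_request(uid, base_url=BASE_URL, token=TOKEN):
    """Build (but do not send) a DELETE request for /api/v3/users/{uid}."""
    return urllib.request.Request(
        f"{base_url}/api/v3/users/{uid}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def delete_users(uids, send=urllib.request.urlopen):
    """Send one DELETE per uid and return the uids whose request failed.

    `send` is injectable so the loop can be dry-run without a live forum.
    """
    failed = []
    for uid in uids:
        try:
            # urlopen returns a response usable as a context manager
            with send(build_delete_request(uid)):
                pass
        except urllib.error.URLError:
            # HTTPError is a subclass of URLError, so 4xx/5xx land here too
            failed.append(uid)
    return failed
```

Each uid costs one HTTP round trip plus whatever NodeBB does server-side, so at about 2 seconds per account the sequential loop is exactly what produces the multi-hour estimate shown in the progress bar.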
zipit, if the accounts have no actual content you can just call `.deleteAccount`, as that’s more lightweight.
The reason user deletion takes so long is all those cross-referenced sets. There are probably opportunities for optimization there.
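A rough illustration of why deleting only the `user:100` documents is not enough, given those cross-referenced sets. The sketch runs on plain dicts and touches no database. It assumes that set entries are stored as documents with a `_key` naming the set and a `value` holding the uid; the set names `users:joindate` and `users:postcount` and the helper `find_stale_entries` are assumptions for illustration and should be checked against the real database before acting on them.

```python
def find_stale_entries(docs, deleted_uids):
    """Return set entries whose `value` still points at a deleted uid."""
    deleted = {str(uid) for uid in deleted_uids}
    stale = []
    for doc in docs:
        key = doc.get("_key", "")
        if key.startswith("user:"):
            continue  # the user's own documents, not a cross-reference
        if str(doc.get("value")) in deleted:
            stale.append(doc)
    return stale


# Hypothetical documents: one user hash plus entries in two sets.
docs = [
    {"_key": "user:100", "username": "spammer"},
    {"_key": "users:joindate", "value": "100", "score": 1},
    {"_key": "users:joindate", "value": "7", "score": 2},
    {"_key": "users:postcount", "value": "100", "score": 0},
]
stale = find_stale_entries(docs, [100])
print([d["_key"] for d in stale])
```

Under these assumptions, entries like the two flagged here would survive a manual delete of `user:100`, which would be consistent with a users list that still paginates over accounts that no longer exist.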