We have a forum where user_auth_token_logs has 61 million rows (and growing).
There are only 25k user_auth_tokens.
Of the 61 million rows, 54 million rows refer to an user_auth_token that does not exist anymore (i.e. a database integrity issue). And of the 61 million rows, around 58 million are older than 2 months (i.e. seemingly useless?)
Questions:
Could we just clean this up without risking further integrity issues?
Would it be an idea to have a job clean this up automatically?
db=# select count(*) from user_auth_tokens;
count
-------
25648
db=# select count(*) from user_auth_token_logs;
count
----------
61415352
db=# select count(*) from user_auth_token_logs where user_auth_token_id not in (select id from user_auth_tokens);
count
----------
54558442
db=# select count(*) from user_auth_token_logs where created_at < '2024-07-13';
count
----------
58565943
Yes, user_auth_token_logs are there only for debugging purposes. All rows can be emptied, the only consequence will be that you won’t have any logs to debug.
Ah yes, good catch. Looks like lines 214 to 217 need fixing as well.
I would be comfortable with a global cleanup after a certain timeframe. @osama (since you’re the author of the commit linked above), do you think we can clean up all of these logs after some time (and if so, after how long)? It sounds like we need to keep some of them for detecting suspicious logins.
Why does it need fixing? That piece of code is about cleaning up rotated UserAuthTokens, not about the log records?
Update: after enabling SiteSetting.verbose_auth_token_logging, triggering the weekly job and running VACUUM FULL user_auth_token_logs the table went from 16GB to 687MB
Yes, I think we can clean up most of the logs, but some of them have to stay. Specifically, I think any records that have suspicious, generate, or rotate for action will need to be kept around because they’re used for detecting and generating reports for suspicious logins.
The original implementation only logged when SiteSetting.verbose_auth_token_logging is true. Which still had the problem that after disabling it, the most recent remaining logs would stay, but that’s a small thing.
But this change made the logging unconditional (“The generate, rotate and suspicious auth token logs are now always logged regardless of the verbose_auth_token_logging setting”).
TLDR; That change forgot to make the removal unconditional as well.
Sure we will get it sorted over the next few weeks, if there is a rush feel free to shoot through a pr (that is tested and confirm this works as expected)
Indeed, that PR is now merged thanks to @Osama. It addresses most types of user_auth_token_logs, but not all of them, we will follow up with a fix for the generate entries shortly. (See the discussion in the PR link above for more context).
I’m going to keep this topic open while we address the follow up.