因数据库损坏导致大量用户账户异常

我遇到了一个关于许多用户的真实问题。在管理面板中,用户显示如下:

但当我尝试查看他们的电子邮件地址时,没有任何内容显示。此外,他们的公开个人资料页面返回 404 错误。

据我观察,所有这些用户几乎都处于不活跃状态(从激活状态来看他们是激活的,但在论坛活动方面是不活跃的)。因此,我认为这可能是由于很久以前自动移除不活跃用户的机制出现错误所导致的。

Another thing I notice:

If I inactivate, and then activate them, they get fixed!

If a user is inactive for a certain length of time (730 days by default) they are automatically inactivated. The setting is in your dashboard under Settings/Users and scroll almost down to the bottom. That is where you’ll see that. But, it’s not necessary to make any changes for the sake of making a change. If those users haven’t logged in for 2 years, it makes no sense to reactivate them unless they start showing up again. :wink:

No. My problem is not this. The dashboard says the users are activated.

Seems similar to the issue I’m having. Could you check if the affected names have duplicates or ‘very near matches’ in your database? For example user.user and useruser.

Yes affected users are among those usernames that are very popular. so discourse suggest a near match username. I signup users with api. so I get a username from user. then I check it against discourse api and if its already taken, I would automatically use whatever discourse suggest.

btw with help of @RGJ we nailed the problem down to this two condisions:

To add to that, the issue @hosna is having is clearly an issue on the database level. It seems like there is some corruption in the users table. Copying the contents to a new table solves these issues.

That said, I did see two occurrences of @bartv’s issue in @hosna’s database (those were the two duplicates before Sept 22 and they both had a dot in their username) but I am not sure if these two issues are related. They just have the same symptoms.

Sounds like a corrupt database index. REINDEX TABLE users should solve the problem.

What about the duplicate usernames? I have many same usernames being used for 2 distinct users.

The issue is acknowledged here:

That’s probably a side effect of a corrupt index. You might need to clean that up manually before the reindexing works.

Can you explain how a corrupt index might happen? to prevent it in the future?

Hardware failure, bug in Postgres,… hard to tell. It happens.

Except that that is impossible since the index is broken.
This does the trick:

# create a temporary table without constraints and copy the contents into it
create table users_test (like users);
insert into users_test select * from users;

# remove duplicate usernames case sensitive, the dupes are after Sept 22
delete from users_test where username in (
  select username 
  from users_test 
  group by username 
  having count(username)>1
) and created_at>'2019-09-22' ;

# remove duplicate usernames case insensitive, the dupes are after Sept 22
delete from users_test where lower(username) in (
  select lower(username) 
  from users_test 
  group by lower(username) 
  having count(lower(username))>1
) and created_at > '2019-09-22' ;

# two more problems left, delete them individually
delete from users_test where id in (184534,130826);

# create a new table with constraints and copy the users
create table users_clean (like users including indexes);
insert into users_clean select * from users_test;

and then rename users to users_old and users_clean to users.

我想在此插一句:这可能会比损坏的用户数据更严重地破坏你的数据库

现在我们卡在升级的中间阶段,因为许多约束条件仍然依赖 users_old(因为我们重命名了表)。这个问题是在应用这个看似不完整的修复几天后才暴露出来的。此外,like users including indexes 并不足够(例如,它会忽略 id 序列)。

您完全正确,我确实记得在重命名表后需要重新创建约束。对于这一重要遗漏,我深表歉意。

根据我的笔记:

alter table poll_votes drop constraint fk_rails_b64de9b025;
alter table poll_votes add constraint fk_rails_b64de9b025 FOREIGN KEY (user_id) REFERENCES users(id);

alter table user_security_keys drop constraint fk_rails_90999b0454;
alter table user_security_keys add  constraint fk_rails_90999b0454 FOREIGN KEY (user_id) REFERENCES users(id);

如今还需要执行:

alter table bookmarks drop constraint fk_rails_c1ff6fa4ac;
alter table bookmarks add  constraint fk_rails_c1ff6fa4ac FOREIGN KEY (user_id) REFERENCES users(id);

重要免责声明:仅在您完全清楚自己在做什么的情况下才使用此方法

这确实与我们在查询 pg_catalog 中影响 users_old 的约束后发现的情况一致。

此外,我记得“包含默认值”是至少必须的操作,否则会导致注册功能失效。

感谢您的修正!