Runaway database-process due to `enable user directory`

I am experiencing 100% CPU usage of postgresql after a fresh installation of discourse (v2.2.0.beta2 +120) and importing ~90k and entries from drupal.

I have traced it to the directory-item query to “create new records for users who don’t have one yet”:

The process is stuck at user 91439 of 91536 total.
Even after 48 hours of 100% CPU there is no change. select count(*) from directory_items; remains at 91439. pg_stat_activity shows the INSERT INTO ...() SELECT ... query in active state and 100% CPU with no end in sight.

Turning off enable user directory and re-starting discourse, works around this issue.

It does smell like a bug, but it could also be so O(N^2) complexity or something specific to the setup here.
Does anyone here have >90k users and a system where this job completes?
Any hints how I could narrow down the issue further before reporting a bug?

Thanks in advance.

I have seen plenty of instances with 100k+ users with a working directory.

After doing your import, did you do any of:

  • backup/restore to cleanup the database
  • vacuum
  • tweak server / postgresql (you probably need more ram than a $5 droplet for a big imported DB)

Hi @Falco . Thanks for your quick response!

There is auto-vaccum and yes, I also did a maual vacuum full analyze. The system has two cores, 8GB RAM and isn’t swapping, in fact only about 2GB RAM are are used (also discourse responds just fine, sidekiq queue is also empty).

Is there an easy way how I can re-start the process and check if it hangs at the same place?
Would you know if it is safe to delete from directory_items ?

During import I did receive around 100 warnings about “invalid email” e.g.

Invalid email for NAME. Using: invalidc18...ec9d24@no-email.invalid

Those users are automatically suspended and silenced.
It could just be where the query is stuck (91439 of 91536), but that may also only be a coincidence.