How to recover from a very slow database

(Sam Saffron) #1

Rarely, the database can get into a state where it is so slow it appears broken (I have seen 3 instances of this issue in the last 4 years)

How can you tell you are impacted?

It is quite easy to tell the difference between database is under load, or database is under-provisioned to a broken state.

Likely symptoms:

  • Everything suddenly got extremely slow

  • A rebuild does not fix the problem

  • You see multiple postgres threads consuming 100% CPU when you run top

Definate diagnosis

To ensure 100% you are impacted by this, stop the app and run a search:

./launcher restart app
./launcher enter app
sv stop unicorn
% rails c

On a functioning DB a search will take under a second. On a DB that is gone into :crazy_face: mode expect this to take more than 20 seconds

How to repair

The consistent way we were able to repair this was by dumping the database to disk and restoring:

./launcher enter app
sv stop unicorn
cd /shared
sudo -u postgres pg_dump discourse > db.dump
sudo -u postgres dropdb discourse
sudo -u postgres createdb discourse
sudo -u postgres psql discourse < db.dump

:alarm_clock: :warning: :man_farmer:t3: note

It is very unlikely you are experiencing this issue, we saw it about 3 times in 4 years. However, if you just ran a GIANT import I would strongly recommend following this so you start from a proper clean slate.

Help! Discourse workers dying on our community site
(Jeff Atwood) #2

Did you do a

vacuum full

which requires a full outage? I had to do this on migrated sites in the past to get the database to normal query speeds.

(Sam Saffron) #3

Absolutely, I also tried a restore using our UI, both did not correct the issue.

Restore using the UI still keeps the database around so it does not correct the underlying issue with the db on disk.