Search stopped working, errors in logs


(Allen) #1

I am working on migrating a site from phpBB to Discourse. Search is no longer working, and I am not sure when it stopped. I am seeing the following symptoms:

  • Search autocomplete is not showing any results
  • Search times out after 30s and shows no results (I get a 502 server error)

Logs are showing the following:

postgres/current (showing up every second or so with tail)

2018-04-07 20:55:31.169 UTC [2817] FATAL:  "/shared/postgres_data" is not a valid data directory
2018-04-07 20:55:31.169 UTC [2817] DETAIL:  File "/shared/postgres_data/PG_VERSION" does not contain valid data.
2018-04-07 20:55:31.169 UTC [2817] HINT:  You might need to initdb.
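
For anyone hitting the same FATAL message: the file it complains about normally holds just the major version number on a single line (for example `9.5` or `10`). A minimal sketch of a sanity check follows. The helper name `read_pg_version` is made up for illustration, the data directory path is taken from the log above, and the regex only approximates what PostgreSQL itself accepts.

```python
import re
import sys
from pathlib import Path


def read_pg_version(data_dir):
    """Return the version string stored in PG_VERSION, or None if it is missing or invalid."""
    version_file = Path(data_dir) / "PG_VERSION"
    try:
        # Undecodable bytes are replaced so a corrupted file fails the regex instead of raising.
        text = version_file.read_text(encoding="ascii", errors="replace").strip()
    except OSError:
        return None
    # Loose approximation: "9.5" style for releases before 10, "10" style afterwards.
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return text
    return None


if __name__ == "__main__":
    # Path from the log above; pass a different one as the first argument if needed.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/shared/postgres_data"
    version = read_pg_version(data_dir)
    if version is None:
        print(f"{data_dir}/PG_VERSION is missing, empty, or corrupted")
    else:
        print(f"{data_dir}/PG_VERSION reports PostgreSQL {version}")
```

An empty or garbled file here would be consistent with an interrupted upgrade, but the script only reports the file's state; it does not repair anything.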

rails/unicorn.stderr.log (shows up after searching times out)

E, [2018-04-07T21:03:35.717696 #77] ERROR -- : worker=1 PID:8885 timeout (31s > 30s), killing
I, [2018-04-07T21:03:40.293032 #11715]  INFO -- : worker=1 ready
E, [2018-04-07T21:05:45.855128 #77] ERROR -- : worker=1 PID:11715 timeout (31s > 30s), killing
E, [2018-04-07T21:05:47.884398 #77] ERROR -- : worker=0 PID:6114 timeout (32s > 30s), killing
I, [2018-04-07T21:05:50.699456 #12502]  INFO -- : worker=1 ready
I, [2018-04-07T21:05:53.009950 #12566]  INFO -- : worker=0 ready
D, [2018-04-07T21:06:17.953056 #77] DEBUG -- : waiting 16.0s after suspend/hibernation
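
To see how often each worker gets killed, the timeout lines can be tallied with a short script. This is only a sketch: the regex is written against the exact line format shown above, and `count_timeouts` is a hypothetical helper name, not part of Unicorn or Discourse.

```python
import re
from collections import Counter

# Matches lines such as:
#   ERROR -- : worker=1 PID:8885 timeout (31s > 30s), killing
TIMEOUT_RE = re.compile(r"worker=(\d+) PID:(\d+) timeout \((\d+)s > (\d+)s\), killing")


def count_timeouts(lines):
    """Count timeout kills per worker number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "E, [2018-04-07T21:03:35.717696 #77] ERROR -- : worker=1 PID:8885 timeout (31s > 30s), killing",
        "I, [2018-04-07T21:03:40.293032 #11715]  INFO -- : worker=1 ready",
        "E, [2018-04-07T21:05:45.855128 #77] ERROR -- : worker=1 PID:11715 timeout (31s > 30s), killing",
        "E, [2018-04-07T21:05:47.884398 #77] ERROR -- : worker=0 PID:6114 timeout (32s > 30s), killing",
    ]
    for worker, kills in sorted(count_timeouts(sample).items()):
        print(f"worker={worker}: {kills} timeout kill(s)")
```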

Browsing the forum seems to work normally (apart from the errors in the logs). Since I am not in production yet, I could re-run my migration script, but I would like to know how to resolve this issue in case it occurs in production. Of further note, I resized my DO droplets a few times to try out various configurations. The app often stopped working after such a resize, and I had to rebuild. Once Docker itself stopped working and I had to restore my base image. Maybe the resizes had something to do with it. Btw, I always adjusted my workers according to the available CPUs (one worker per CPU).

I documented the following after one of those flukes:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES COMPLETE

Old 9.5 database is stored at /shared/postgres_data_old

To complete the upgrade, rebuild again using:

./launcher rebuild app

Can anyone make any sense of this, and tell me how I can recover from this situation (short of a restore)?


(Jay Pfaffman) #2

Since you’re starting anew, I’d just wipe it all away and start over.

Somehow the migration to Postgres 10 failed. Easiest to just do a clean install, which will start with a fresh PG 10 database.


(Allen) #3

Okay, will do. Just hope this won’t happen too often in production :sweat_smile: Btw, just noticed my admin dashboard is not working anymore either, the other admin tabs work though.


(Jay Pfaffman) #4

It won’t happen often. Perhaps one more rebuild will fix it. Maybe you didn’t notice the message saying that the database was upgraded and that you need to rebuild once more.

PostgreSQL upgrades don’t happen often.


(Allen) #5

I did do the second rebuild, and have rebuilt a few times more since then. In any case, starting over now…