"Random" 502 Errors

I have a Discourse install running on a GCE server. Users have reported that the system randomly returns 502 errors. I can replicate the problem by clicking through the Latest, New, Unread, Top, and Categories links. Sooner or later one of them returns a 502 error.
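The click-through test described above could be scripted roughly as follows. This is only a sketch: `BASE_URL` and the `probe_pages` helper are hypothetical names, the paths are assumed to match the Latest, New, Unread, Top, and Categories links, and New and Unread normally need a logged-in session, so an anonymous request may get a redirect or another status instead.

```shell
# Hypothetical base URL; replace with your own forum's address.
BASE_URL="${BASE_URL:-https://forum.example.com}"

# probe_pages ROUNDS
# Requests each listing page ROUNDS times and prints a line for every
# response whose HTTP status is 502.
probe_pages() {
  rounds="$1"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    for path in /latest /new /unread /top /categories; do
      # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
      code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL$path")
      if [ "$code" = "502" ]; then
        echo "502 $path"
      fi
    done
    i=$((i + 1))
  done
}
```

Running `probe_pages 20` would cycle through the five pages twenty times; no output means no 502 was seen during that run.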

I have checked the logs from my proxy server, and it is logging entries like this for the failed URLs:
“upstream prematurely closed connection while reading response header from upstream”. There is a very large number of these errors, for seemingly random URLs.
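To check whether the failing URLs really are random, the errors can be tallied per request path. The sketch below assumes the proxy is nginx writing its standard error-log format, where each entry includes a `request: "GET /path HTTP/1.1"` field; `count_upstream_closed` is a hypothetical helper name, and the log file path is whatever your proxy uses.

```shell
# count_upstream_closed LOGFILE
# Prints "<count> <path>" for every request path that hit the
# "upstream prematurely closed connection" error, most frequent first.
count_upstream_closed() {
  grep -F 'upstream prematurely closed connection' "$1" \
    | sed -n 's/.*request: "[A-Z]* \([^ "]*\).*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}
```

For example, `count_upstream_closed /var/log/nginx/error.log` (an assumed location) would show whether a handful of paths account for most of the failures or whether they are spread evenly.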

Here are steps I have taken to try to solve the problem based on posts I have seen:

  • Upgraded the OS
  • Upgraded Docker
  • Upgraded Discourse
  • Rebooted the server

The original install was done using the Docker Cloud Setup guide. I then followed a guide to switch backups and images to use S3.

My server is running:
Ubuntu 14.04.6 LTS (GNU/Linux 4.4.0-148-generic x86_64)

Per discourse-doctor:

     DOCKER VERSION: Docker version 18.06.3-ce, build d7080c1

==================== MEMORY INFORMATION ====================
RAM (MB): 4820

             total       used       free     shared    buffers     cached
Mem:          4707       2206       2501        140        101        948
-/+ buffers/cache:       1156       3550
Swap:         2047          0       2047

==================== DISK SPACE CHECK ====================
---------- OS Disk Space ----------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   33G   15G  70% /
/dev/sda1        50G   33G   15G  70% /var/lib/docker

==================== DISK INFORMATION ====================

Disk /dev/sda: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *       16065   104856254    52420095   83  Linux
Partition 1 does not start on physical sector boundary.

==================== END DISK INFORMATION ====================

I have run top and watched the CPU and memory numbers, and I’m not seeing anything concerning. I’ve also looked in the logs and am not seeing anything that points me to the problem.

Are there any other details I can provide to help troubleshoot the issue? What steps should I take to track this down?

Thank you,

Stephen


It could be that Postgres needs a bit more memory. You’ve got plenty, so you might bump db_shared_buffers to 1024MB. You might also bump db_work_mem to 80MB.

Thank you for the suggestion. I made both of those changes in the yml file. Restarting the app didn’t seem to make a difference, so I ended up rebooting the server. Unfortunately I can still replicate the problem.

You need to rebuild, or run

cd /var/discourse
./launcher destroy app
./launcher start app

for the changes to take effect.

And, this might not be a silver bullet, but I have seen it help.
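Before rebuilding, it may be worth confirming that both settings are actually present and not commented out in the yml file. A minimal sketch, assuming the container definition lives at /var/discourse/containers/app.yml (the usual location for an app named "app"); `show_db_tuning` is a hypothetical helper name:

```shell
# show_db_tuning YMLFILE
# Prints, with line numbers, every db_shared_buffers / db_work_mem line,
# including commented-out ones, so a leading "#" is easy to spot.
show_db_tuning() {
  grep -nE '^[[:space:]]*#?[[:space:]]*db_(shared_buffers|work_mem):' "$1"
}
```

Calling `show_db_tuning /var/discourse/containers/app.yml` should list one line per setting; a line that still starts with `#` is ignored when the container is built.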


So far so good, we’ll monitor and see how this helps. Thank you!
