Discourse having momentary "downs" - How to get more info from the logs

Thank you Leonardo, I’ve added postfix (the default for Ubuntu). We’ll see what comes out of it.

I’ve other monitoring in place and to be honest I don’t see any issue with memory or disk space.

Swap usage stays around 2GB of the 8GB available, and the VM has 30GB of RAM. What really seems weird to me is how greedy Discourse is with it; see my other topic: Discourse Docker HW reserved/used (CPU, RAM, Disk) and how to manage it

I’m not experienced with dmesg, but what I can see is a plethora of [UFW BLOCK] messages from several different IPs. There are so many of them that it’s hard to tell whether there is a pattern.

To give you an example:

[Tue May 23 09:32:21 2023] [UFW BLOCK] IN=eth0 OUT= MAC=MAC_ADDRESS_A SRC=IP_ADDRESS_A DST=SERVER_IP LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54321 PROTO=TCP SPT=34909 DPT=40930 WINDOW=65535 RES=0x00 SYN URGP=0
[Tue May 23 09:32:22 2023] [UFW BLOCK] IN=eth0 OUT= MAC=MAC_ADDRESS_A SRC=IP_ADDRESS_A DST=SERVER_IP LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54321 PROTO=TCP SPT=43093 DPT=40942 WINDOW=65535 RES=0x00 SYN URGP=0
[Tue May 23 09:32:29 2023] [UFW BLOCK] IN=eth0 OUT= MAC=MAC_ADDRESS_A SRC=IP_ADDRESS_B DST=SERVER_IP LEN=40 TOS=0x00 PREC=0x00 TTL=249 ID=57687 PROTO=TCP SPT=42801 DPT=3350 WINDOW=1024 RES=0x00 SYN URGP=0
[Tue May 23 09:32:35 2023] [UFW BLOCK] IN=eth0 OUT= MAC=MAC_ADDRESS_A SRC=IP_ADDRESS_C DST=SERVER_IP LEN=40 TOS=0x00 PREC=0x00 TTL=54 ID=61548 PROTO=TCP SPT=21721 DPT=23 WINDOW=43065 RES=0x00 SYN URGP=0
[Tue May 23 09:32:59 2023] [UFW BLOCK] IN=eth0 OUT= MAC=MAC_ADDRESS_A SRC=IP_ADDRESS_D DST=SERVER_IP LEN=44 TOS=0x00 PREC=0x00 TTL=114 ID=0 PROTO=TCP SPT=50293 DPT=1023 WINDOW=29200 RES=0x00 SYN URGP=0

Identifiers are anonymised, but identical values share the same placeholder.
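
One way to look for a pattern would be to count the blocked packets per source address and per destination port. Below is a minimal Python sketch, assuming the log lines keep the KEY=VALUE layout shown above. The function name `summarize_ufw_blocks` and the shortened sample lines are only for illustration, not something taken from my setup.

```python
import re
from collections import Counter

# Matches KEY=VALUE pairs such as SRC=... or DPT=23; the value may be empty (e.g. "OUT=").
FIELD_RE = re.compile(r"(\w+)=(\S*)")


def summarize_ufw_blocks(lines):
    """Count [UFW BLOCK] entries per source address and per destination port."""
    by_source = Counter()
    by_port = Counter()
    for line in lines:
        if "[UFW BLOCK]" not in line:
            continue  # skip unrelated kernel messages
        fields = dict(FIELD_RE.findall(line))
        # Some packets (e.g. ICMP) carry no DPT field, so fall back to "?".
        by_source[fields.get("SRC", "?")] += 1
        by_port[fields.get("DPT", "?")] += 1
    return by_source, by_port


# Hypothetical sample, shortened from the anonymised lines in this post.
SAMPLE = [
    "[Tue May 23 09:32:21 2023] [UFW BLOCK] IN=eth0 OUT= SRC=IP_ADDRESS_A DST=SERVER_IP PROTO=TCP SPT=34909 DPT=40930",
    "[Tue May 23 09:32:22 2023] [UFW BLOCK] IN=eth0 OUT= SRC=IP_ADDRESS_A DST=SERVER_IP PROTO=TCP SPT=43093 DPT=40942",
    "[Tue May 23 09:32:35 2023] [UFW BLOCK] IN=eth0 OUT= SRC=IP_ADDRESS_C DST=SERVER_IP PROTO=TCP SPT=21721 DPT=23",
]

by_source, by_port = summarize_ufw_blocks(SAMPLE)
print("Top sources:", by_source.most_common(10))
print("Top ports:", by_port.most_common(10))
```

To run it against the real log, the `SAMPLE` list could be replaced with the lines of `dmesg -T` output (for example read from `sys.stdin`). A handful of sources probing many different ports would look like ordinary internet scanning rather than something tied to the downtime.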

We do use Cloudflare but just as an SSL/domain provider and cache. Unfortunately I’m not in charge of that so before digging further in that direction I’d like to exhaust other possibilities.

I’ve added an uptime check via blackbox exporter that points to the domain, to see whether it detects any downtime.