Discourse installation has been getting slower and slower and slower

My Discourse installation has been getting slower and slower and slower over the last few weeks. In the past when this has happened, rebuilding the app has helped. However, it doesn’t seem to help now.

I’ve looked for advice on this forum and tried some database optimizations (VACUUM FULL VERBOSE, reindexing, VACUUM ANALYZE VERBOSE).
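
Roughly what that amounted to (a sketch; assuming the standard /var/discourse install and the default "discourse" database name):

cd /var/discourse
./launcher enter app
su postgres -c 'psql discourse'

-- then, inside psql:
VACUUM FULL VERBOSE;
REINDEX DATABASE discourse;
VACUUM ANALYZE VERBOSE;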

However, none of it seems to help, and when I start the container, it takes a really really long time before I can actually connect to the forum.

If this keeps up, the forum will eventually become completely unusable. Any idea where to start looking?

How big is your database? How much RAM do you have?

The output of

vmstat 5 5

could be helpful here. (Run at a time when the problem is happening!)

Available memory (from top):

KiB Mem :  4041756 total,   108980 free,  3834244 used,    98532 buff/cache
KiB Swap:  1949692 total,   972196 free,   977496 used.    31140 avail Mem 

vmstat output (while trying to load things, which is going very, very slowly):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 9  2 1011424 108300  15308 122392   37   32   145   101    0    0  2  3 93  1  0
 5  2 1028312 114696   9976 101252 2104 3904  2176  3915  340 1939 41 38  1 19  0
 2  4 1054116 115892  10196 102260 1378 6803  4171  6826  870 1812 23 28  1 48  0
 0  4 1011420 257496  10860 108464 3427 3937  6223  3969  829 2788 15 28  2 55  0
 6  2 1001844 154328  12988 120800 4366  124  7166   161  742 2947 14 26  2 58  0
hubbe@tymin:~$ vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  4 1004748  85768  13948 114648   37   32   145   101    0    0  2  3 93  1  0
 0  6 1033260 108584  10212 101668 1566 6661  4368  6807  497 1990 11 27  0 61  0
12  7 1050808  99524  10708  94852 1932 4551  4854  4625  660 2263 24 32  2 42  0
 5  8 1078776 125060   9136  92948 2079 6963  5541  7030  771 2040 17 32  0 51  0
 4  3 1004784 168216  10164 103420 2631 1457  4970  1467  617 2136 34 38  1 27  0

PS: my site is available here if that helps: https://crucible.hubbe.net/

How do I check?

Is Discourse the only thing on that server? How many unicorns do you have set in your app.yml file?

It’s not the only thing, but it’s definitely the biggest thing.

Here is top processes by memory usage:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                       
 1818 hubbe     20   0  910068 159724  10272 S   0.0  4.0   0:31.17 ruby                                                                          
 6141 hubbe     25   5 1195492 140180  10080 S   4.2  3.5  11:31.61 ruby                                                                          
 1845 hubbe     20   0  908732 124036   9388 S   2.8  3.1   0:29.94 ruby                                                                          
 1780 hubbe     20   0  910076  82072   3796 S   0.0  2.0   0:28.40 ruby                                                                          
 1937 systemd+  20   0  360060  25632  21076 S   0.0  0.6   0:00.86 postmaster                                                                    
 2134 systemd+  20   0  356020  23608  19760 S   0.0  0.6   0:00.13 postmaster                                                                    
 1797 systemd+  20   0  355840  22620  19404 S   1.4  0.6   0:00.75 postmaster                                                                    
 2092 systemd+  20   0  356288  21644  17584 S   0.0  0.5   0:00.17 postmaster                                                                    
 2063 systemd+  20   0  355984  18364  16504 S   0.0  0.5   0:00.20 postmaster                                                                    
 1939 systemd+  20   0  355904  15668  15232 S   0.0  0.4   0:00.25 postmaster                                                                    
 2131 systemd+  20   0  353948  14804  13000 S   0.0  0.4   0:00.02 postmaster                                                                    
38770 root      20   0  689760  12940      0 S   0.0  0.3 434:00.34 dockerd                                                                       
43876 root      20   0   16492   9428   1608 S   0.0  0.2   3:34.89 roxen                                                                         
 5728 hubbe     20   0  574796   8136   2064 S   0.0  0.2   0:58.94 ruby                                                                          
37533 root      20   0  593420   5848   1020 S   2.8  0.1 539:40.11 containerd                                                                    
 5689 systemd+  20   0   96432   5832   1672 S   0.0  0.1   3:54.43 redis-server                                                                  
 2196 www-data  20   0  166248   4924   2580 S   0.0  0.1   1:18.03 nginx                                                                         
 2197 www-data  20   0  165972   4484   3168 S   0.0  0.1   1:29.32 nginx                                                                         

Almost everything on this list, except roxen, is related to Discourse.

UNICORN_WORKERS is commented out in my app.yml

It seems that saving a post is particularly prone to timing out and failing.
Not sure if that is any sort of clue as to what is going on, though.

That vmstat output is telling us that - as things are - there’s not enough RAM: the si/so (swap in/out) columns show constant swapping, and the high wa figures mean the CPU is spending much of its time waiting on disk.

It could be that Discourse is working exactly as it should, but your data has grown to the point that 4GB of RAM is no longer enough.

Or it could be that something has gone wrong and a lot of RAM is in use which should not be in use.

One measure of size is to take a backup-without-attachments and see how big that is.
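
One way to do that (a sketch; assuming the standard /var/discourse install - the admin UI at /admin/backups works too and lets you skip uploads):

cd /var/discourse
./launcher enter app
discourse backup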

It might be that the mini-profiler will give a clue as to what database actions are taking so long.

If you have the budget to double the RAM, do that. (If your provider lets you increase the RAM while leaving the storage as-is, this can be a reversible and even temporary change.)

That’s spot on.

If you can’t afford more RAM, you can try setting a lower value for db_shared_buffers (say 128MB or lower) and limiting UNICORN_WORKERS to just 2 in the meantime, as you need to stop the swapping ASAP.
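
A minimal sketch of the relevant app.yml sections (assuming the standard standalone template; adjust the values to taste):

# app.yml (excerpt)
env:
  ## fewer unicorn workers = less memory pressure while things are tight
  UNICORN_WORKERS: 2

params:
  ## smaller shared-buffer pool for PostgreSQL
  db_shared_buffers: "128MB"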

62.5 MB

More RAM is fairly pricey from my hosting provider, so I will explore other options first. (And I’m worried that changing it will break my grandfathered-in pricing…)

I changed db_shared_buffers to 128MB and UNICORN_WORKERS to 2.
Is launcher app stop / start enough to make these settings take effect?
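
For concreteness, the commands I mean (assuming the standard /var/discourse install):

cd /var/discourse
./launcher stop app
./launcher start app

# or, if the change only takes effect after a rebuild:
./launcher rebuild app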

What is the mini-profiler, and how do I use it?

Press Alt+P on your keyboard; subsequent forum actions should then put up a timing string just under the forum banner (for me, on the right), and if you click on the timing you’ll see a popup window with some stats.
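
If the keyboard shortcut doesn’t seem to do anything, rack-mini-profiler (which Discourse enables for admin accounts) also takes commands via a pp= query-string parameter, for example:

https://crucible.hubbe.net/?pp=help

which lists the available commands.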

That’s about the same as mine, running with 1G of RAM. I have 2k topics, 15k posts, 500+ signups.

What’s the history of your Discourse? I dimly recall a time in the past when one was supposed to create an index for some table for performance reasons.

What about plugins? Can you run actions in safe mode to see if they run any quicker?
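
(Safe mode is reached by appending /safe-mode to your site URL, for example:

https://crucible.hubbe.net/safe-mode

It lets you disable plugins and custom themes for your own session only, which is a quick way to rule them out.)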

It could also be useful to paste your full process tree:

ps auxf

I posted mine over here
Seeking help for discourse installation

Is there an easy place to find such stats?
Most of the stats I see show me what happened in the last day or week, but no totals.

Not sure what you mean by history. But I started it in March 2021.

First impression is that safe mode is not faster. I’ll play around with it and the mini-profiler to see if that impression holds up.

Output of ps auxf attached.
auxf.txt (20.1 KB)

On the /about page, there’s a column for all-time.

Looks like your machine hasn’t been rebooted for well over a year - probably worth doing that, and updating for security at the same time. It’s just possible the reboot will help.

I think I notice that your server is set up to use a hypervisor whereas mine is set up to use LXC. I don’t know if that’s important. (My system shows a process /usr/bin/lxcfs which yours doesn’t, and yours shows a process hv_vss_daemon which mine doesn’t.)

Also, maybe you could share the output of df -T and swapon.

1.3k topics, 24.7k posts, 631 signups, 7.1k likes

Rebooting Linux machines doesn’t usually help anything, but I suppose I can try.

I agree with a skeptical attitude on this! But I don’t suggest it idly - I’m fairly sure we’ve seen a case of a long-running system which did improve with a reboot. (Edit: here, although it was a different case.)

This is what the mini-profiler said for saving a post:

https://fredrik.hubbe.net/miniprofile.html

It took ~30 seconds.