Not my issue, but does this mean there is nothing else going on?
Maybe not intentionally, but it might be worth auditing the host for crypto-mining processes and the like.
Step 1: fix the already identified performance problem of using the vfs driver.
About this swap to (ideally) overlay2: I will have to erase my current install and reinstall everything, because the host I am currently on only supports fuse-overlayfs or vfs, neither of which is recommended. However, they will soon offer KVM virtual machines, which support overlay2. So my plan is to use one of those instead of fuse-overlayfs, which is not recommended either.
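For anyone doing the same switch, here is a minimal sketch for checking which storage driver Docker currently uses and whether the kernel offers the overlay filesystem that overlay2 relies on. vfs has no copy-on-write, so every image layer is stored as a full copy, which is a plausible reason rebuilds were slow. The function names are made up for illustration; `docker info --format '{{.Driver}}'` and /proc/filesystems are standard.

```shell
#!/bin/sh
# Pre-migration checks (sketch). Function names are hypothetical;
# `docker info --format` and /proc/filesystems are standard.

# Print the storage driver Docker currently uses (vfs, fuse-overlayfs, overlay2, ...).
current_storage_driver() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker-not-installed"
    return 0
  fi
  driver=$(docker info --format '{{.Driver}}' 2>/dev/null)
  echo "${driver:-unknown}"   # empty output (e.g. daemon not running) -> "unknown"
}

# Succeed if the running kernel currently lists the overlay filesystem,
# which the overlay2 driver needs. A missing entry can also just mean the
# module is not loaded yet, so treat a failure as "check with your host".
kernel_lists_overlay() {
  grep -qw overlay /proc/filesystems
}

echo "Docker storage driver: $(current_storage_driver)"
if kernel_lists_overlay; then
  echo "Kernel lists overlay; overlay2 should be usable."
else
  echo "Kernel does not list overlay; ask the host before switching to overlay2."
fi
```

When overlay is available, Docker normally picks overlay2 on its own, and it can be pinned with `"storage-driver": "overlay2"` in /etc/docker/daemon.json. Images and containers created under a different driver are not visible after a switch, so a Discourse rebuild is needed afterwards anyway.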
Now, in the Discourse app itself I can take backups. What exactly does that back up?
Would I lose anything from the current Discourse forum (messages, chats, settings, users, uploaded images, etc.) if I took a backup, installed a fresh Discourse on a fresh server, and then, after the initial Discourse setup, overwrote it with the backup?
Would that work?
Yes, that would work.
The only thing you didn’t mention is to make sure that the new Discourse has the same plugins as the current one. If you reuse your app.yml, you will be fine there as well.
Right, thanks for pointing that out; I bet I’d have run right into that.
Ok so…
1. Take a backup in the Discourse admin area
2. Just for safety, take a backup of the whole server as well
3. Take a copy of the app.yml file
4. Dump the old server
5. Set up a new server with supported technology
6. Install Docker with a proper storage driver
7. Rebuild a completely fresh Discourse instance using the backed-up app.yml
8. Restore Discourse from the backup
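The host-side part of this plan can be sketched as a dry-run script. It assumes the standard install in /var/discourse with a container named app (an assumption, not confirmed in the thread); the helper names are hypothetical, and by default it only prints the commands. The admin-area backup, the provider-level server backup and dumping the old server are left out.

```shell
#!/bin/sh
# Dry-run sketch of the host-side migration steps. Assumes (not confirmed
# in the thread) a standard install in /var/discourse, container "app".
# Set DRY_RUN=0 to actually execute; the default only prints each command.
DISCOURSE_DIR=${DISCOURSE_DIR:-/var/discourse}
BACKUP_DIR="$DISCOURSE_DIR/shared/standalone/backups/default"
DRY_RUN=${DRY_RUN:-1}

# Print (dry run) or execute one command string.
step() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    sh -c "$*" || { echo "step failed: $*" >&2; return 1; }
  fi
}

# On the OLD server: keep app.yml (plugins, settings) and check the backups.
save_old_server() {
  step "cp '$DISCOURSE_DIR/containers/app.yml' ~/app.yml.bak"
  step "ls -lh '$BACKUP_DIR'"
}

# On the NEW server (Docker on overlay2, discourse_docker cloned):
# reuse app.yml, rebuild, then put the backup where Discourse looks for it.
prepare_new_server() {
  backup_file=$1   # hypothetical path to the .tar.gz copied from the old server
  step "cp ~/app.yml.bak '$DISCOURSE_DIR/containers/app.yml'"
  step "cd '$DISCOURSE_DIR' && ./launcher rebuild app"
  step "cp '$backup_file' '$BACKUP_DIR/'"
  echo "Then restore it from Admin > Backups, or inside './launcher enter app'"
  echo "with: discourse enable_restore && discourse restore $(basename "$backup_file")"
}
```

On the old server, run `save_old_server`; on the new one, `prepare_new_server ~/your-backup.tar.gz` (hypothetical file name). Switching to `DRY_RUN=0` only makes sense after reading the printed commands.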
I’m a bit surprised the backup is only 19.2 MB. We have already uploaded several images, etc., but I guess all I can do is try.
I’ll do that over the weekend and report back on whether the storage driver change did the trick.
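One way to see what a backup actually contains before relying on it is to list the archive. This is a sketch, assuming the uploads sit under an uploads/ directory inside the .tar.gz next to the database dump (worth confirming with `tar -tzf` on your own file); the function name is hypothetical. A count of 0 would mean a database-only backup.

```shell
#!/bin/sh
# Count uploaded files inside a Discourse backup archive (sketch).
# Assumption: uploads live under "uploads/" inside the .tar.gz;
# a database-only backup has no such entries.
count_upload_files() {
  archive=$1
  # List the archive, keep entries under uploads/ that are files (no
  # trailing slash), and count them. grep -c prints 0 when nothing matches.
  tar -tzf "$archive" | grep -c '^uploads/.*[^/]$' || true
}
```

For example: `count_upload_files /var/discourse/shared/standalone/backups/default/your-backup.tar.gz` (hypothetical file name).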
Check that the backup with uploads setting is enabled.
Please note that that specific setting only applies to scheduled backups, not to manual ones. With manual backups you always get an explicit choice.
Another setting to enable is include thumbnails in backups.
@smileBeda I would postpone #4 (dumping the old server) until everything is working OK.
It is indeed checked, but include thumbnails in backups (“Include generated thumbnails in backups. Disabling this will make backups smaller, but requires a rebake of all posts after a restore.”) was not.
@RGJ Right, good idea. It will take a few more steps, as I will have to create the new server under a new entity, but that is minor compared to the risk.
I will let the scheduled backup run so that it includes all the data, since I understand a manual backup wouldn’t include the images, etc.
Thanks…
That is an incorrect assumption.
When you create a backup manually, a popup lets you choose whether to back up only the database or to include uploads.
When creating scheduled backups, the backup with uploads setting decides that.
OK, I misunderstood your earlier note: “Please note that that specific setting only applies to scheduled backups, not to manual ones.”
Thanks…
Hi, I’m reusing this topic since it’s still open and we are having the same problem after migrating to a new virtual server. As everyone else says, rebuilding our Discourse never took me more than 20 minutes, but on this new server it takes hours, even though it has twice the resources of the previous one.
I have checked other topics on Meta about hour-long upgrades, but I can’t figure out what the problem with ours is:
Server: 4 GB RAM, 2 CPUs, 50 GB disk.
Swap:
/$ free
               total        used        free      shared  buff/cache   available
Mem:         3911740      507208     2318476         268     1384032     3404532
Swap:        4095976       45472     4050504
Docker:
/$ sudo docker info
Client:
 Version: 26.1.3
 Context: default
 Debug Mode: false

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 3
 Server Version: 26.1.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.731GiB
 Name: podkasts
 ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
It looks normal to me, but maybe I’m missing something. Where else can we look?
Maybe a noisy neighbor is making your new VM slower than the old one by using CPU time that should be going to you?
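One way to test the noisy-neighbor theory is CPU steal time: time the hypervisor spent running other guests while this VM was ready to run. vmstat and top already show it in their st column; the sketch below computes the same figure directly from the first line of /proc/stat over an interval. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate CPU steal as a percentage of all CPU time over an interval (sketch).
# First line of /proc/stat: cpu user nice system idle iowait irq softirq steal ...
cpu_steal_percent() {
  interval=${1:-5}
  read -r label u1 n1 s1 i1 w1 q1 sq1 st1 rest < /proc/stat
  sleep "$interval"
  read -r label u2 n2 s2 i2 w2 q2 sq2 st2 rest < /proc/stat
  # Total jiffies elapsed across user..steal, and the steal part of it.
  total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2 + st2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1 + st1) ))
  steal=$(( st2 - st1 ))
  [ "$total" -gt 0 ] || total=1   # avoid dividing by zero on very short intervals
  echo $(( 100 * steal / total ))
}
```

Running `cpu_steal_percent 5` a few times during a rebuild would help: a value that stays well above zero supports the theory, while a value near zero suggests looking elsewhere.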
Yes, thank you; it is reassuring that an experienced admin like you isn’t seeing anything obvious in the information above. And yes, we are starting to look at the physical server and our virtual neighborhood. At least the forum works without any problems users would notice. We only hit this serious performance problem during rebuilds. Yesterday a rebuild took us 4 hours.
If I had this problem, I’d have two or three terminal windows open: one to run the rebuild, one to note the elapsed time and where the long gaps between log updates occur, and a third to keep a record of machine activity, probably by running vmstat 5.
When you hit a point where the log isn’t updating for a suspiciously long time, take a note of the activity reported by vmstat.
Post suitable extracts from the log with your notes and the corresponding vmstat output here.
It seems highly likely that it’s specific parts of the rebuild which are taking time: the thing to do is to find out which parts, and see what the machine is doing at those times.
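The note-taking part can be partly automated. In this minimal sketch, with hypothetical function names, `ts_lines` stamps each line of the rebuild output with the time it arrived, and `find_gaps` later lists the places where the log went quiet for longer than a threshold, so they can be matched against the vmstat record.

```shell
#!/bin/sh
# Helpers for timing a rebuild log (sketch; function names are hypothetical).

# Prefix every input line with epoch seconds and wall-clock time.
ts_lines() {
  while IFS= read -r line; do
    now=$(date '+%s %H:%M:%S')
    printf '%s %s\n' "$now" "$line"
  done
}

# Read a ts_lines log on stdin and report each line that arrived at least
# $1 seconds (default 60) after the previous one, i.e. where the log went quiet.
find_gaps() {
  awk -v t="${1:-60}" 'NR > 1 && $1 - prev >= t { printf "%ds gap before: %s\n", $1 - prev, $0 } { prev = $1 }'
}

# Usage after sourcing this file (assuming the standard /var/discourse install):
#   terminal 1: ./launcher rebuild app 2>&1 | ts_lines | tee rebuild.log
#   terminal 2: vmstat -t 5 | tee vmstat.log   (-t adds a timestamp column)
#   afterwards: find_gaps 120 < rebuild.log
```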
I’d probably also take a snapshot of machine activity with ps auxf during the pauses.
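A small helper for those snapshots writes each one to a time-stamped file so it can be matched with the rebuild log afterwards. The function name and default directory are hypothetical; `ps auxf` is the procps form, where f draws the process tree.

```shell
#!/bin/sh
# Save one time-stamped process-tree snapshot and print its path (sketch).
snapshot_activity() {
  dir=${1:-./snapshots}
  mkdir -p "$dir"
  file="$dir/ps-$(date +%H%M%S).txt"
  # ps auxf shows the process tree (procps); fall back to plain ps aux.
  ps auxf > "$file" 2>/dev/null || ps aux > "$file"
  echo "$file"
}

# Usage: call it whenever the rebuild log goes quiet, e.g.
#   snapshot_activity ~/rebuild-snapshots
```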
Thank you, this is very good advice. Next time we need to rebuild, we’ll do it this way.