Rebuild taking ~3 hours

Not my issue, but does this mean there is nothing else?

1 Like

Maybe not intentionally, but it might be worth auditing the contents of the host for crypto mining processes, etc…

3 Likes

Step 1: fix the already identified performance problem of using the vfs driver
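A quick way to confirm which storage driver Docker is using is sketched below. `docker info --format '{{.Driver}}'` prints the driver name; the `classify_driver` helper is hypothetical, and its labels only mirror the advice in this thread (overlay2 preferred, vfs and fuse-overlayfs not recommended).

```shell
#!/bin/sh
# Hypothetical helper: label a Docker storage driver name according to
# the advice in this thread. The labels are an assumption, not Docker output.
classify_driver() {
    case "$1" in
        overlay2)            echo "ok" ;;
        vfs|fuse-overlayfs)  echo "not recommended" ;;
        *)                   echo "unknown" ;;
    esac
}

# Only query Docker when it is installed; this needs permission to reach
# the daemon (for example via sudo).
if command -v docker >/dev/null 2>&1; then
    driver=$(docker info --format '{{.Driver}}' 2>/dev/null || true)
    echo "storage driver: ${driver:-?} ($(classify_driver "$driver"))"
fi
```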

7 Likes

About this swap to (ideally) overlay2: I will have to erase my current install and re-install everything. This is because the host I am on currently only supports fuse-overlayfs or vfs, neither of which is recommended.
However, they will soon enable KVMs, which support overlay2.

So my intention is to use that instead of fuse-overlayfs, which is also not recommended.

Now, in the Discourse app itself, I can take backups. What precisely does that back up?

Would I lose anything from the current Discourse forum (messages, chats, settings, users, uploaded images, etc.) if I took a backup, installed a fresh Discourse on a fresh server, and then, after the initial Discourse setup, overwrote it with the backup?

Would that work?

Yes, that would work.

The only thing you didn’t mention is to make sure that you have the same plugins on the new Discourse as you have on the current one. If you reuse your app.yml then you would be fine as well.

3 Likes

Right, thanks for pointing that out, I bet I’d have run right into that.

Ok so…

  1. Take backup in the Discourse admin area
  2. Just for safety, of course, take backup of the server
  3. Take copy of the yaml file
  4. Dump the server
  5. Set up new server with supported tech
  6. Install Docker with proper storage driver
  7. Rebuild a full fresh Discourse instance using backed up yaml file
  8. Restore Discourse from backup
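The file-handling parts of these steps (1, 3, 7 and 8) can be sketched in shell. This is only an illustration: `stage_migration_files` is a hypothetical helper, and the paths assume a standard standalone install under /var/discourse with backups stored locally in shared/standalone/backups/default.

```shell
#!/bin/sh
# Hypothetical helper: copy app.yml and the newest backup file into a
# staging directory so they can be moved to the new server.
# Usage: stage_migration_files <discourse_dir> <staging_dir>
stage_migration_files() {
    src=$1
    dest=$2
    backups="$src/shared/standalone/backups/default"
    mkdir -p "$dest" || return 1
    cp "$src/containers/app.yml" "$dest/" || return 1
    # Newest file by modification time; assumes file names without newlines.
    latest=$(ls -t "$backups" 2>/dev/null | head -n 1)
    [ -n "$latest" ] || { echo "no backup found in $backups" >&2; return 1; }
    cp "$backups/$latest" "$dest/" || return 1
    echo "$dest/$latest"
}

# On the old server (assumed default location):
#   stage_migration_files /var/discourse /root/discourse-migration
# On the new server, after copying the staged files over:
#   cp app.yml /var/discourse/containers/app.yml
#   cd /var/discourse && ./launcher rebuild app
#   then restore the backup (step 8).
```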

A bit surprised the backup is just 19.2 MB.
We have several images etc already uploaded… but I guess all I can do is try.

Will go for that over the weekend and report back if the storage driver change did the trick.

1 Like

Check that this is set:

[screenshot of the backup setting; image not preserved]

1 Like

Please note that that specific setting only applies to scheduled backups, not to manual ones. With manual backups you always get an explicit choice.

Another setting to enable is "include thumbnails in backups".

@smileBeda I would postpone #4 until everything is working ok.

3 Likes

It is indeed checked, but "Include generated thumbnails in backups. Disabling this will make backups smaller, but requires a rebake of all posts after a restore." was not.

@RGJ … right, good idea, will take more steps as I will have to create a server under a new entity, but it is minor compared to the risk.

I will let the automated backup trigger so I get all the data in it, as I understand the manual one wouldn’t include the images etc.

Thanks…

That is an incorrect assumption.

When creating a backup manually, a popup lets you choose whether to back up the database only or to include uploads.

When creating scheduled backups, the "backup with uploads" setting decides that.
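If a backup looks suspiciously small, one way to check whether uploads made it in is to list the archive contents. The sketch below rests on an assumption about the file layout: a backup with uploads is a gzipped tar containing an uploads/ directory, while a database-only backup is a plain .sql.gz file. `backup_has_uploads` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a backup archive contains uploads.
# Assumption: a backup with uploads is a gzipped tar holding an uploads/
# directory, while a database-only backup is a plain .sql.gz file.
backup_has_uploads() {
    case "$1" in
        *.tar.gz)
            # List the archive and look for a top-level uploads/ entry.
            if tar -tzf "$1" | grep -E '^(\./)?uploads/' >/dev/null; then
                echo "yes"
            else
                echo "no"
            fi ;;
        *)  echo "no" ;;
    esac
}

# Usage (the file name is a placeholder):
#   backup_has_uploads /path/to/your-backup.tar.gz
```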

4 Likes

Ok, I misunderstood your previous "Please note that that specific setting only applies to scheduled backups, not to manual ones."

Thanks…

2 Likes

Hi, I’m reusing this topic since it’s still open and we are having the same problem after migrating to a new virtual server. Just like everyone else says, it never took me more than 20 minutes to rebuild our Discourse, but on this new server it takes hours, even though it has double the resources of the previous one. :thinking:

I have checked other topics about hour-long upgrades on Meta, but I can’t figure out what the problem is with ours:

Server: 4 GB RAM, 2 CPUs, 50 GB disk.

Swap:

/$ free
               total        used        free      shared  buff/cache   available
Mem:         3911740      507208     2318476         268     1384032     3404532
Swap:        4095976       45472     4050504

Docker:

/$ sudo docker info
Client:
 Version:    26.1.3
 Context:    default
 Debug Mode: false

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 3
 Server Version: 26.1.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.731GiB
 Name: podkasts
 ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Looks normal to me but maybe I’m missing something. Where else can we look?

Maybe a noisy neighbor is making your new VM slower than the old one because they are using all of the CPU that you’re not getting?
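One rough way to test the noisy-neighbor theory on a Linux guest is to look at CPU steal time, which counts time the hypervisor gave to other guests while this VM wanted to run. The sketch below is an illustration, not something the posters ran; `steal_pct` is a hypothetical helper that reads the aggregate `cpu` line of /proc/stat.

```shell
#!/bin/sh
# Hypothetical helper: read the aggregate "cpu" line in /proc/stat format
# from stdin and print the share of CPU time stolen by the hypervisor.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
# The counters are cumulative since boot, so this is a long-run average.
steal_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i   # user .. steal
        if (total > 0) printf "%.1f\n", 100 * $9 / total
    }'
}

# Usage on a Linux guest:
#   steal_pct < /proc/stat
```

For a live view during a rebuild, the `st` column of `vmstat 5` shows the same counter per interval.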

1 Like

Yes, thank you, it is reassuring that an experienced admin like you isn’t seeing anything obvious in the information above. And yes, we are starting to look at the physical server and our virtual neighborhood. At least the forum works without problems noticeable to users. We are hitting this serious performance problem only with rebuilds. Yesterday it took us 4 hours to rebuild. :face_with_spiral_eyes:

1 Like

If I had this problem, I’d have two or three terminal windows open. One to run the rebuild, one to take notes on the elapsed time and to note where the long delays are happening between log updates, and the final one to keep a record of machine activity: probably running vmstat 5

When you hit a point where the log isn’t updating for a suspiciously long time, take a note of the activity reported by vmstat.

Post suitable extracts from the log with your notes and the corresponding vmstat output here.

It seems highly likely that it’s specific parts of the rebuild which are taking time: the thing to do is to find out which parts, and see what the machine is doing at those times.

I’d probably also take a snapshot of machine activity with ps auxf during the pauses too.
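The note-taking described above can be partly automated. A minimal sketch, assuming a standard install in /var/discourse: `stamp` and `find_gaps` are hypothetical helpers that timestamp each log line and then list the points where the log went quiet, so they can be matched against the vmstat samples (and any `ps auxf` snapshots) from the same moment.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with the current
# epoch time, so pauses in the rebuild log can be matched to vmstat samples.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s)" "$line"
    done
}

# Hypothetical helper: read stamped lines and print those that arrived at
# least N seconds (default 60) after the previous line.
find_gaps() {
    awk -v min="${1:-60}" '
        NR > 1 && $1 - prev >= min { print ($1 - prev) "s before: " $0 }
        { prev = $1 }'
}

# Terminal 1 (assumed standard install in /var/discourse):
#   cd /var/discourse && ./launcher rebuild app 2>&1 | stamp | tee rebuild.log
# Terminal 2:
#   vmstat 5 | stamp > vmstat.log
# Afterwards, list pauses of two minutes or more:
#   find_gaps 120 < rebuild.log
```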

4 Likes

Thank you, this is very good advice. Next time we need to rebuild, we’ll do it this way.

2 Likes