Slow Profile Loads with 100GB+ database

Following up on this post: Slow Page Loads on User Profiles

We’re testing our import to Discourse again now that these changes were pushed, and there is an improvement to profile loading (especially after the initial load when the data is in the cache) but it is still taking 5-10 seconds to load some profiles. Is there anything more than can be done to help here? These look to be the offending queries:

2 Likes

How did you install discourse?

1 Like

Docker container install from the instructions.

1 Like

How much ram?how big is your database?

1 Like

This VM has 8 cores and 32 GB of RAM. I think the database is around 40 GB but I’m not 100% sure at the moment.

1 Like

Does the slow profile page load still happen if

  • you view the user profile page as anonymous (not logged in) user?

  • you view the profile page of a regular user, versus a staff user?

2 Likes

Yes, to both. I’ve tested with a regular user and logged out and the behavior appears roughly the same.

I tried an export and restore (re: Restore Failing - Check Free Disk Space) but that does not appear to have changed the behavior. Also worth noting that I frequently get an error as well when trying to view these profile pages. This appears after several seconds of attempting to load the page.

The database is 104 GB now after the restore.

That topic was mostly about a number of N+1 queries that we had on that route, which are all fixed now.

The profile page does have some heavyweight queries since it does output a very personalized and complete summary of the user, but a reasonable sized database should be able to render it in under 500ms.

That is a big database for a small VM. Are you running everything in the same VM (Web+DB+Redis) ?

Are you running latest PostgreSQL 13? Can you try to run the optional performance tasks described at PostgreSQL 13 update, both the vacuum and the reindex ?

3 Likes

The users with the issue are ones that have lots of posts. Profile pages for users with low post counts load immediately as expected.

Maybe I need to adjust my expectations here. Is 8 cores and 32 GB RAM considered small for Discourse? (Yes, we’re running as a single container install.) On our current software we’re running this board easily with 2 cores and 8 GB RAM.

As for the database, it’s a newly built container so it started on 13.1. Would the vacuum and reindex be necessary right after a restore?

1 Like

The vacuum is definitely recommend for a database that size.

3 Likes

I started these shortly after your post. The vacuum completed very quickly, but the reindex is still going now, over 24 hours after starting it. Is that normal? How long is this expected to take? How can I tell what resource (if any) is constrained? I don’t see postmaster using more than 2-4 cores most of the time, and there appears to be plenty of RAM available.

1 Like

I’m sad to report that the reindex is still running.

@Falco @codinghorror @pfaffman

@Ghan and I have been trying to perfect and make sure importing works for our community over the past year but the question that I guess that is on my mind is thus: Are there any other considerations we should be making in importing a community with over 25 million posts?

Do any communities that large exist on Discourse?

I noticed from looking at the the public list of customers, that as at-least from the ones I can see, no one is close to our size. I can imagine there is only thousands or tens of thousands of more customers I can’t can’t see though. We have successfully completed our import, that works on our end, but we keep bumping into things like the profile load issue with accounts that have large amounts of posts and now this re-indexing issue.

Is there any advise that can be given to us to smooth over this transition?

1 Like

I found the db:stats rake command from another post.
Reindex is still running so I’m not sure if these numbers are 100% final.

3 Likes

Yes, as I said in another topic we have instances with 1GB databases and instances with 500GB databases. But those instances are running in completely different sized VMs.

How is the disk perf on your server? Does it use old spinning hard disks?

2 Likes

We have dual 1.2 TB Intel P3600 NVMe SSDs configured with ZFS mirroring on the host server.

2 Likes

The next thing to check is postgres tuning. You’re using the defaults in the app.yml?

1 Like

Yes. I wasn’t sure if the launcher would automatically set some values based on the resources it sees on the host machine or if there were some recommended guidelines for tuning. Is there a guide somewhere for tuning the database?

There are some comments on the app.yml, though it’s really optimized for smaller self hosted sites. You likely want to increase the settings there (I can’t remember their exact names). You’ll need to look at some more general postgres sources for more information.

It’s not that discourse can’t handle a community of your size, it’s just different from running a smaller one, and a bit harder.

3 Likes

Thanks, will have a look for sure. I’m not familiar with Postgres but hopefully the same principles translate from running MySQL for years.

Definitely. I feel like once we can get some traction on what sorts of changes need to be made then things will be much smoother.

3 Likes