2.6.0b2 upgrade VERY slow

Wingtip · August 21, 2020, 12:58am

Just a FYI for people-- don’t try this upgrade during prime time.

The 2.6.0b2 upgrade has been running on our server for well over 40 minutes when normally it takes just a couple of minutes, usually over before you check back. I was concerned it was broken, but logging into postgres I can see a huge update running, looks like it’s changing the post search data for private messages.

Hopefully it isn’t broken. I guess I’ll find out. Really don’t want to kill it or restart the container mid-upgrade.

Query running:

postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query 
FROM pg_stat_activity 
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' 
ORDER BY query_start desc;
  pid  |       age       |  usename  |                                            query                          
                   
-------+-----------------+-----------+---------------------------------------------------------------------------
-------------------
   698 |                 |           | 
   701 |                 | postgres  | 
   699 |                 |           | 
   697 |                 |           | 
   696 |                 |           | 
 14572 | 00:10:31.484201 | discourse | UPDATE post_search_data                                                   
                  +
       |                 |           | SET private_message = X.private_message                                   
                  +
       |                 |           | FROM                                                                      
                  +
       |                 |           | (                                                                         
                  +
       |                 |           |   SELECT post_id,                                                         
                  +
       |                 |           |     CASE WHEN t.archetype = 'private_message' THEN TRUE ELSE FALSE END pri
vate_message      +
       |                 |           |   FROM posts p                                                            
                  +
       |                 |           |   JOIN post_search_data pd ON pd.post_id = p.id                           
                  +
       |                 |           |   JOIN topics t ON t.id = p.topic_id                                      
                  +
       |                 |           |   WHERE pd.private_message IS NULL OR                                     
                  +
       |                 |           |     pd.private_message <> CASE WHEN t.archetype = 'private_message' THEN T
RUE ELSE FALSE END+
       |                 |           |   LIMIT 3000000                                                           
                  +
       |                 |           | ) X                                                                       
                  +
       |                 |           | WHERE X.post_id = post_search_data.post_id                                
                  +
       |                 |           | 
 14573 | 00:47:02.814489 | discourse | SELECT pg_try_advisory_lock(2859260972035668690)
(7 rows)

Wingtip · August 21, 2020, 1:02am

The upgrade just finished successfully as my panic started to hit its peak. Good times, good times.

codinghorror · August 21, 2020, 1:04am

The upgrade was quite slow for me on fast colocated hardware as well. Not entirely sure why, but definitely something to be aware of.

Wingtip · August 21, 2020, 1:09am

Yep, I suggest putting a note in the changelog warning people this one will probably take much longer than most updates, and not to kill it or do anything drastic because that’s expected.

tgxworld · August 21, 2020, 1:52am

@Wingtip Can I check the number of posts that you have on your forum? Unfortunately, this is going to be slow on sites with a large number of posts.

Wingtip · August 21, 2020, 2:08am

Yes, we have over 5 million.

codinghorror · August 21, 2020, 2:18am

My local site didn’t have that many posts and it was still quite slow. Not 40 minutes slow but noticeably slower than previous upgrades by 3x-4x maybe?

neounix · August 21, 2020, 2:54am

FWIW:

Just rebuilt and now running 2.6.0.beta2 ( 2aa1482421 )

The build process was not noticeably slower on our server.

Paracelsus · August 21, 2020, 6:42am

Thanks @Wingtip , I thought it was just happening with us!

Actually, I had to cancel the rebuild and restart the app because I thought it had got stuck on that query you mention. We have 6 million posts and after about 45 minutes was still not completed, so I guess I’ll have to be prepared for at least one hour of rebuild and warn our users before it.

jomaxro · August 21, 2020, 11:16am

A note on the extended Docker Manager and/or rebuild via SSH time has been added to the release notes.

neounix · August 21, 2020, 11:38am

Just pulled out a stop watch and tested a rebuild (site with around 1M posts only) from 2.6.0b1 to b2; and from start to finish, it took 170 seconds.

Screen Shot 2020-08-21 at 6.34.31 PM

This is my second rebuild today, going from b1 to b2 and it seems very fine with no noticeable difference in build speed.

Note: We always upgrade from the command line and do not use the UI.

AntiMetaman · August 21, 2020, 11:44am

I also don’t use the docker manager and prefer to use the rebuild from command line. It’s better this way to see the log in case anything goes wrong. I also think it’s faster.

Wingtip · August 21, 2020, 12:00pm

Yes, seems it’s mostly a problem on forums with a lot of posts.

Wingtip · August 21, 2020, 1:15pm

I (foolishly) upgraded via the web console, so I didn’t have an up to date log the whole time. Last time I make that mistake.

pfaffman · August 21, 2020, 4:20pm

I had a large site that repeatedly failed to bootstrap. It’s a 2-container install, so the old container continued to run while the bootstrap was migrating. I eventually solved the problem by turning on SKIP_POST_DEPLOYMENT_MIGRATIONS=1 for the bootstrap and then running the migrations after the new container got cranked up. With SKIP_POST_DEPLOYMENT_MIGRATIONS=1 in the ENV section, the migration was very fast, the site was then able to function normally (though perhaps more slowly) while the migrations ran, which took > 20 minutes, I think

I think, but haven’t yet tested, that using this same trick would work to minimize down time during a single-container installation as well. If I’m right, what you’d do is

add SKIP_POST_DEPLOYMENT_MIGRATIONS=1 to your app.yml
./launcher rebuild app
./launcher enter app
SKIP_POST_DEPLOYMENT_MIGRATIONS=0 rake db:migrate
un-do the edit in app.yml unless you plan to remember to to the migrations after every upgrade
maybe rebuild again to make sure that you don’t have any problem while you still remember what might have broken your site, as in 4 months when you try again, you’ll have no idea and it’ll be hard for anyone to guess what the problem might be

If there were a way to have ./launcher pass SKIP_POST_DEPLOYMENT_MIGRATIONS=1 into things without requiring an edit to the app.yml it’d make things less fiddly for the editor-challenged.

If I do manage to do the work on this that I think I will, I’ll create a new topic reporting what I’ve found. Though the smoke has me sequestered into a room where I don’t have my big monitor. (The pandemic wasn’t enough?! We have to have smoke too? And I’m not even especially close to the darn fire.)

riking · August 21, 2020, 11:39pm

Good idea, I went and implemented that:

github.com/discourse/discourse_docker

FEATURE: Add easy mechanism for skipping post-deployment migrations

discourse:master ← riking:patch-5

opened 11:38PM - 21 Aug 20 UTC

riking

+8 -0

This will help self-hosters avoid longer downtime when a large post-deployment m…igration is pending. Usage: ``` ./launcher rebuild app --skip-post-migrations ./launcher enter app SKIP_POST_DEPLOYMENT_MIGRATIONS=0 rake db:migrate exit ./launcher restart app ``` (Note: the 'restart' clears the temporarily-set environment variable, so future web upgrades will correctly apply post-deployment migrations.) https://meta.discourse.org/t/2-6-0b2-upgrade-very-slow/161604/19?u=riking

pfaffman · August 21, 2020, 11:42pm

That is fantastic news! (Two other projects got in the way today and somehow my multisite instance is no longer working and playing well with S3. ) Thanks very much.

Wingtip · August 22, 2020, 4:35am

Is there a technical reason why post-upgrade DB changes need to be blocking by default? Is there any way to change that behavior so future upgrades get the site back up quickly and then run the post-upgrade stuff in the background?

IMO anything essential for the upgraded application to function like DDL should really be part of the upgrade itself, not in post-upgrade scripts.

Paracelsus · August 22, 2020, 6:47am

We started our rebuild from the command line seven hours ago! and is still going… It’s still here:

Any ideas?

Edit: meanwhile I’ve killed the process to have the site live again for our users. But there has to be a better way to do this update.

pfaffman · August 22, 2020, 9:04am

Do you have a big database?

Topic		Replies	Views
2.6.0 beta 3 update failed on disk and/or memory space Installation server-resources	29	2414	October 26, 2020
An upgrade knocked my site offline; how long until it's restored? Installation	16	244	August 9, 2025
Upgrade failed spectacularily Support	25	479	February 10, 2025
Problem rebuilding because of slow database shutdown Installation	26	1026	April 25, 2023
Discourse upgrade via Web UI Fails & SSH Upgrade Brings Down Discourse Instance Installation	17	1898	November 26, 2021

2.6.0b2 upgrade VERY slow

Related topics