PostgreSQL 13 update

:warning: WARNING! If your database is very large, you will need a lot of extra disk space (2x database size) and should be very careful with this upgrade!

We’ve just landed changes to upgrade our Docker image to PostgreSQL 13. Any site admins rebuilding Discourse from the command line will be upgraded to PostgreSQL 13 from the previous PostgreSQL 12. Note that if you held back from upgrading when the PostgreSQL 12 update happened back in May, you can skip that upgrade and go straight to PostgreSQL 13.

If you had held back the upgrade previously, change the PostgreSQL template in app.yml from templates/postgres.10.template.yml to templates/postgres.template.yml.

As with any upgrade, it is strongly advised to take a backup before doing anything.

Updating

Official Install Guide (single container)

On you next rebuild, you will see this message at the end:

-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES COMPLETE

Old 12 database is stored at /shared/postgres_data_old

To complete the upgrade, rebuild again using:

./launcher rebuild app
-------------------------------------------------------------------------------------

That means everything went well in the upgrade! You just need to issue a new rebuild to get your site back and running.

Data Container Install

If you are running a setup with a dedicated data container based in the sample supplied in our discourse_docker repository, you want to be sure you are shutting down PostgreSQL in a safe and clean way.

Nowadays, we have background jobs running queries spanning several minutes, so shutting down the web container will help the data container be shutdown safely.

./launcher stop web_only
./launcher stop data
./launcher rebuild data
./launcher rebuild data
./launcher rebuild web_only

Before issuing the first rebuild to the data container, you can tail the PostgreSQL log to see if it was shutdown properly.

Running a tail -f shared/data/log/var-log/postgres/current should give you the following log if it was clean:

2020-05-13 18:33:33.457 UTC [36] LOG:  received smart shutdown request
2020-05-13 18:33:33.464 UTC [36] LOG:  worker process: logical replication launcher (PID 52) exited with exit code 1
2020-05-13 18:33:33.465 UTC [47] LOG:  shutting down
2020-05-13 18:33:33.479 UTC [36] LOG:  database system is shut down

Doing a manual update / space constrained environments

:warning::warning::warning:
YOU MUST BACKUP THE POSTGRES_DATA BEFORE TRYING THIS
:warning::warning::warning:

If you are in a constrained space environment without any way to get more space you can try the following:

./launcher stop app #(or both web_only and data if that is your case)
mkdir -p /var/discourse/shared/standalone/postgres_data_new
docker run --rm \
	-v /var/discourse/shared/standalone/postgres_data:/var/lib/postgresql/12/data \
	-v /var/discourse/shared/standalone/postgres_data_new:/var/lib/postgresql/13/data \
	tianon/postgres-upgrade:12-to-13
mv /var/discourse/shared/standalone/postgres_data /var/discourse/shared/standalone/postgres_data_old
mv /var/discourse/shared/standalone/postgres_data_new /var/discourse/shared/standalone/postgres_data
./launcher rebuild app #(or first data and then web_only if that is your case)

On my tests this procedure requires less than 1x your current database size in free space.

Postponing the update

If you need to postpone the update during your next rebuild, you can swap the PostgreSQL template on your app.yml file by changing "templates/postgres.template.yml" to "templates/postgres.12.template.yml".

This is not recommended, as some site admins will forget to revert the change afterwards.

Post update optional tasks

Optimizing PostgreSQL statistics

After the update, the new PostgreSQL won’t have table statistics on hand. You can generate those using:

cd /var/discourse
./launcher enter app
su postgres
psql
\connect discourse
VACUUM VERBOSE ANALYZE;
\q
exit
exit

Or this one-line version of the above:

/var/discourse/launcher run app "echo 'vacuum verbose analyze;' | su postgres -c 'psql discourse'"

Re-creating the indexes

This upgrade main feature is great file savings in our largest table in every instance, the post_timings table and it’s indexes. After doing a successful update you will need to run a command to re-build the indexes and reap the benefits.

cd /var/discourse
./launcher enter app
su postgres
psql
\connect discourse
REINDEX SCHEMA CONCURRENTLY public;
\q
exit
exit

If you can check the post_timings size before and after the REINDEX that would be a cool stat to share here!

You can use the below query to check the 20 largest data objects, run it before the reindex and after:

WITH RECURSIVE pg_inherit(inhrelid, inhparent) AS
    (select inhrelid, inhparent
    FROM pg_inherits
    UNION
    SELECT child.inhrelid, parent.inhparent
    FROM pg_inherit child, pg_inherits parent
    WHERE child.inhparent = parent.inhrelid),
pg_inherit_short AS (SELECT * FROM pg_inherit WHERE inhparent NOT IN (SELECT inhrelid FROM pg_inherit))
SELECT table_schema
    , TABLE_NAME
    , row_estimate
    , pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
    SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes
    FROM (
         SELECT c.oid
              , nspname AS table_schema
              , relname AS TABLE_NAME
              , SUM(c.reltuples) OVER (partition BY parent) AS row_estimate
              , SUM(pg_total_relation_size(c.oid)) OVER (partition BY parent) AS total_bytes
              , SUM(pg_indexes_size(c.oid)) OVER (partition BY parent) AS index_bytes
              , SUM(pg_total_relation_size(reltoastrelid)) OVER (partition BY parent) AS toast_bytes
              , parent
          FROM (
                SELECT pg_class.oid
                    , reltuples
                    , relname
                    , relnamespace
                    , pg_class.reltoastrelid
                    , COALESCE(inhparent, pg_class.oid) parent
                FROM pg_class
                    LEFT JOIN pg_inherit_short ON inhrelid = oid
                WHERE relkind IN ('r', 'p')
             ) c
             LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
  ) a
  WHERE oid = parent
) a
ORDER BY total_bytes DESC LIMIT 20;

Cleaning up old data

For a standard install, you can delete the old data in PG12 format with the following command:

cd /var/discourse
./launcher cleanup

If you have a separate data container, you’ll need to remove the backup copy like this:

rm -fr /var/discourse/shared/data/postgres_data_old/

FAQ

The source cluster was not shut down cleanly

If you get a upgrade failed with the above message, you can try a simpler approach to get it back into a better state.

Restart the old container with ./launcher start app. Wait a few minutes until it is back up.

Now shut it down again with ./launcher stop app. After that tail the logs to see if it was a clean one:

tail -f shared/data/log/var-log/postgres/current
2020-05-13 18:33:33.457 UTC [36] LOG:  received smart shutdown request
2020-05-13 18:33:33.464 UTC [36] LOG:  worker process: logical replication launcher (PID 52) exited with exit code 1
2020-05-13 18:33:33.465 UTC [47] LOG:  shutting down
2020-05-13 18:33:33.479 UTC [36] LOG:  database system is shut down

If the logs look like above, you can now try to upgrade again using ./launcher rebuild app.

lc_collate values for database “postgres” do not match

This errors happens if you are using non-default locales for your database. It was reported that you need 3 variables for it to succeed. Ensure that the env: section of you app.yml file has the 3 lines:

  LC_ALL: en_US.UTF-8
  LANG: en_US.UTF-8
  LANGUAGE: en_US.UTF-8

Changing en_US.UTF-8 to your locale.

Every rebuild does the upgrade again aka upgrade loop

When this happens your upgrade logs will contain

mv: cannot move '/shared/postgres_data' to '/shared/postgres_data_old/postgres_data': Directory not empty
mv: cannot move '/shared/postgres_data_new' to '/shared/postgres_data/postgres_data_new': Directory not empty

This means that there are still files from the last upgrade lingering around. Move those elsewhere before continuing.

Upgrade Complete suggestion scripts - do I need to do anything?

Once the upgrade completes, you’ll see output from the pg_upgrade message saying:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

You can safely ignore this message.

I did skip the PostgreSQL 12 update, what to do now?

You can follow the standard instructions at the top of this guide and they will upgrade from your version to 13 without issues.

If you are following the space constrained instructions, adapt the version numbers accordingly.

33 Likes

Here are some before and afters:

Small site, topic views index 74 -> 48MB
 table_schema |     table_name     | row_estimate |  total  |  index  |   toast    |  table  
--------------+--------------------+--------------+---------+---------+------------+---------
 public       | stylesheet_cache   |         2123 | 111 MB  | 248 kB  | 109 MB     | 1632 kB
 public       | topic_views        |       666145 | 102 MB  | 74 MB   | 8192 bytes | 28 MB
 public       | fingerprints       |         2322 | 53 MB   | 320 kB  | 52 MB      | 1336 kB
 public       | post_search_data   |         5442 | 45 MB   | 8384 kB | 29 MB      | 8160 kB
 public       | incoming_links     |       313406 | 42 MB   | 23 MB   | 8192 bytes | 19 MB
 public       | scheduler_stats    |       149470 | 29 MB   | 5024 kB | 8192 bytes | 24 MB
 public       | posts              |         5897 | 24 MB   | 3000 kB | 13 MB      | 8096 kB
 public       | post_timings       |        81217 | 9696 kB | 6144 kB |            | 3552 kB
 public       | notifications      |        10397 | 9088 kB | 4064 kB | 8192 bytes | 5016 kB
 public       | user_histories     |         4838 | 8712 kB | 1048 kB | 5408 kB    | 2256 kB
 public       | user_actions       |        15930 | 4512 kB | 3288 kB |            | 1224 kB
 public       | topic_search_data  |         2094 | 4280 kB | 1664 kB | 8192 bytes | 2608 kB
 public       | user_profile_views |        25015 | 4232 kB | 2912 kB | 8192 bytes | 1312 kB
 public       | post_revisions     |         2322 | 4200 kB | 216 kB  | 2496 kB    | 1488 kB
 public       | top_topics         |         1321 | 4072 kB | 3344 kB |            | 728 kB
 public       | topic_users        |        22608 | 3704 kB | 1536 kB |            | 2168 kB
 pg_catalog   | pg_attribute       |        14828 | 3312 kB | 992 kB  |            | 2320 kB
 public       | topic_links        |         6256 | 3136 kB | 1800 kB | 8192 bytes | 1328 kB
 backup       | stylesheet_cache   |           62 | 3072 kB | 32 kB   | 2960 kB    | 80 kB
 public       | user_visits        |        18582 | 2848 kB | 1864 kB |            | 984 kB
(20 rows)


 table_schema |     table_name     | row_estimate |  total  |  index  |   toast    |  table  
--------------+--------------------+--------------+---------+---------+------------+---------
 public       | stylesheet_cache   |         2123 | 111 MB  | 256 kB  | 109 MB     | 1632 kB
 public       | topic_views        |       666145 | 76 MB   | 48 MB   | 8192 bytes | 28 MB
 public       | fingerprints       |         2322 | 53 MB   | 280 kB  | 52 MB      | 1336 kB
 public       | post_search_data   |         5440 | 45 MB   | 3464 kB | 29 MB      | 12 MB
 public       | incoming_links     |       313406 | 37 MB   | 18 MB   | 8192 bytes | 19 MB
 public       | scheduler_stats    |       149500 | 27 MB   | 3304 kB | 8192 bytes | 24 MB
 public       | posts              |         5897 | 23 MB   | 1608 kB | 13 MB      | 8096 kB
 public       | user_histories     |         4838 | 8560 kB | 896 kB  | 5408 kB    | 2256 kB
 public       | post_timings       |        81217 | 7360 kB | 3808 kB |            | 3552 kB
 public       | notifications      |        10397 | 6704 kB | 1680 kB | 8192 bytes | 5016 kB
 public       | post_revisions     |         2322 | 4192 kB | 208 kB  | 2496 kB    | 1488 kB
 public       | user_actions       |        15930 | 3928 kB | 2704 kB |            | 1224 kB
 public       | topic_users        |        22608 | 3704 kB | 1536 kB |            | 2168 kB
 public       | user_profile_views |        25015 | 3488 kB | 2168 kB | 8192 bytes | 1312 kB
 public       | topic_search_data  |         2093 | 3440 kB | 824 kB  | 8192 bytes | 2608 kB
 pg_catalog   | pg_attribute       |        14828 | 3368 kB | 1048 kB |            | 2320 kB
 backup       | stylesheet_cache   |           62 | 3072 kB | 32 kB   | 2960 kB    | 80 kB
 public       | optimized_images   |         6889 | 2720 kB | 656 kB  | 8192 bytes | 2056 kB
 public       | topic_links        |         6342 | 2680 kB | 1344 kB | 8192 bytes | 1328 kB
 public       | user_visits        |        18582 | 2632 kB | 1648 kB |            | 984 kB
(20 rows)
Medium site, post_timings index 505 -> 302 MB
 table_schema |      table_name      | row_estimate | total  |  index  |   toast    |  table  
--------------+----------------------+--------------+--------+---------+------------+---------
 public       | topic_views          | 5.354951e+06 | 815 MB | 587 MB  | 8192 bytes | 228 MB
 public       | post_timings         | 6.918294e+06 | 797 MB | 505 MB  |            | 292 MB
 public       | posts                |       115560 | 470 MB | 42 MB   | 306 MB     | 122 MB
 public       | user_auth_token_logs |       828108 | 246 MB | 36 MB   | 8192 bytes | 210 MB
 public       | stylesheet_cache     |         4974 | 177 MB | 560 kB  | 172 MB     | 4424 kB
 public       | post_search_data     |       107420 | 165 MB | 32 MB   | 47 MB      | 86 MB
 public       | incoming_links       | 1.127407e+06 | 149 MB | 82 MB   | 8192 bytes | 67 MB
 public       | notifications        |       127877 | 107 MB | 44 MB   | 8192 bytes | 63 MB
 public       | topic_users          |       687968 | 107 MB | 44 MB   |            | 62 MB
 public       | user_actions         |       395974 | 106 MB | 77 MB   |            | 29 MB
 public       | plugin_store_rows    |         2923 | 75 MB  | 360 kB  | 74 MB      | 592 kB
 public       | incoming_emails      |         2308 | 67 MB  | 1256 kB | 65 MB      | 1352 kB
 public       | user_profile_views   |       362055 | 58 MB  | 40 MB   | 8192 bytes | 18 MB
 public       | user_visits          |       290872 | 42 MB  | 28 MB   |            | 15 MB
 public       | topic_links          |        69481 | 29 MB  | 17 MB   | 8192 bytes | 12 MB
 public       | post_revisions       |        17611 | 28 MB  | 1224 kB | 13 MB      | 14 MB
 public       | scheduler_stats      |       153067 | 26 MB  | 4520 kB | 8192 bytes | 21 MB
 public       | user_histories       |        25368 | 25 MB  | 4840 kB | 11 MB      | 9424 kB
 public       | topic_link_clicks    |       234945 | 24 MB  | 10 MB   | 8192 bytes | 14 MB
 public       | post_actions         |       105873 | 23 MB  | 15 MB   |            | 7984 kB
(20 rows)

 table_schema |      table_name      | row_estimate | total  |  index  |   toast    |  table  
--------------+----------------------+--------------+--------+---------+------------+---------
 public       | topic_views          | 5.354951e+06 | 608 MB | 380 MB  | 8192 bytes | 228 MB
 public       | post_timings         | 6.918095e+06 | 594 MB | 302 MB  |            | 292 MB
 public       | posts                |       115560 | 455 MB | 26 MB   | 306 MB     | 122 MB
 public       | user_auth_token_logs |       828108 | 233 MB | 23 MB   | 8192 bytes | 210 MB
 public       | stylesheet_cache     |         4974 | 177 MB | 560 kB  | 172 MB     | 4424 kB
 public       | post_search_data     |       107420 | 165 MB | 32 MB   | 47 MB      | 86 MB
 public       | incoming_links       | 1.127407e+06 | 133 MB | 66 MB   | 8192 bytes | 67 MB
 public       | topic_users          |       687968 | 107 MB | 44 MB   |            | 62 MB
 public       | user_actions         |       395974 | 91 MB  | 62 MB   |            | 29 MB
 public       | notifications        |       127877 | 82 MB  | 19 MB   | 8192 bytes | 63 MB
 public       | plugin_store_rows    |         2923 | 75 MB  | 360 kB  | 74 MB      | 592 kB
 public       | incoming_emails      |         2308 | 67 MB  | 552 kB  | 65 MB      | 1352 kB
 public       | user_profile_views   |       362055 | 48 MB  | 29 MB   | 8192 bytes | 18 MB
 public       | user_visits          |       290872 | 38 MB  | 23 MB   |            | 15 MB
 public       | post_revisions       |        17611 | 28 MB  | 1160 kB | 13 MB      | 14 MB
 public       | scheduler_stats      |       153074 | 25 MB  | 3384 kB | 8192 bytes | 21 MB
 public       | user_histories       |        25368 | 25 MB  | 4120 kB | 11 MB      | 9424 kB
 public       | topic_links          |        69511 | 24 MB  | 12 MB   | 8192 bytes | 12 MB
 public       | topic_link_clicks    |       234945 | 21 MB  | 7104 kB | 8192 bytes | 14 MB
 public       | post_actions         |       105873 | 17 MB  | 9584 kB |            | 7984 kB
(20 rows)

Pretty good! Smaller will be faster :slight_smile:

4 Likes

From my site:

Before
 table_schema |      table_name      | row_estimate  |  total  |  index  |   toast    |  table
--------------+----------------------+---------------+---------+---------+------------+---------
 public       | posts                |  1.680365e+06 | 3702 MB | 2343 MB | 55 MB      | 1303 MB
 public       | post_timings         | 1.7430068e+07 | 2661 MB | 1706 MB |            | 954 MB
 public       | post_search_data     |  1.695932e+06 | 2345 MB | 666 MB  | 70 MB      | 1609 MB
 public       | user_actions         |  4.087651e+06 | 1701 MB | 1171 MB |            | 529 MB
 public       | post_actions         |        960423 | 277 MB  | 179 MB  | 8192 bytes | 98 MB
 public       | notifications        |        179686 | 263 MB  | 169 MB  | 8192 bytes | 93 MB
 public       | post_stats           |  1.683217e+06 | 205 MB  | 72 MB   |            | 133 MB
 public       | topic_links          |        286336 | 187 MB  | 124 MB  | 8192 bytes | 63 MB
 public       | post_custom_fields   |        814243 | 175 MB  | 109 MB  | 8192 bytes | 65 MB
 public       | optimized_images     |        160603 | 93 MB   | 36 MB   | 8192 bytes | 57 MB
 public       | uploads              |         61804 | 88 MB   | 62 MB   | 8192 bytes | 26 MB
 public       | user_visits          |         33285 | 70 MB   | 68 MB   |            | 1976 kB
 public       | user_auth_token_logs |        143840 | 70 MB   | 14 MB   | 8192 bytes | 55 MB
 public       | post_replies         |        457524 | 62 MB   | 20 MB   |            | 42 MB
 public       | stylesheet_cache     |           880 | 56 MB   | 160 kB  | 55 MB      | 856 kB
 public       | post_uploads         |        158400 | 35 MB   | 28 MB   |            | 7512 kB
 public       | scheduler_stats      |        112848 | 26 MB   | 8288 kB | 32 kB      | 18 MB
 public       | topic_link_clicks    |        104828 | 11 MB   | 4640 kB | 8192 bytes | 6592 kB
 public       | post_revisions       |          4886 | 8568 kB | 392 kB  | 3448 kB    | 4728 kB
 public       | user_uploads         |         61928 | 8280 kB | 5032 kB |            | 3248 kB
After
 table_schema |      table_name      | row_estimate  |  total  |  index  |   toast    |  table
--------------+----------------------+---------------+---------+---------+------------+---------
 public       | post_search_data     |  1.682766e+06 | 2083 MB | 405 MB  | 69 MB      | 1609 MB
 public       | post_timings         | 1.7434868e+07 | 1740 MB | 786 MB  |            | 954 MB
 public       | posts                |  1.683225e+06 | 1739 MB | 382 MB  | 54 MB      | 1303 MB
 public       | user_actions         |  4.075185e+06 | 1167 MB | 637 MB  |            | 529 MB
 public       | post_stats           |  1.683223e+06 | 205 MB  | 72 MB   |            | 133 MB
 public       | post_actions         |        960429 | 181 MB  | 83 MB   | 8192 bytes | 98 MB
 public       | post_custom_fields   |        814243 | 145 MB  | 80 MB   | 8192 bytes | 65 MB
 public       | notifications        |        179689 | 122 MB  | 29 MB   | 8192 bytes | 93 MB
 public       | topic_links          |        286337 | 119 MB  | 57 MB   | 8192 bytes | 63 MB
 public       | optimized_images     |        160607 | 69 MB   | 12 MB   | 8192 bytes | 57 MB
 public       | post_replies         |        457529 | 62 MB   | 20 MB   |            | 42 MB
 public       | user_auth_token_logs |        143859 | 59 MB   | 4184 kB | 8192 bytes | 55 MB
 public       | stylesheet_cache     |           880 | 56 MB   | 128 kB  | 55 MB      | 856 kB
 public       | uploads              |         61804 | 47 MB   | 21 MB   | 8192 bytes | 26 MB
 public       | scheduler_stats      |        112868 | 20 MB   | 2504 kB | 24 kB      | 18 MB
 public       | post_uploads         |        158401 | 19 MB   | 12 MB   |            | 7512 kB
 public       | topic_link_clicks    |        104829 | 10 MB   | 4056 kB | 8192 bytes | 6592 kB
 public       | post_revisions       |          4886 | 8536 kB | 376 kB  | 3432 kB    | 4728 kB
 public       | user_uploads         |         61928 | 7376 kB | 4128 kB |            | 3248 kB
 public       | quoted_posts         |         48434 | 6192 kB | 3240 kB |            | 2952 kB

So I guess that means I saved roughly a gazillion gigabytes in indexes? Nice.

9 Likes

25 posts were split to a new topic: Issue with running multiple containers after upgrade

This upgrade had a significant impact on our storage - it saved us about 7GB:

DB-Before
 table_schema |     table_name     | row_estimate  |  total  |  index  |   toast    |  table  
--------------+--------------------+---------------+---------+---------+------------+---------
 public       | topic_views        | 7.9218696e+07 | 15 GB   | 11 GB   | 8192 bytes | 4900 MB
 public       | post_search_data   |  4.608203e+06 | 9682 MB | 1944 MB | 1111 MB    | 6627 MB
 public       | posts              |   4.61811e+06 | 8818 MB | 1867 MB | 1404 MB    | 5548 MB
 public       | post_timings       | 5.5985572e+07 | 8014 MB | 4579 MB |            | 3435 MB
 public       | incoming_links     | 3.0499292e+07 | 5480 MB | 2464 MB | 8192 bytes | 3015 MB
 public       | optimized_images   |  3.421182e+06 | 2111 MB | 730 MB  | 8192 bytes | 1381 MB
 public       | user_actions       |  6.035534e+06 | 1900 MB | 1332 MB |            | 568 MB
 public       | topic_users        |  7.887864e+06 | 1687 MB | 563 MB  |            | 1124 MB
 public       | top_topics         |        467326 | 1453 MB | 1282 MB |            | 172 MB
 public       | topics             |  1.281768e+06 | 1414 MB | 644 MB  | 8192 bytes | 770 MB
 public       | topic_links        |   2.65916e+06 | 1071 MB | 608 MB  | 8192 bytes | 464 MB
 public       | notifications      |  1.790599e+06 | 1048 MB | 536 MB  | 8192 bytes | 512 MB
 public       | uploads            |        645079 | 919 MB  | 564 MB  | 8192 bytes | 355 MB
 public       | topic_search_data  |  1.233222e+06 | 897 MB  | 177 MB  | 8192 bytes | 719 MB
 public       | post_custom_fields |  3.813151e+06 | 805 MB  | 447 MB  | 384 kB     | 358 MB
 public       | user_profile_views |  2.045693e+06 | 792 MB  | 262 MB  | 8192 bytes | 530 MB
 public       | topic_link_clicks  |  3.310552e+06 | 636 MB  | 184 MB  | 8192 bytes | 452 MB
 public       | post_stats         |  4.482258e+06 | 521 MB  | 198 MB  |            | 323 MB
 public       | user_visits        |  1.874969e+06 | 513 MB  | 411 MB  |            | 102 MB
 public       | email_logs         |        593291 | 390 MB  | 263 MB  | 8192 bytes | 127 MB
DB-After
 table_schema |      table_name      | row_estimate  |  total   |  index  |   toast    |  table  
--------------+----------------------+---------------+----------+---------+------------+---------
 public       | topic_views          |  8.159804e+07 | 11 GB    | 6326 MB | 8192 bytes | 4901 MB
 public       | post_search_data     |  4.604894e+06 | 10195 MB | 1131 MB | 1110 MB    | 7954 MB
 public       | posts                |  4.629568e+06 | 7966 MB  | 1015 MB | 1403 MB    | 5548 MB
 public       | post_timings         |  5.597004e+07 | 5947 MB  | 2511 MB |            | 3436 MB
 public       | incoming_links       | 3.1206302e+07 | 4844 MB  | 1828 MB | 8192 bytes | 3015 MB
 public       | topic_users          |  7.920156e+06 | 1633 MB  | 509 MB  |            | 1124 MB
 public       | optimized_images     |  3.453272e+06 | 1617 MB  | 236 MB  | 8192 bytes | 1381 MB
 public       | user_actions         |   6.52333e+06 | 1603 MB  | 1036 MB |            | 568 MB
 public       | topic_search_data    |  1.271548e+06 | 1553 MB  | 171 MB  | 8192 bytes | 1382 MB
 public       | topics               |  1.281594e+06 | 1082 MB  | 311 MB  | 8192 bytes | 770 MB
 public       | topic_links          |   2.68317e+06 | 892 MB   | 429 MB  | 8192 bytes | 464 MB
 public       | notifications        |  1.808346e+06 | 757 MB   | 245 MB  | 8192 bytes | 512 MB
 public       | post_custom_fields   |  3.822302e+06 | 709 MB   | 351 MB  | 384 kB     | 358 MB
 public       | user_profile_views   |  2.112168e+06 | 705 MB   | 175 MB  | 8192 bytes | 530 MB
 public       | uploads              |        676809 | 584 MB   | 229 MB  | 8192 bytes | 355 MB
 public       | topic_link_clicks    |  3.683922e+06 | 560 MB   | 108 MB  | 8192 bytes | 452 MB
 public       | post_stats           |  4.629766e+06 | 521 MB   | 198 MB  |            | 323 MB
 public       | user_auth_token_logs |  1.041215e+06 | 316 MB   | 30 MB   | 8192 bytes | 286 MB
 public       | incoming_emails      |          6945 | 307 MB   | 1432 kB | 301 MB     | 5144 kB
 public       | top_topics           |        467378 | 304 MB   | 133 MB  |            | 172 MB

Filesystem - before:

Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        375G  308G   49G  87% /
/dev/sdc        469G  308G  137G  70% /mnt/discourse-data

Filesystem - after:

Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        375G  301G   56G  85% /
/dev/sdc        469G  309G  137G  70% /mnt/discourse-data
7 Likes

Even though I was super-careful and waited two months before doing the upgrade, the space-constrained DB upgrade still failed with the error message below.

I successfully upgraded to 2.7.0 using the pg12 template but can’t get it to upgrade to pg13.

Any ideas? I have no clue where that logfile referenced at the end is located.

root@forum:/var/discourse# docker run --rm \
> -v /var/discourse/shared/standalone/postgres_data:/var/lib/postgresql/12/data \
> -v /var/discourse/shared/standalone/postgres_data_new:/var/lib/postgresql/13/data \
> tianon/postgres-upgrade:12-to-13
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/13/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
ok


Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/13/data -l logfile start

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
  discourse

*failure*

Consult the last few lines of "pg_upgrade_dump_16583.log" for
the probable cause of the failure.
Failure, exiting
1 Like

I think this means that something went wrong and the data directory is empty. I solved this for another site by moving the backup copy back to the data directory and trying again. (Sorry, but that’s as specific as I can be from my phone). There was topic here somewhere on which I was a bit more explicit.

2 Likes

I was able to recover by moving back the PG12 YAML entry and running ./launcher rebuild app, so the DB directory couldn’t have been empty, could it? Happy to follow any instructions if you can find them.

Checking inside the container, the /var/lib/postgresql/13/ directory does not exist. It also doesn’t exist on the host filesystem. The postgres_data_new directory does exist but it’s only 40MB. The postgres_data_new.OLD I tried renaming myself to see if it would fix the problem, it didn’t.

Inside container:

root@forum-app:/var/lib/postgresql# ls -la
total 20
drwxr-xr-x 1 postgres postgres 4096 Feb 20 02:03 .
drwxr-xr-x 1 root     root     4096 Feb 20 02:08 ..
drwxr-xr-x 3 postgres postgres 4096 Feb 20 02:03 12
-rwxr-xr-x 1 postgres postgres  168 Feb 20 02:03 take-database-backup

Host fs:

root@forum:/var/discourse/shared/standalone# ls -la
total 64
drwxr-xr-x 16 root  root     4096 Feb 20 01:57 .
drwxr-xr-x  3 root  root     4096 Jul 16  2018 ..
drwxr-xr-x  3 sam   www-data 4096 Jul 17  2018 backups
drwxr-xr-x  9 root  root     4096 Feb 20 02:07 letsencrypt
drwxr-xr-x  4 root  root     4096 Jul 16  2018 log
drwxr-xr-x  2 lxd   mlocate  4096 Jul 16  2018 postgres_backup
drwx------ 19 lxd   mlocate  4096 Feb 20 02:08 postgres_data
drwx------ 19   999 root     4096 Feb 20 01:57 postgres_data_new
drwx------ 19   999 root     4096 Feb 20 01:43 postgres_data_new.OLD
drwxrwsr-x  5 lxd   mlocate  4096 Feb 20 02:08 postgres_run
drwxr-xr-x  2 uuidd uuidd    4096 Feb 20 16:57 redis_data
drwxr-xr-x  2 root  root     4096 Mar  3  2020 ssl
drwxr-xr-x  2 root  root     4096 Mar  3  2020 ssl-backup
drwxr-xr-x  4 root  root     4096 Jul 16  2018 state
drwxr-xr-x  4 sam   www-data 4096 Feb 20 02:08 tmp
drwxr-xr-x  5 sam   www-data 4096 Oct 27  2018 uploads
root@forum:/var/discourse/shared/standalone# du -h --max-depth=1
4.0K    ./postgres_backup
13M     ./letsencrypt
28K     ./state
8.0K    ./backups
40M     ./postgres_data_new.OLD
220K    ./postgres_run
40M     ./postgres_data_new
59G     ./uploads
105M    ./redis_data
20K     ./ssl
16K     ./tmp
96G     ./postgres_data
32K     ./ssl-backup
1.1G    ./log
156G    
2 Likes

How much free disk space and free virtual memory do you have @Wingtip ?

df -g
free
2 Likes
root@forum:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.9G     0  7.9G   0% /dev
tmpfs           1.6G  7.5M  1.6G   1% /run
/dev/sda        315G  170G  129G  57% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
overlay         315G  170G  129G  57% /var/lib/docker/overlay2/fdaa955ab22c1f85f463b6e3cc48e7f87426c4c0d349699d45b76425fe113f86/merged
tmpfs           1.6G     0  1.6G   0% /run/user/1000
root@forum:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        2.7G        292M         10G         12G        2.3G
Swap:          255M        162M         93M
2 Likes

You might be able to find this file with something like

find / -name pg_upgrade\*.log

2 Likes

I tried that and no, it isn’t anywhere. My theory is it’s inside the temporary docker container used for the upgrade and thus removed when it finishes running.

4 Likes

I wonder, as you’re at the command line, if you could hit ^Z at the right moment to stop the process after it has logged the problem but before the script around it has deleted the log. Then try the find.

Edit: and then do cont to let the process run to completion, cleanly.

I’m really not comfortable doing that. If it messes up and I need to restore a snapshot, that would take the site down for a much longer time than just a launcher rebuild app.

2 Likes

So the issue is, you’re still stuck with the space limitations? There’s no way to wave a magic wand, give yourself +100gb, then roll it back to a smaller disk later?

Not complaining about this, just trying to understand where you are so we can help.

2 Likes

Hey @Wingtip

Recently, we noticed our used disk space went from 41% to 85%, and that is when I found all the “orphaned” docker images; and when I purged them, all the disk space returned; and that was a lot of reclaimed disk space.

Hence, you may also check all your docker images and make sure you have deleted and purged all unused images and containers.

Check images:

docker images -a

When we attempt to rebuild containers and these tasks fail, this process generally leaves a lot of “wasteful” images which can be hidden from admins unaware of these images.

We purge these images often with:

docker system prune -af

There is also a similar wrapper command with launcher, but I prefer working directly using docker commands. You can use the launcher cleanup command as well, if you wish; but I don’t remember the syntax OTTOMH.

2 Likes

Is this best regarded as a bug in the upgrade script, if it tidies up so thoroughly that the explanatory log file is removed before we have a chance to read it? (Could the script perhaps tail -99 the logfile before removal, if it must remove?)

1 Like

The safest solution might be to migrate to a new server. You’d just do a new install and restore the backup. If anything goes wrong you have zero down time. When you think it works, you make the site read only, restore to the new server and change the dns.

1 Like

Yes that’s right. We can add a separate volume temporarily but not expand the existing volume. And I mean, the manual upgrade worked from 10->12, it just isn’t working from 12->13 for some reason.

@neounix: The prune got us 3.5GB, not enough to make a difference.

@Ed_S: I don’t know if it’s a bug but it would certainly be more robust if it put its logs outside the container so people could look at them.

@pfaffman: Yes that’s the nuclear option, would like to avoid building out a whole new host just to upgrade the database!

4 Likes

Hi @Wingtip, to debug failed upgrades using the space-constrained method you can try the following

docker run \
	-v /var/discourse/shared/standalone/postgres_data:/var/lib/postgresql/12/data \
	-v /var/discourse/shared/standalone/postgres_data_new:/var/lib/postgresql/13/data \
	tianon/postgres-upgrade:12-to-13

note that there is no --rm flag

Once the upgrade fails, the container will be in the ‘stopped’ state. You’ll need to find the container ID using docker ps -a

root@fitzy-testing:~# docker ps -a
CONTAINER ID   IMAGE                              COMMAND                  CREATED          STATUS                      PORTS                                      NAMES
4488419cbe93   tianon/postgres-upgrade:12-to-13   "docker-upgrade pg_u…"   22 seconds ago   Exited (1) 20 seconds ago                                              busy_napier
cbb267224d23   local_discourse/app                "/sbin/boot"             3 weeks ago      Up 3 weeks                  0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app

See here the container name was busy_napier or alternatively 4488419cbe93.

To access the log, commit the failed container to a new image, so we can start the container with a different entrypoint

root@fitzy-testing:~# docker commit busy_napier debug_failed_upgrade
sha256:25f28eff29fece0d75bdd804cadfaeb99db434e9d14f573702e5f36d0dcf7490

Now, you can start a new container with the filesystem intact from the failed upgrade

root@fitzy-testing:~# docker run -it --entrypoint /bin/bash  debug_failed_upgrade
root@7126d8d6d0cf:/var/lib/postgresql# ls
12  13  data  pg_upgrade_internal.log  pg_upgrade_server.log  pg_upgrade_utility.log
root@7126d8d6d0cf:/var/lib/postgresql# cat pg_upgrade_server.log

-----------------------------------------------------------------
  pg_upgrade run on Sun Feb 28 23:27:30 2021
-----------------------------------------------------------------

command: "/usr/lib/postgresql/12/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/postgresql/12/data" -o "-p 50432 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000  -c listen_addresses='' -c unix_socket_permissions=0700" start >> "pg_upgrade_server.log" 2>&1
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-02-28 23:27:30.657 UTC [37] FATAL:  database files are incompatible with server
2021-02-28 23:27:30.657 UTC [37] DETAIL:  The data directory was initialized by PostgreSQL version 13, which is not compatible with this version 12.6 (Debian 12.6-1.pgdg100+1).
 stopped waiting
pg_ctl: could not start server
Examine the log output.

You can see here that my upgrade failed (expectedly) because I was attempting to upgrade an instance that was already on PG13. Remember to remove the stopped container once you’ve debugged!

6 Likes