POSTGRES升级失败 - 我已尝试所有方法

你好,网站突然卡在“Welcome to nginx!”页面,因此进行调试时遇到了以下错误。

Success. You can now start the database server using:

    pg_ctlcluster 10 main start

Warning: The selected stats_temp_directory /var/run/postgresql/10-main.pg_stat_tmp
is not writable for the cluster owner. Not adding this setting in
postgresql.conf.
Ver Cluster Port Status Owner    Data directory              Log file
10  main    5433 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/12/man/man1/postmaster.1.gz because link group postmaster.1.gz is broken
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Processing triggers for postgresql-common (215.pgdg100+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Stopping PostgreSQL 10 database server: main.
Stopping PostgreSQL 12 database server: main.
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

The source cluster was not shut down cleanly.
Failure, exiting
-------------------------------------------------------------------------------------
UPGRADE OF POSTGRES FAILED

Please visit https://meta.discourse.org/t/postgresql-12-update/151236 for support

You can run ./launcher start app to restart your app in the meanwhile




FAILED
--------------------
Pups::ExecError: /root/upgrade_postgres failed with return #<Process::Status: pid 45 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/root/upgrade_postgres"
bfe8265213ad992fa3245d252d192977f05d902d7213b361f53d3bc2b0d16b3a
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
root@bitkcor:/var/discourse# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  648K  394M   1% /run
/dev/vda1        78G   23G   55G  30% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/vda15      105M  3.4M  101M   4% /boot/efi
tmpfs           395M     0  395M   0% /run/user/0
root@bitkcor:/var/discourse# ls
bin   containers        discourse-setup  launcher  README.md  scripts  templates
cids  discourse-doctor  image            LICENSE   samples    shared
root@bitkcor:/var/discourse# cd shared
root@bitkcor:/var/discourse/shared# chown -R postgres postgres_data
chown: invalid user: 'postgres'
root@bitkcor:/var/discourse/shared# ./launcher start app
-bash: ./launcher: No such file or directory
root@bitkcor:/var/discourse/shared# cd ..
root@bitkcor:/var/discourse# ./launcher start app

starting up existing container
+ /usr/bin/docker start app
Error response from daemon: driver failed programming external connectivity on endpoint app (10f3d3b7938496e743c73affc9ddb2d821e04419985bf8e3ebde2ec9ec704a0b): Error starting userland proxy: listen tcp 0.0.0.0:80: bind: address already in use
Error: failed to start containers: app
root@bitkcor:/var/discourse# cd containers
root@bitkcor:/var/discourse/containers# ls
app.yml  app.yml.2018-08-29-034711.bak
root@bitkcor:/var/discourse/containers# cd ..
root@bitkcor:/var/discourse# ./launcher stop app
+ /usr/bin/docker stop -t 10 app
app
root@bitkcor:/var/discourse# ./launcher stop app
+ /usr/bin/docker stop -t 10 app
app
root@bitkcor:/var/discourse# git pull
Already up to date.
root@bitkcor:/var/discourse# ./launcher start app

starting up existing container
+ /usr/bin/docker start app
Error response from daemon: driver failed programming external connectivity on endpoint app (f149fe57c125ae0c276c339a1367c83262f866227ab9317ff06bf026e8776f65): Error starting userland proxy: listen tcp 0.0.0.0:80: bind: address already in use
Error: failed to start containers: app

我已访问 https://meta.discourse.org/t/postgresql-12-update/151236 寻求支持,并尝试了 @Falco 在此处的建议 here,以及 meta 上的其他任何建议,但所有方法均无效。

据我所知,我有足够的空间来完成此操作。

有人能告诉我我漏掉了什么吗?

PostgreSQL 12 更新 的常见问题解答中专门有一条针对此问题的说明。

感谢快速回复!是的,我也试过了。

root@bitkcor:/var/discourse# ./launcher start app

正在启动现有容器
+ /usr/bin/docker start app
来自守护进程的错误响应:端点 app (49f1fdf896618efc824e50f782c1fba91bf81320e49ccadb5e5e80b342552e3e) 的外部连接编程失败,驱动程序出错:启动用户态代理时出错:监听 tcp 0.0.0.0:80:绑定:地址已被占用
错误:启动容器失败:app
root@bitkcor:/var/discourse# ./launcher stop app
+ /usr/bin/docker stop -t 10 app
app
root@bitkcor:/var/discourse# tail -f shared/data/log/var-log/postgres/current
tail: 无法打开 'shared/data/log/var-log/postgres/current' 进行读取:没有那个文件或目录
tail: 没有剩余的文件

不知何故,您的主机上已安装了 nginx。您应从 VPS 中移除任何多余的 Web 服务器,然后按照说明操作。

nginx 已安装,因为该论坛已经正常运行了几年。由于 Digital Ocean 服务延迟,Droplet 被停用了。重新开启后,访问该地址时出现 521 错误。但访问 IP 地址时,又回到了欢迎页面。

Cloudflare 的配置并未更改。

NameCheap 中的域名服务器设置也是正确的。

我已经有好几个月没有做过任何更改,因此不知该从何处排查问题。

所以您安装了 nginx,但它并未运行。关闭 Droplet 再重新启动后,服务被重启,现在它阻止了 Web 端口。

您应该真的卸载它,这样服务器每次重启时就不会再发生这种情况……

有什么变化吗?我之前构建时,Nginx 欢迎页面是安装过程的一部分。我原本以为运行 Discourse 必须安装 Nginx。该如何卸载它?

请将路径更改为
tail -f shared/standalone/log/var-log/postgres/current

指南中提到的路径引用的是双容器安装的数据容器,而您的安装似乎是单容器版本。

完成。显示如下。

root@bitkcor:/var/discourse# tail -f shared/standalone/log/var-log/postgres/current
2020-07-19 03:33:56.864 UTC [19933] discourse@discourse LOG:  duration: 279.207 ms  statement: COPY public.scheduler_stats (id, name, hostname, pid, duration_ms, live_slots_start, live_slots_finish, started_at, success, error) TO stdout;
2020-07-19 03:34:09.436 UTC [19933] discourse@discourse LOG:  duration: 12555.420 ms  statement: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;
2020-07-19 03:34:10.211 UTC [19933] discourse@discourse LOG:  duration: 727.297 ms  statement: COPY public.unsubscribe_keys (key, user_id, created_at, updated_at, unsubscribe_key_type, topic_id, post_id) TO stdout;
2020-07-21 01:56:22.105 UTC [6388] discourse@discourse LOG:  duration: 167.853 ms  execute <unnamed>: INSERT INTO "unsubscribe_keys" ("key", "user_id", "created_at", "updated_at", "unsubscribe_key_type") VALUES ('352fc5679876a1a700dfe7b45f8fa67612592421a3659e08ec5c2ccbf8f0e2d2', 2, '2020-07-21 01:56:21.932109', '2020-07-21 01:56:21.932109', 'digest') RETURNING "key"
2020-07-26 03:34:50.570 UTC [27570] discourse@discourse LOG:  duration: 147.456 ms  statement: COPY public.post_revisions (id, user_id, post_id, modifications, number, created_at, updated_at, hidden) TO stdout;
2020-07-26 03:34:50.925 UTC [27570] discourse@discourse LOG:  duration: 349.648 ms  statement: COPY public.post_search_data (post_id, search_data, raw_data, locale, version) TO stdout;
2020-07-26 03:34:51.236 UTC [27570] discourse@discourse LOG:  duration: 292.799 ms  statement: COPY public.posts (id, user_id, topic_id, post_number, raw, cooked, created_at, updated_at, reply_to_post_number, reply_count, quote_count, deleted_at, off_topic_count, like_count, incoming_link_count, bookmark_count, avg_time, score, reads, post_type, sort_order, last_editor_id, hidden, hidden_reason_id, notify_moderators_count, spam_count, illegal_count, inappropriate_count, last_version_at, user_deleted, reply_to_user_id, percent_rank, notify_user_count, like_score, deleted_by_id, edit_reason, word_count, version, cook_method, wiki, baked_at, baked_version, hidden_at, self_edits, reply_quoted, via_email, raw_email, public_version, action_code, image_url, locked_by_id) TO stdout;
2020-07-26 03:34:51.547 UTC [27570] discourse@discourse LOG:  duration: 296.400 ms  statement: COPY public.scheduler_stats (id, name, hostname, pid, duration_ms, live_slots_start, live_slots_finish, started_at, success, error) TO stdout;
2020-07-26 03:35:04.123 UTC [27570] discourse@discourse LOG:  duration: 12549.364 ms  statement: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;
2020-07-26 03:35:04.760 UTC [27570] discourse@discourse LOG:  duration: 588.788 ms  statement: COPY public.unsubscribe_keys (key, user_id, created_at, updated_at, unsubscribe_key_type, topic_id, post_id) TO stdout;

看起来 PostgreSQL 正在运行。您在检查日志之前是否已关闭 Discourse 容器?

./launcher stop app

我似乎无法停止它。

root@bitkcor:/var/discourse# ./launcher stop app
+ /usr/bin/docker stop -t 10 app
app
root@bitkcor:/var/discourse# tail -f shared/standalone/log/var-log/postgres/current
2020-07-19 03:33:56.864 UTC [19933] discourse@discourse LOG:  duration: 279.207 ms  statement: COPY public.scheduler_stats (id, name, hostname, pid, duration_ms, live_slots_start, live_slots_finish, started_at, success, error) TO stdout;
2020-07-19 03:34:09.436 UTC [19933] discourse@discourse LOG:  duration: 12555.420 ms  statement: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;
2020-07-19 03:34:10.211 UTC [19933] discourse@discourse LOG:  duration: 727.297 ms  statement: COPY public.unsubscribe_keys (key, user_id, created_at, updated_at, unsubscribe_key_type, topic_id, post_id) TO stdout;
2020-07-21 01:56:22.105 UTC [6388] discourse@discourse LOG:  duration: 167.853 ms  execute <unnamed>: INSERT INTO "unsubscribe_keys" ("key", "user_id", "created_at", "updated_at", "unsubscribe_key_type") VALUES ('352fc5679876a1a700dfe7b45f8fa67612592421a3659e08ec5c2ccbf8f0e2d2', 2, '2020-07-21 01:56:21.932109', '2020-07-21 01:56:21.932109', 'digest') RETURNING "key"
2020-07-26 03:34:50.570 UTC [27570] discourse@discourse LOG:  duration: 147.456 ms  statement: COPY public.post_revisions (id, user_id, post_id, modifications, number, created_at, updated_at, hidden) TO stdout;
2020-07-26 03:34:50.925 UTC [27570] discourse@discourse LOG:  duration: 349.648 ms  statement: COPY public.post_search_data (post_id, search_data, raw_data, locale, version) TO stdout;
2020-07-26 03:34:51.236 UTC [27570] discourse@discourse LOG:  duration: 292.799 ms  statement: COPY public.posts (id, user_id, topic_id, post_number, raw, cooked, created_at, updated_at, reply_to_post_number, reply_count, quote_count, deleted_at, off_topic_count, like_count, incoming_link_count, bookmark_count, avg_time, score, reads, post_type, sort_order, last_editor_id, hidden, hidden_reason_id, notify_moderators_count, spam_count, illegal_count, inappropriate_count, last_version_at, user_deleted, reply_to_user_id, percent_rank, notify_user_count, like_score, deleted_by_id, edit_reason, word_count, version, cook_method, wiki, baked_at, baked_version, hidden_at, self_edits, reply_quoted, via_email, raw_email, public_version, action_code, image_url, locked_by_id) TO stdout;
2020-07-26 03:34:51.547 UTC [27570] discourse@discourse LOG:  duration: 296.400 ms  statement: COPY public.scheduler_stats (id, name, hostname, pid, duration_ms, live_slots_start, live_slots_finish, started_at, success, error) TO stdout;
2020-07-26 03:35:04.123 UTC [27570] discourse@discourse LOG:  duration: 12549.364 ms  statement: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;
2020-07-26 03:35:04.760 UTC [27570] discourse@discourse LOG:  duration: 588.788 ms  statement: COPY public.unsubscribe_keys (key, user_id, created_at, updated_at, unsubscribe_key_type, topic_id, post_id) TO stdout;c

nginx 欢迎界面从未是流程的一部分。您记错了。

nginx 同样安装在 Docker 容器内部,此前从未需要在容器外部安装。

您有两个 nginx 实例正在监听同一端口。如果此服务器上仅运行 Discourse,那么可以安全地移除容器外重复的 nginx。

好的,对此感到抱歉。

你知道它可能在哪里吗?

假设服务器运行的是 Ubuntu,请在容器外执行以下命令:

sudo apt-get remove nginx nginx-common

哎呀。现在网站无法访问了。正在恢复备份,只能听天由命了。

这确实是个好消息,这意味着容器外部的 nginx 实例不再劫持该端口。

卸载后,请重启服务器,以确保所有服务正常启动。