خطأ في التحديث إلى أحدث إصدار من Discourse -- تم الحل

I’ve just received an error with my Discourse installation when attempting to upgrade to 3.5.0.beta3-dev:

Discourse fails to start. Any suggestions are greatly appreciated.

Status: Image is up to date for discourse/base:2.0.20250226-0128

...

I, [2025-04-18T15:16:34.677745 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'LOAD_PLUGINS=0 bundle exec rake plugin:pull_compatible_all'
I, [2025-04-18T15:16:40.775977 #1]  INFO -- : discourse-bbcode-color is already at latest compatible version
discourse-cakeday is already at latest compatible version
discourse-chat-integration is already at latest compatible version
discourse-data-explorer is already at latest compatible version
discourse-math is already at latest compatible version
discourse-oauth2-basic is already at latest compatible version
discourse-rss-polling is already at latest compatible version
discourse-solved is already at latest compatible version
discourse-whos-online is already at latest compatible version
discourse-yearly-review is already at latest compatible version
docker_manager is already at latest compatible version

Error:

I, [2025-04-18T15:16:50.782650 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'
2025-04-18 15:16:59.705 UTC [911] discourse@discourse ERROR:  null value in column "answer_post_id" of relation "discourse_solved_solved_topics" violates not-null constraint
2025-04-18 15:16:59.705 UTC [911] discourse@discourse DETAIL:  Failing row contains (768, 16673, null, -1, null, 2021-05-10 18:32:53.008822, 2021-05-10 18:32:53.008822).
2025-04-18 15:16:59.705 UTC [911] discourse@discourse STATEMENT:  INSERT INTO discourse_solved_solved_topics (
          topic_id,
          answer_post_id,
          topic_timer_id,
          accepter_user_id,
          created_at,
          updated_at
        )
        SELECT
          tc.topic_id,
          tc.answer_post_id,
          tc.topic_timer_id,
          tc.accepter_user_id,
          tc.created_at,
          tc.updated_at
        FROM (
          SELECT
            tc.topic_id,
            CAST(tc.value AS INTEGER) AS answer_post_id,
            CAST(tc2.value AS INTEGER) AS topic_timer_id,
            COALESCE(ua.acting_user_id, -1) AS accepter_user_id,
            tc.created_at,
            tc.updated_at,
            ROW_NUMBER() OVER (PARTITION BY tc.topic_id ORDER BY tc.created_at ASC) AS rn_topic,
            ROW_NUMBER() OVER (PARTITION BY CAST(tc.value AS INTEGER) ORDER BY tc.created_at ASC) AS rn_answer
          FROM topic_custom_fields tc
          LEFT JOIN topic_custom_fields tc2 ON tc2.topic_id = tc.topic_id AND tc2.name = 'solved_auto_close_topic_timer_id'
          LEFT JOIN user_actions ua ON ua.target_topic_id = tc.topic_id AND ua.action_type = 15
          WHERE tc.name = 'accepted_answer_post_id'
            AND tc.id > 0
            AND tc.id <= 0 + 10000
        ) tc
        WHERE tc.rn_topic = 1 AND tc.rn_answer = 1
        ON CONFLICT DO NOTHING

rake aborted!
StandardError: An error has occurred, all later migrations canceled: (StandardError)
إعجاب واحد (1)

I’ve just run discourse-doctor, which fails on the same error.

Additional detail; the failing row:

DETAIL:  Failing row contains (1152, 16673, null, -1, null, 2021-05-10 18:32:53.008822, 2021-05-10 18:32:53.008822)

If I could access the db without the container running, I could delete the row…

Fixed.

It took some DB wrangling. I used the postgres docker image to mount the database:

docker run --name postgres_container_2 -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 -v /var/discourse/shared/standalone/postgres_data:/var/lib/postgresql/data postgres:15

Then located the offending table row:

SELECT
          tc.topic_id,
          tc.answer_post_id,
          tc.topic_timer_id,
          tc.accepter_user_id,
          tc.created_at,
          tc.updated_at
        FROM (
          SELECT
            tc.topic_id,
            CAST(tc.value AS INTEGER) AS answer_post_id,
            CAST(tc2.value AS INTEGER) AS topic_timer_id,
            COALESCE(ua.acting_user_id, -1) AS accepter_user_id,
            tc.created_at,
            tc.updated_at,
            ROW_NUMBER() OVER (PARTITION BY tc.topic_id ORDER BY tc.created_at ASC) AS rn_topic,
            ROW_NUMBER() OVER (PARTITION BY CAST(tc.value AS INTEGER) ORDER BY tc.created_at ASC) AS rn_answer
          FROM topic_custom_fields tc
          LEFT JOIN topic_custom_fields tc2 ON tc2.topic_id = tc.topic_id AND tc2.name = 'solved_auto_close_topic_timer_id'
          LEFT JOIN user_actions ua ON ua.target_topic_id = tc.topic_id AND ua.action_type = 15
          WHERE tc.name = 'accepted_answer_post_id'
            AND tc.id > 0
            AND tc.id <= 0 + 10000
            AND tc.value IS NULL
        ) tc

Once I’d ensured this selected just the row in the error, I changed the select to a DELETE, crossed my fingers and ./launcher rebuild app.

After much time waiting for the image to build, I now have a happy community and a rebuilt site, security fixes included.

I don’t know what caused the row mentioned above to become invalid, but once it was removed, all is well.

4 إعجابات

These are dangerous instructions for accessing the database when discourse is not running.

Use at your own risk.

إعجابَين (2)

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.