Plugin causing errors during rebuild

Today I tried to upgrade my Discourse installation, from 2.9.0.beta9 to 2.9.0.beta10. But this went horribly wrong.

  1. My first attempt was on the console
cd /var/discourse
sudo git pull
sudo ./launcher rebuild app

This finally broke off, with the following message somewhere in the middle of all the logs

PG::InvalidParameterValue: ERROR:  cannot extract elements from a scalar
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/pg.rb:110:in `exec'

I was able to simply restore the forum by starting the old image (which was still hanging around).

  1. My second attempt was through the web console (/admin/upgrade). But this finally choked on the same kind of error: ERROR: cannot extract elements from a scalar. When going back to the upgrade panel, it told me all components were updated fine. But after a restart of the container, the server now threw a HTTP 500 error :frowning_face:

  2. I did a clean install on a separate machine, starting from scratch. I was able to install the beta10 version, but trying to restore from a backup caused exactly the same error!

Anybody out there who could be of any help?

I found the query that it is choking on

INSERT INTO question_answer_votes (post_id, user_id, created_at)
	SELECT
	  X.post_id AS post_id,
	  (X.value->>'user_id')::int AS user_id,
	  (X.value->>'created_at')::timestamp AS created_at
	FROM (
	  SELECT
	    post_id,
	    jsonb_array_elements(value::jsonb) AS value
	  FROM post_custom_fields WHERE name = 'vote_history'
	) AS X
	WHERE (X.value->>'action') != 'destroy'
	ORDER BY (X.value->>'created_at')::timestamp DESC
	ON CONFLICT DO NOTHING

If I understand correctly, this happens because the psql function jsonb_array_elements expects an array, but receives a NULL value.

There have been a couple of reports of this:

I think it may be to do with having the Pavilion Question/Answer plugin installed previously?

Iā€™ll see if I can get someone to take a deeper look. :+1:

We are using this plugin: https://github.com/angusmcleod/discourse-question-answer.

  • It seems to be related to the Pavilion Question and Answer plugin.
  • It is a fork from the discourse/discourse-upvotes plugin.
1 Like

Just to check, which plugins do you have in your app.yml?

1 Like

The site that I saw had a bunch of those Post Custom Fields that had a bunch of strings that had been encoded a bunch of times so that they were unusable.

Some solutions, in order of complexity

  • stop using the plugin (and perhaps switch to the new up votes)
  • delete all of them
  • delete the bad ones
  • edit those fields to make the data back into valid json strings.

The site that I saw had bad data that was years old. This appears to be a bug that existed years ago that is only now being discovered.

It might be possible to write code to fix the broken json strings, but I wasnā€™t able to see how to do that in 10 minutes.

See Question Answer Plugin - #301 by pfaffman for examples

2 Likes

I had thought of this too, but a quick check with Angusā€™ fork, it doesnā€™t seem to have any data migrations done. https://github.com/angusmcleod/discourse-question-answer/compare/main...discourse:discourse-upvotes:main

This feels like data contamination from old Q&A posts as you said. I also find it funny that this is popping up now given that that migration is from Nov 2021 ā€“ I suppose this is because JayJay is only switching to the new plugin now? In any case weā€™ll have a look at that migration again.

1 Like
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-solved.git
          - git clone https://github.com/unfoldingWord-dev/discourse-mermaid.git
          - git clone https://github.com/angusmcleod/discourse-question-answer.git
          - git clone https://github.com/discourse/discourse-checklist.git
          - git clone https://github.com/discourse/discourse-cakeday.git
          - git clone https://github.com/discourse/discourse-canned-replies.git
          - git clone https://github.com/discourse/discourse-footnote.git
          - git clone https://github.com/discourse/discourse-staff-notes.git
          - git clone https://github.com/discourse/discourse-graphviz.git
          - git clone https://github.com/discourse/discourse-assign.git
          - git clone https://github.com/discourse/discourse-voting.git
          - git clone https://github.com/discourse/discourse-yearly-review.git
          - git clone https://github.com/discourse/discourse-saved-searches.git

1 Like

@nat Iā€™m not switching to the new plugin at all. Iā€™m still running with the set of plugins that we have had for quite some time.

At least every month, we run our updates, so itā€™s not like there is a huge gap between versions.

2 Likes

Just as a small ā€˜FYIā€™ aside from the question-answer issue, thereā€™s also a replacement to canned-replies now:

And I believe thereā€™s an #official Mermaid theme component now as well:

1 Like

Thank you. Will look into that after my original problem is fixed :smile:

1 Like

If I would remove the plugin, restoring the backup would still cause the offending query to run.
So, I would need to remove both the plugin and any table + query that is used by this plugin.
How would I know the involved tables?

Have you commented out the line for the plugin and attempted a rebuild to confirm that?

2 Likes

Iā€™m going to try that now. Iā€™ve looked through the dump.sql file, that is part of the backup, and there is no mention of any question_answer_* table at all. So this gives me hopeā€¦

1 Like

Iā€™ve had success on my scratch system!

  • Disabled the plugin
  • sudo ./launcher rebuild app
  • Restore the backup

Going to work on my live system. Iā€™ll keep you in the loop.

1 Like

The corrupt data are in the PostCustmField table. But if you donā€™t have the plugin, it wonā€™t try to migrate those data into the new table.

The issue isnā€™t the migration itself, itā€™s that at some point in the past the data got corrupted. If been broken, but it broke in a way that went unnoticed for years. I think that perhaps each new upvote corrupted it further. A reasonable fix might be to ignore the corrupt data, perhaps marking it as corrupt (perhaps renaming the custom field) so that future migrations could ignore it and someone could go and fix them up by hand if they so desired.

2 Likes

Success on live as well. Disabled the plugin, and forth we went.
Still wondering why I did not encounter this issue earlier. As I said, every month as a routine we are updating all our systems, so I should have seen this issue earlier.

How would I be able to connect to the Discourse DB myself, to verify whatā€™s in the post_custom_fields table?

@Jaap-Jan_Swijnenburg Iā€™m sorry youā€™ve experienced an issue.

Iā€™ll respectfully disagree with you there :slight_smile:

The old QnA plugin used standard Discourse custom field type casting, which is meant to handle casting that field to JSON. But that just doesnā€™t work in some cases (like this one), which is why you (ideally) need to build in format checks when youā€™re working with Discourse custom fields (particularly if youā€™re working with quite old code and data like this). I would suggest that that is what the Upvotes Plugin needs to do here, which indeed is where the immediate error is coming from.

You can use the data explorer plugin to check what youā€™ve got there.

Well, I mostly defer to you on this, and I think we really agree. I think itā€™s true that the migration works if the data arenā€™t broken. We agree that if the data are broken it should fail more gracefully that it does now.

Yes, but the data likely were corrupted years ago (thatā€™s the case on the site that I am familiar with), but you didnā€™t notice because it didnā€™t fail catastrophically. Iā€™m pretty sure that it wasnā€™t managing the upvotes as expected, but no one noticed.

Iā€™d do it from rails, something like this:

./launcher enter app
rails c

Then stuff like this:

all_votes=PostCustomField.where(name: "vote_history")
likely_broken_votes=PostCustomField.where(name: "vote_history").where("value like '\"%'")

Look at just the id and data:

all_votes.pluck(:id,:value)

Get just one pcf:

pcf=PostCustomField.find(1234)

Fix it

pcf.value='the stuff you really want in it'
pcf.save

What is happening here is this

  1. Some people have a very old version of the question answer plugin with my personal github username in the repository URL in their app.yml file.

  2. I transferred the QnA plugin to paviliondev years ago. Github redirects repository URLs when theyā€™re transferred, so the old urls with my personal username in it kept working.

  3. More years later Pavilion transferred the Question Answer Plugin to Discourse. Discourse initially kept the name discourse-question-answer.

  4. A few months ago I created my own fork of the plugin hosted in discourse, while it was still called discourse-question-answer.

  5. People with very old links to the QnA plugin with my personal github account in them were now cloning my new fork of the significantly updated discourse-question-answer hosted in discourse. In other words a very old link was now pointing to a fork of a new plugin. Despite the intervening years, I should have forseen that, so apologies for that. Iā€™ve removed that fork.

  6. Discourse changed the name of discourse-question-answer to discourse-upvotes. This change of name has not had a material impact on your case @Jaap-Jan_Swijnenburg, but itā€™s why youā€™re now (unexpectedly) cloning a fork of that repo.

  7. A migration in discourse/discourse-upvotes (ex discourse-question-answer) assumes the value column in post_custom_fields is type-safe as JSON.

  8. Old data created by the plugin when it was angusmcleod/discourse-question-answer (years ago) was not saved as valid JSON by the HasCustomFields concern in discourse/discourse. Iā€™m guessing, but this data was probably added before JSON type-checking was added 4 years ago (Itā€™s still possible to end up with invalid JSON in custom fields registered as JSON in edge cases).

Therefore

  1. When people with the (years-out-of-date) angusmcleod/discourse-question-answer url in their app.yml update their Discourse the migration in the new version of the plugin is cloned, the migration runs and potentially creates this error.

There are a few solutions to this:

  1. @Jaap-Jan_Swijnenburg in your case you just need to remove the reference to my old QnA plugin and youā€™ll be able to rebuild your site. Thatā€™s it; nothing more. Looks like thatā€™s what youā€™ve done :+1:

  2. The discourse/discourse-upvotes plugin migration could be updated to handle non-JSON values in the value column in post_custom_fields.

Iā€™d note that 2 would also handle the additional case of people who do indeed want to switch from the an old version of the QnA plugin to discourse-upvotes. In that case the migration will run and it will fail if any entries in the value column of post_custom_fields are not valid JSON.

5 Likes