Today I tried to upgrade my Discourse installation, from 2.9.0.beta9 to 2.9.0.beta10. But this went horribly wrong.
My first attempt was on the console
sudo git pull
sudo ./launcher rebuild app
This finally broke off, with the following message somewhere in the middle of all the logs
PG::InvalidParameterValue: ERROR: cannot extract elements from a scalar
I was able to simply restore the forum by starting the old image (which was still hanging around).
My second attempt was through the web console (/admin/upgrade). But this finally choked on the same kind of error: ERROR: cannot extract elements from a scalar. When going back to the upgrade panel, it told me all components were updated fine. But after a restart of the container, the server now threw a HTTP 500 error
I did a clean install on a separate machine, starting from scratch. I was able to install the beta10 version, but trying to restore from a backup caused exactly the same error!
INSERT INTO question_answer_votes (post_id, user_id, created_at)
X.post_id AS post_id,
(X.value->>'user_id')::int AS user_id,
(X.value->>'created_at')::timestamp AS created_at
jsonb_array_elements(value::jsonb) AS value
FROM post_custom_fields WHERE name = 'vote_history'
) AS X
WHERE (X.value->>'action') != 'destroy'
ORDER BY (X.value->>'created_at')::timestamp DESC
ON CONFLICT DO NOTHING
If I understand correctly, this happens because the psql function jsonb_array_elements expects an array, but receives a NULL value.
This feels like data contamination from old Q&A posts as you said. I also find it funny that this is popping up now given that that migration is from Nov 2021 – I suppose this is because JayJay is only switching to the new plugin now? In any case we’ll have a look at that migration again.
If I would remove the plugin, restoring the backup would still cause the offending query to run.
So, I would need to remove both the plugin and any table + query that is used by this plugin.
How would I know the involved tables?
The corrupt data are in the PostCustmField table. But if you don’t have the plugin, it won’t try to migrate those data into the new table.
The issue isn’t the migration itself, it’s that at some point in the past the data got corrupted. If been broken, but it broke in a way that went unnoticed for years. I think that perhaps each new upvote corrupted it further. A reasonable fix might be to ignore the corrupt data, perhaps marking it as corrupt (perhaps renaming the custom field) so that future migrations could ignore it and someone could go and fix them up by hand if they so desired.
Success on live as well. Disabled the plugin, and forth we went.
Still wondering why I did not encounter this issue earlier. As I said, every month as a routine we are updating all our systems, so I should have seen this issue earlier.
How would I be able to connect to the Discourse DB myself, to verify what’s in the post_custom_fields table?
The old QnA plugin used standard Discourse custom field type casting, which is meant to handle casting that field to JSON. But that just doesn’t work in some cases (like this one), which is why you (ideally) need to build in format checks when you’re working with Discourse custom fields (particularly if you’re working with quite old code and data like this). I would suggest that that is what the Upvotes Plugin needs to do here, which indeed is where the immediate error is coming from.
You can use the data explorer plugin to check what you’ve got there.
Well, I mostly defer to you on this, and I think we really agree. I think it’s true that the migration works if the data aren’t broken. We agree that if the data are broken it should fail more gracefully that it does now.
Yes, but the data likely were corrupted years ago (that’s the case on the site that I am familiar with), but you didn’t notice because it didn’t fail catastrophically. I’m pretty sure that it wasn’t managing the upvotes as expected, but no one noticed.
I’d do it from rails, something like this:
./launcher enter app
Then stuff like this:
likely_broken_votes=PostCustomField.where(name: "vote_history").where("value like '\"%'")
Look at just the id and data:
Get just one pcf:
pcf.value='the stuff you really want in it'
Some people have a very old version of the question answer plugin with my personal github username in the repository URL in their app.yml file.
I transferred the QnA plugin to paviliondev years ago. Github redirects repository URLs when they’re transferred, so the old urls with my personal username in it kept working.
More years later Pavilion transferred the Question Answer Plugin to Discourse. Discourse initially kept the name discourse-question-answer.
A few months ago I created my own fork of the plugin hosted in discourse, while it was still called discourse-question-answer.
People with very old links to the QnA plugin with my personal github account in them were now cloning my new fork of the significantly updated discourse-question-answer hosted in discourse. In other words a very old link was now pointing to a fork of a new plugin. Despite the intervening years, I should have forseen that, so apologies for that. I’ve removed that fork.
Discourse changed the name of discourse-question-answer to discourse-upvotes. This change of name has not had a material impact on your case @Jaap-Jan_Swijnenburg, but it’s why you’re now (unexpectedly) cloning a fork of that repo.
A migration in discourse/discourse-upvotes (ex discourse-question-answer) assumes the value column in post_custom_fields is type-safe as JSON.
Old data created by the plugin when it was angusmcleod/discourse-question-answer (years ago) was not saved as valid JSON by the HasCustomFields concern in discourse/discourse. I’m guessing, but this data was probably added before JSON type-checking was added 4 years ago (It’s still possible to end up with invalid JSON in custom fields registered as JSON in edge cases).
When people with the (years-out-of-date) angusmcleod/discourse-question-answer url in their app.yml update their Discourse the migration in the new version of the plugin is cloned, the migration runs and potentially creates this error.
There are a few solutions to this:
@Jaap-Jan_Swijnenburg in your case you just need to remove the reference to my old QnA plugin and you’ll be able to rebuild your site. That’s it; nothing more. Looks like that’s what you’ve done
The discourse/discourse-upvotes plugin migration could be updated to handle non-JSON values in the value column in post_custom_fields.
I’d note that 2 would also handle the additional case of people who do indeed want to switch from the an old version of the QnA plugin to discourse-upvotes. In that case the migration will run and it will fail if any entries in the value column of post_custom_fields are not valid JSON.