Thanks. And the count is now 12146?
Count from this query is 12146:
WHERE cooked LIKE '%[quote%'
Sorry, I should have been more clear. I meant this query.
WITH posts_with_bbcode_quotes AS (SELECT posts.id AS targeted_posts
FROM posts
WHERE posts.cooked ~* '\[quote[^]]*]'
ORDER BY posts.id)
SELECT COUNT(DISTINCT(posts.id))
FROM posts_with_bbcode_quotes, posts
WHERE posts.id IN (posts_with_bbcode_quotes.targeted_posts)
This is the one I plan to use to workup the UPDATE query.
That returns a count of 12143
Gadzooks is this a bugger. I hate to proceed without knowing what the problem is so I’ll sleep on it in hopes it comes to me (sounds odd I admit, but there have been countless times that dreams and half-sleep have let me see solutions I couldn’t while in wakefulness)
On a positive note, the UPDATE query is coming along nicely and I should be able to test it with my limited number of posts. Hopefully worse case the query will work and leave only the three enigmatic posts unfixed for you.
Ha! 0.0002470 failure rate. Oh the horror!
I’ve put together this that looks OK with my contrived posts, but I would like you to run it against your posts, changing the LIMIT to something you think would be relatively sufficient.
WITH posts_with_bbcode_quotes AS (SELECT posts.id AS targeted_posts
FROM posts
WHERE posts.cooked ~* '\[quote[^]]*]'
ORDER BY posts.id)
SELECT regexp_replace(posts.raw, '\[(/)?quote([^]]*)]', E'\\n[\\1quote\\2]\\n', 'g')
FROM posts_with_bbcode_quotes, posts
WHERE posts.id IN (posts_with_bbcode_quotes.targeted_posts)
LIMIT 5
For mine, without a LIMIT I get
All on one line Some text \n[quote]\nInline syntax\n[/quote]\n end text. Preview does not look good.
Inline with leading text only Blah blah \n[quote]\nInline syntax\n[/quote]\n Preview Not OK
Inline with trailing text only \n[quote]\nInline syntax\n[/quote]\n and more Preview Not OK
Block starting after text Some text \n[quote]\n Block level \n[/quote]\n preview Not good
Block with text after \n[quote]\n Block level \n[/quote]\n hey now! preview not good
inside text on same line as opening tag \n[quote]\nBlock variant one \n[/quote]\n preview NOT good
inside text on same line as closing tag \n[quote]\n Block variant two\n[/quote]\n preview NOT OK
This is the eighth broken post. Two more bad syntax quotes This is \n[quote]\nthe first\n[/quote]\n And this is \n[quote]\nthe second\n[/quote]\n Ditto on the not.
Ninth post, First broken and with fake attr val, second with fake attr val but not. Broken \n[quote id="666xxx"]\nBlock level \n[/quote]\n I see it \n[quote name="testing"]\n now I \n[/quote]\n don't
Nested quotes, both non-conforming syntax, both with attributes. The outer \n[quote name="outer"]\n To quote the great Gatsby \n[quote name="inner"]\n Give me the money\n[/quote]\n\n[/quote]\n ditto Bad
Nested quotes, outer good, inner bad \n[quote]\n Shouldn't that be \n[quote]\n I have the money\n[/quote]\n instead? \n[/quote]\n Just saying.
With a limit of 50, it seems to work fine:
[quote] —> \n[quote]\n
[/quote] —> \n[/quote]\n
[quote="Peter"] —> \n[quote="Peter"]\n
BTW, I’ve set up a copy of the live instance on a separate DO droplet where I can test and restore backups if things don’t work right. The db size is ~400MB, so it’s rather quick.
I’m certainly glad I tested this. From how I understood the documentation newlines needed to be double escaped. Not so. They needed to be only single escapes else they were written as literal text. I am fairly certain there are better ways to know the number of affected rows, but I do not know how to do them. Anyway, it looks good to me. a couple before and afters:
To recap caveats for those that might be interested in this query
- It does not match against the lack of newlines around improperly formatted bbcode tags in raw post content
- It relies on Discourse displaying the bbcode tags in the cooked content to identify problem posts
- it does not take into account any posts where displaying of bbcode tags in the cooked content is desired (eg. in code fences)
- it matches per post, not per bbcode tag. Therefore if a post contains both improper and proper syntax bbcode tags, all will be surrounded by newlines
- it does not create edit revision histories
- posts require a rebake for the changes to be effected in cooked
- it can not be stressed strongly enough, Save a backup before running this query
- the
RETURNING raw
line will show the number of affected rows after outputting all of the raw content. Though acceptable with my small sample set, this may be undesireable for larger UPDATEs
WITH posts_with_bbcode_quotes AS (SELECT posts.id AS targeted_posts
FROM posts
WHERE posts.cooked ~* '\[quote[^]]*]'
ORDER BY posts.id)
UPDATE posts
SET raw = regexp_replace(posts.raw, '\[(/)?quote([^]]*)]', E'\n[\\1quote\\2]\n', 'g')
FROM posts_with_bbcode_quotes
WHERE posts.id IN (posts_with_bbcode_quotes.targeted_posts)
RETURNING raw;
Run in the pg console, correct?
Regarding rebaking, would matching a pattern work, or is the pattern we’d be looking for already changed?
I did mine post by post using the post admin wrench to Rebuild HTML Obviously doing this for 11 posts is a lot different than doing it for thousands.
AFAIK the command would be
rake posts:rebake
without the match pattern part.
Running commentary:
12143 rows affected.
Next up: rebake… (looks like 2 hr)
While the rebake task is cooking… what’s the proper way to avoid paged output in the pg console? I tried \pset pager off
but still had to go through some pages.
I have no idea. But I would like to know both how to page and how to not page. There have been times I was interested in seeing everything and the output quickly scrolled by and other times I was less interested in seeing all the output but had to page which was a slight annoyance.
Perhaps when done you could run the “12146” WHERE cooked LIKE '%[quote%'
query again and have a look at the three mysterious posts?
Success! However, my computer went to sleep and the session had a broken pipe, so I think the rebake process was shut down at that point.
The WHERE cooked LIKE '%[quote%'
query now returns 450.
I’ll rebake again and provide an update shortly.
Run
tmux
Before you run the command. If you get disconnected, run
tmux attach
to reconnect
Rebaking is complete, but the count remains at 450. Looks like those are unpaired opening or closing bbcode quote tags.
They would have most likely have been unpaired in phpBB when a user deleted one of the tags while trimming the quoted text.
I don’t have an answer for the 3 mystery posts. There were 4 posts that failed rebaking.
I hadn’t thought of unpaired tags. 450 is quite a few, but I would likely slog through them in reverse order, manually fixing each. But then, I can be stubborn determined.
It would be possible to have script deal with them, but script would not be able to judge whether the unpaired tag should be removed, or if its partner should be added, and if so, in what particular place.
In the next step of the cleanup process I want to delete everything in a post below strings like ----Original message----, or Sent from my iPhone, etc. Some of the leftover tags will get removed in that process leaving even fewer for manual editing.
Thanks very much for your dogged effort @Mittineague !
I imported my forum from Mybb. I’ve had the same problem as you. rake posts method have solved most of the problem. But rake posts method can’t delete multiple value expressions like [color=#] or [font=] tags. I want to delete all color tags whatever the value is. How can we adapt this code for that? And where should we run this code?