It seems like there are two or three off-by-one errors stacked on top of each other here…
As soon as a topic has 18 posts, sitemap_recent.xml starts showing a URL with a page number: https://forum.example.com/t/slug/123?page=2
Problem 1: Page 2 does not exist until a topic has 20 posts, but the sitemap does show this link.
Expected: the sitemap does not show page=2 until the topic has 20 posts.
Problem 2: When a topic has 18 posts, that link gives a “That page does not exist” error.
Expected: this is handled gracefully and the user is redirected to the end of the topic.
Problem 3: When a topic has 19 posts, that link gives an “Error. While trying to load. Something went wrong” message.
Expected: this is handled gracefully and the user is redirected to the end of the topic.
Problem 4: When a topic has over 20 posts, but posts have been deleted so that the number of visible posts is lower, the ?page=2 stays in the recent sitemap until a single new post is made, then it disappears.
Expected: the page number disappears (or for higher pages: is decreased), reflecting the actual number of available pages.
Worse: This bad link is also showing up in Google!
Problem 5: When that link is clicked, it gives an error to the user.
But in Google, apparently duplicate content is being created.
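To make the expected behavior concrete, here is a minimal Python sketch. It is not Discourse's actual code: the function names and the page rule are hypothetical. It assumes page p exists once a topic has at least (p - 1) * 20 visible posts, which matches the "page 2 at 20 posts" threshold above but is otherwise a guess.

```python
# Hypothetical sketch of the expected behavior, not Discourse's implementation.
# Assumption: page p exists once the topic has at least (p - 1) * PAGE_STEP
# visible posts (so page 2 first exists at 20 posts, as described above).
PAGE_STEP = 20


def last_page(visible_posts: int, step: int = PAGE_STEP) -> int:
    """Highest page that exists, counting only visible (non-deleted) posts."""
    return max(0, visible_posts) // step + 1


def sitemap_url(base_url: str, visible_posts: int) -> str:
    """URL the sitemap should list: no ?page= parameter while only page 1 exists."""
    page = last_page(visible_posts)
    return base_url if page == 1 else f"{base_url}?page={page}"


def resolve_page(requested: int, visible_posts: int) -> int:
    """Clamp a requested page to an existing one instead of raising an error.

    A request past the end resolves to the last page, i.e. the end of the topic.
    """
    return min(max(1, requested), last_page(visible_posts))


if __name__ == "__main__":
    base = "https://forum.example.com/t/slug/123"
    for count in (18, 19, 20):
        print(count, sitemap_url(base, count), resolve_page(2, count))
```

Under these assumptions, a topic with 18 or 19 posts is listed without a page number, and a stale ?page=2 link resolves to page 1 rather than an error. Because the count uses visible posts only, deleting posts lowers the page number immediately (Problem 4).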
Repro on Meta:
- find a topic with 17 or 18 replies
- find it in https://meta.discourse.org/sitemap_recent.xml
- follow the link
- check Google