?page= bug, both in core and in sitemap plugin

It seems like there are two or three off-by-one errors stacked on top of each other here…

As soon as a topic has 18 posts, sitemap_recent.xml starts showing a URL with a page number: https://forum.example.com/t/slug/123?page=2

Problem 1: Page 2 does not exist until a topic has 20 posts, but the sitemap does show this link.
Expected: the sitemap does not show page=2 until the topic has 20 posts.

Problem 2: When a topic has 18 posts, that link gives a “That page does not exist” error.
Expected: this is handled gracefully and the user is redirected to the end of the topic.

Problem 3: When a topic has 19 posts, that link gives an “Error. While trying to load. Something went wrong” message.
Expected: this is handled gracefully and the user is redirected to the end of the topic.

Problem 4: When a topic has over 20 posts, but posts have been deleted so that the number of visible posts is lower, the ?page=2 stays in the recent sitemap until a single new post is made, then it disappears.
Expected: the page number disappears (or, for higher pages, is decreased) to reflect the actual number of available pages.
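
The expected behavior in Problems 1 and 4 comes down to deriving the page number from the number of currently visible posts. Below is a minimal Python sketch of that idea, not the plugin's actual code. `PAGE_SIZE`, `last_page` and `sitemap_url` are hypothetical names, and the page size of 20 is an assumption (under it, a second page starts with the 21st visible post; the exact boundary in Discourse may differ).

```python
from math import ceil

PAGE_SIZE = 20  # assumed posts per page; not taken from the plugin source


def last_page(visible_posts: int, page_size: int = PAGE_SIZE) -> int:
    """Return the number of the last page that actually exists (at least 1)."""
    return max(1, ceil(visible_posts / page_size))


def sitemap_url(base_url: str, visible_posts: int) -> str:
    """Append ?page=N only when the last page lies beyond the first."""
    page = last_page(visible_posts)
    return base_url if page == 1 else f"{base_url}?page={page}"
```

Because the count passed in is the number of visible posts, deleting posts lowers or removes the page number right away instead of waiting for the next new post.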

Worse: This bad link is also showing up in Google!
Problem 5: when that link is clicked, it gives an error to the user.
But in Google, apparently duplicate content is being created.

Repro on Meta:

  1. find a topic with 17 or 18 replies:

  2. find it in https://meta.discourse.org/sitemap_recent.xml

  3. follow the link

  4. check Google

Hmm, that’s not good! We should fix and backport that one, @zogstrip.

@nbianca can you add this to your list?

The big problem computing page numbers was in discourse-sitemap:

The problem with caching was fixed here:

I also fixed the not found issue in core (minor), where it would reply with a 200 when the second page was requested for a topic that has only 20 posts (1 page):
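
The core change itself is not quoted in this thread. As a rough illustration of the behavior described, here is a small Python sketch (not Discourse's actual Ruby code). `page_status` is a hypothetical helper, and the default page size of 20 is assumed from the "20 posts (1 page)" remark above.

```python
from math import ceil


def page_status(requested_page: int, visible_posts: int, page_size: int = 20) -> int:
    """Return the HTTP status a topic page request would get in this sketch.

    404 when the requested page lies outside the existing pages, else 200.
    """
    total_pages = max(1, ceil(visible_posts / page_size))
    if requested_page < 1 or requested_page > total_pages:
        return 404
    return 200
```

With 20 posts, `page_status(2, 20)` yields 404 rather than 200, while `page_status(1, 20)` still yields 200.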
