In the main sitemap, the lastmod date for the underlying sitemaps is wrong:
For example, see https://meta.discourse.org/sitemap.xml
The dates for sitemap_2.xml to sitemap_5.xml are all the same, ‘2024-03-14T14:02:32Z’, which is exactly ‘3 days ago’.
I am worried that an optimisation here complicates stuff enormously for very little benefit.
Think it through …
Say there are 6 chunks on meta. If a topic in the last chunk is touched… the entire chunk becomes invalid: you have to remove the topic from there and put it in the front chunk.
Optimising here is a little pointless for a site that sees any kind of activity, and the dates on the actual topics inside the chunk are fine.
It’s not about moving topics into different sitemap-chunks. The topics can stay in the sitemap-chunk they are already in.
(The topic-to-sitemap-chunk mapping is arbitrary anyway, as the DB select statement with LIMIT has no ORDER BY defined.)
The bug report is about the lastmod date of each sitemap-chunk: it should be the lastmod date of the most recently modified topic that the sitemap-chunk contains.
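A minimal Python sketch of what the report asks for, assuming each chunk is known as a list of timezone-aware topic lastmod datetimes. The helper names (`chunk_lastmod`, `w3c`, `build_index`) and the example.com URL are hypothetical; this is not Discourse's actual (Ruby) code. The element names follow the sitemaps.org sitemap index format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_lastmod(topic_lastmods: list[datetime]) -> datetime:
    """Hypothetical helper: the chunk's lastmod is the newest lastmod of its topics."""
    if not topic_lastmods:
        raise ValueError("a sitemap-chunk should contain at least one topic")
    return max(topic_lastmods)


def w3c(dt: datetime) -> str:
    # Format an aware datetime as UTC, in the style seen in sitemap.xml (2024-03-14T14:02:32Z).
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_index(chunks: dict[str, list[datetime]]) -> str:
    """Build a sitemap index where every <lastmod> is derived from the chunk's topics."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, topic_lastmods in chunks.items():
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = w3c(chunk_lastmod(topic_lastmods))
    return ET.tostring(root, encoding="unicode")
```

With a chunk whose topics were last modified on 2024-03-10 and 2024-03-17, the index entry for that chunk gets the 2024-03-17 date. A single topic edit is therefore enough to move the chunk's date forward, without moving any topic between chunks.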
The process for Google should be:
1. Load sitemap.xml
→ Check the lastmod of the sitemap-chunks and queue the sitemap-chunks that need an update
(lastmod date is newer than the last download)
2. Load the queued sitemap-chunks sitemap_[1-5].xml
→ Check the lastmod of the topic URLs and queue the topic URLs that need an update
(lastmod date is newer than the last download)
3. Load the queued topic URLs.
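The queueing rule in steps 1 and 2 can be modelled as below. This is a hypothetical sketch of the crawler behaviour described above, not Google's actual implementation; `needs_refetch`, `queue_updates`, and the "history" of download times are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_refetch(lastmod: datetime, last_downloaded: Optional[datetime]) -> bool:
    # Never downloaded before: always queue it.
    if last_downloaded is None:
        return True
    # Otherwise queue only if the advertised lastmod is newer than our last download.
    return lastmod > last_downloaded


def queue_updates(entries: dict[str, datetime], history: dict[str, datetime]) -> list[str]:
    """Return the URLs to fetch: sitemap-chunks in step 1, topic URLs in step 2."""
    return [url for url, lastmod in entries.items() if needs_refetch(lastmod, history.get(url))]


def day(d: int) -> datetime:
    return datetime(2024, 3, d, 12, tzinfo=timezone.utc)


# Chunk fetched on the 15th; a topic inside it changed on the 17th.
history = {"sitemap_2.xml": day(15)}
print(queue_updates({"sitemap_2.xml": day(14)}, history))  # stale lastmod -> []
print(queue_updates({"sitemap_2.xml": day(17)}, history))  # correct lastmod -> ['sitemap_2.xml']
```

With the stale date the chunk is never re-queued in step 1, so the changed topic inside it is never reached in steps 2 and 3.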
If the lastmod of the sitemap-chunks in sitemap.xml is wrong:
→ Google does not queue the changed sitemap-chunks (step 1)
→ Google does not update the changed sitemap-chunks in a timely manner (step 2)
→ Google does not update the changed topics in a timely manner (step 3)
Again, this is not strictly true… lastmod is meant to be the last date the sitemap was modified, not the max date of its topics.
If a topic dropped out of the sitemap section today and the newest last-modified date in the chunk is a week ago… the chunk still changed today, because a topic dropped out of it today.
So the very same logic results in:
If a topic in the sitemap section changed today and its last-modified date in the chunk is today… the chunk changed today [note: not 3 days ago], because a topic in it changed today.
For both your example and mine above, the current implementation says:
sitemap-chunks sitemap_[2-5].xml changed 3 days ago. This is wrong. It should say ‘changed today’.
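Both positions in the thread can be captured by one rule: a chunk changes when any topic in it changes *or* when a topic is added to or removed from it. A minimal sketch, assuming the time of the last membership change is tracked somewhere (it may not be today); `chunk_lastmod_with_membership` is a hypothetical name, and "today" is taken to be 2024-03-17 purely for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def chunk_lastmod_with_membership(
    topic_lastmods: Iterable[datetime],
    membership_changed_at: Optional[datetime],
) -> datetime:
    """Hypothetical: newest of the topics' lastmods and the last add/remove of a topic."""
    candidates = list(topic_lastmods)
    if membership_changed_at is not None:
        candidates.append(membership_changed_at)
    if not candidates:
        raise ValueError("no dates to derive a lastmod from")
    return max(candidates)


today = datetime(2024, 3, 17, tzinfo=timezone.utc)
week_ago = datetime(2024, 3, 10, tzinfo=timezone.utc)
three_days_ago = datetime(2024, 3, 14, 14, 2, 32, tzinfo=timezone.utc)

# A topic dropped out today; the remaining topics were last modified a week ago.
case_dropped = chunk_lastmod_with_membership([week_ago, week_ago], today)
# A topic in the chunk changed today; membership last changed 3 days ago.
case_edited = chunk_lastmod_with_membership([week_ago, today], three_days_ago)
print(case_dropped == today, case_edited == today)  # True True
```

Under this rule both examples report ‘changed today’, and neither reports the 3-days-old date.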