When I open the sitemap XML in the browser, it lists ten files. Each file contains 50,000 records, so the complete set of records is not covered. You can refer to the previous screenshot for the total count of “887,652” records.
The XML sitemaps that Discourse generates aren’t including all the posts on my site. I ran a query for posts and I see around 800k of them, but the XML sitemaps only include 347k. I’m missing about 55% of the post URLs in the sitemap.
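For anyone who wants to sanity-check the numbers quoted so far, here is a rough sketch of the arithmetic in Python. It only uses figures mentioned in this thread (10 files, 50,000 records per file, 887,652 total, roughly 800k posts versus 347k sitemap URLs). The variable names are made up for illustration, and the "800k" and "347k" values are rounded, so treat the percentage as approximate.

```python
import math

# Figures quoted in the posts above (rounded where the posts round them).
URLS_PER_FILE = 50_000      # records per sitemap file, as reported
files_listed = 10           # sitemap files shown in the browser
total_records = 887_652     # total count from the screenshot

# How many URLs ten full files can hold, and how many files the total would need.
capacity = files_listed * URLS_PER_FILE
files_needed = math.ceil(total_records / URLS_PER_FILE)
print(capacity, files_needed)

# Share of post URLs missing from the sitemap, using the rounded counts.
posts_in_db = 800_000       # "around 800k"
urls_in_sitemap = 347_000   # "347k"
missing_share = 1 - urls_in_sitemap / posts_in_db
print(f"{missing_share:.0%}")
```

So ten full files top out at 500,000 URLs, and the rounded counts put the gap a little above the "about 55%" estimate.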
Also, the query in the OP would pull out all the PMs and deleted topics as well.
I think it would need to be more like:
SELECT
COUNT(*)
FROM topics t
JOIN categories c ON c.id = t.category_id
WHERE c.read_restricted IS FALSE
AND t.archetype = 'regular'
AND t.deleted_at IS NULL
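To see what those filters drop, here is a small self-contained sketch that runs the same query against a throwaway in-memory SQLite database. This is not the real Discourse schema: the two tables contain only the columns the query references, the sample rows are invented, and the `'private_message'` archetype value and the NULL `category_id` on the PM row are assumptions made for the demo. On a real site you would run the query itself against the actual database instead.

```python
import sqlite3

# Toy stand-in for the two tables the query touches (hypothetical, minimal columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, read_restricted BOOLEAN);
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    category_id INTEGER,
    archetype TEXT,
    deleted_at TEXT
);
INSERT INTO categories VALUES (1, 0), (2, 1);   -- 1 = public, 2 = restricted
INSERT INTO topics VALUES
    (1, 1, 'regular', NULL),                    -- public, live: counted
    (2, 1, 'regular', '2024-01-01'),            -- deleted: excluded
    (3, 2, 'regular', NULL),                    -- restricted category: excluded
    (4, NULL, 'private_message', NULL);         -- PM (assumed shape): excluded
""")

# Same query as in the post above.
QUERY = """
SELECT
COUNT(*)
FROM topics t
JOIN categories c ON c.id = t.category_id
WHERE c.read_restricted IS FALSE
AND t.archetype = 'regular'
AND t.deleted_at IS NULL
"""

public_topic_count = conn.execute(QUERY).fetchone()[0]
print(public_topic_count)  # 1: only topic 1 survives all three filters
```

Note that this counts topics, not posts, so the result is not directly comparable to a count of post URLs.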
@JammyDodger
Thanks for providing the above query. So that means topics of type “Private Messages” would not be included in the sitemap XML, right?