Sitemap XML file does not show all topics and posts

The query returns a total of 887,652 records in Discourse.

When I view the sitemap XML file in the browser, it lists ten files. Each file contains 50,000 records (about 500,000 in total), so the complete set of records is not included. You can refer to the previous screenshot for the total count of “887,652” records.

Can you provide guidance on how to include all records in a sitemap XML file?

I have over 800,000 topics on my website, but the sitemap only includes up to approximately 500,000 records. Is there any specific configuration needed?

I’ve slipped your post over to this topic, @Ashwani_Kumar, as it asks a very similar question and you should be able to benefit from the same answers. :+1:

The XML sitemaps that Discourse is generating aren’t including all posts on my site. I ran a query for posts and I see around 800k posts, but the XML sitemaps only include 347k. I’m missing about 55% of post URLs in the sitemap.

I’ve moved your topic over as well, @Marc_S, as it sounds like a very similar question.

Are the missing topics in private categories?

Also, the query in the OP would count all the PMs and deleted topics as well.

I think it would need to be more like:

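-- Count only topics that anonymous visitors (and crawlers) can see
SELECT
    COUNT(*)
FROM topics t
  JOIN categories c ON c.id = t.category_id
WHERE c.read_restricted IS FALSE  -- skip private categories
  AND t.archetype = 'regular'     -- skip PMs
  AND t.deleted_at IS NULL        -- skip deleted topics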
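
To see where the difference comes from, something like the following might help. It is only a rough sketch: it assumes PostgreSQL's `FILTER` clause is available and that PMs are stored with the `private_message` archetype, and the column aliases are made up for illustration.

```sql
-- Sketch: break the raw topic count down by the reasons a topic
-- is left out of the count above.
SELECT
    COUNT(*) AS all_topics,
    COUNT(*) FILTER (WHERE t.archetype = 'private_message') AS private_messages,
    COUNT(*) FILTER (WHERE t.deleted_at IS NOT NULL) AS deleted_topics,
    COUNT(*) FILTER (WHERE c.read_restricted IS TRUE) AS in_private_categories,
    COUNT(*) FILTER (
        WHERE c.read_restricted IS FALSE
          AND t.archetype = 'regular'
          AND t.deleted_at IS NULL
    ) AS publicly_visible
FROM topics t
  -- LEFT JOIN so that topics without a category (such as PMs) are still counted
  LEFT JOIN categories c ON c.id = t.category_id
```

The buckets can overlap (a deleted topic in a private category lands in both), so they won't necessarily add up to `all_topics`. The `publicly_visible` column should match the result of the query above.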

Nice work! I bet deleted posts and PMs explain the missing topics.

@JammyDodger
Thanks for providing the above query. So that means topics of type “Private Messages” would not be included in the sitemap XML, right?

That’s correct. Even if they were in the sitemap, Google wouldn’t be able to access them.

Thanks for the quick response, @RGJ.