Sitemap.xml for Google Webmaster

(Sam Saffron) #4

We are open to a PR that adds sitemap support to core provided it is cached for at least an hour and size is culled to something sane.
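
As a rough illustration of those two constraints, here is a minimal Python sketch. It is not Discourse's implementation. It rebuilds the URL list at most once per hour and truncates it to a fixed size. The `CachedSitemap` class and the `fetch_urls` callback are hypothetical. The 50,000 cap is the per-file URL limit from the sitemaps.org protocol, used here as one possible "sane" size.

```python
import time
from typing import Callable, List, Optional

# The one-hour TTL comes from the post above; 50,000 is the per-file
# URL limit in the sitemaps.org protocol (used here as an assumed cap).
CACHE_TTL_SECONDS = 3600
MAX_URLS = 50_000


class CachedSitemap:
    """Rebuilds the sitemap URL list at most once per TTL and caps its size."""

    def __init__(self, fetch_urls: Callable[[], List[str]],
                 ttl: float = CACHE_TTL_SECONDS, max_urls: int = MAX_URLS,
                 clock: Callable[[], float] = time.monotonic):
        # Hypothetical callback, assumed to return URLs newest-first so that
        # truncation keeps the most recent topics.
        self._fetch_urls = fetch_urls
        self._ttl = ttl
        self._max_urls = max_urls
        self._clock = clock
        self._cached: Optional[List[str]] = None
        self._built_at = 0.0

    def urls(self) -> List[str]:
        now = self._clock()
        if self._cached is None or now - self._built_at >= self._ttl:
            # Cache miss or expired entry: rebuild, then cull to max_urls.
            self._cached = self._fetch_urls()[: self._max_urls]
            self._built_at = now
        return self._cached
```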

(Attila Mihaly Balazs) #5

@sam, could you please post the link to the PR? I searched GitHub without luck. It's a feature I'm interested in too, and I'd like to follow the state of the PR.

Thanks.

(Jonathan Sandlund) #6

What service did you use for yours? I’m having a lot of trouble getting my discourse instance picked up by search engines. Hopefully this will help.

(Glenn Drake) #7

I built a simple .NET application that polls the following endpoint and iterates through the page numbers.

http://community.quickfile.co.uk/latest.json?page=0

I then deserialized the response using JSON.NET and pulled out the bits needed to generate the sitemap XML. As my forum is on a subdomain, I could get away with creating the sitemap.xml file in the root of my main application. After I submitted the sitemap URI to Google Webmaster, everything was indexed within 2 days.
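
A minimal Python sketch of the same approach (not Glenn's .NET code). It assumes the `topic_list.topics` array in `latest.json` carries `id`, `slug` and `bumped_at` fields, and that topic URLs follow the `/t/<slug>/<id>` pattern. Check both against your own instance. The function names are made up for illustration. Because topics can be bumped between page fetches, the walk skips ids it has already seen.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List
from xml.sax.saxutils import escape

BASE_URL = "http://community.quickfile.co.uk"  # from the post; use your own forum


def fetch_latest_page(page: int) -> Dict:
    """Fetch and decode one page of /latest.json (network call)."""
    with urllib.request.urlopen(f"{BASE_URL}/latest.json?page={page}") as resp:
        return json.load(resp)


def iter_topics(fetch: Callable[[int], Dict] = fetch_latest_page,
                max_pages: int = 1000) -> Iterator[Dict]:
    """Walk page=0, 1, 2, ... until a page comes back with no topics."""
    seen = set()
    for page in range(max_pages):
        topics = fetch(page).get("topic_list", {}).get("topics", [])
        if not topics:
            break
        for topic in topics:
            # Skip duplicates caused by topics moving between pages mid-walk.
            if topic["id"] not in seen:
                seen.add(topic["id"])
                yield topic


def build_sitemap(base_url: str, topics: List[Dict]) -> str:
    """Render topics as a sitemaps.org <urlset> document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for t in topics:
        # Assumed Discourse topic URL shape: /t/<slug>/<id>
        loc = escape(f"{base_url}/t/{t['slug']}/{t['id']}")
        entry = f"  <url><loc>{loc}</loc>"
        lastmod = t.get("bumped_at")
        if lastmod:
            entry += f"<lastmod>{escape(lastmod)}</lastmod>"
        lines.append(entry + "</url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

`build_sitemap(BASE_URL, list(iter_topics()))` returns the XML as a string, which you could then write to sitemap.xml wherever your main site serves static files.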

(Kevin P. Fleming) #8

If generating a site-wide sitemap.xml is going to be required to get Discourse forum content indexed by search engines, that’s definitely a concern. I suppose it could be automated on the server where the Discourse instance lives, but it’s going to be an incredibly common thing to need.

(Jeff Atwood) #9

You really do not "need" a sitemap.xml file. It is a nice-to-have in some circumstances, but if Google's (and others') web spiders can't crawl your forum properly by default, you have much deeper problems.

Answer from someone at Google:

The consensus seems to be that you can get new pages indexed slightly faster with a sitemap.xml file, but that’s about it.

#10

Hey guys,

in my view, sitemaps are an important piece for indexing any new site.

(recommended reading: Learn about sitemaps - Search Console Help)

I think it will help a lot in getting your new topics discovered on a daily basis, and not only by Google: think also of Bing, or whatever the next search engine is. There is also a reason why you still need to add a sitemap in Webmaster Tools (and why Google suggests it): it helps search engines discover new links.

I'm doing a test right now:

One of my forums has been online since early this week. It has many new topics on different subjects, and it is still not indexed (even though I've added the new site to Google Webmaster Tools and Google Analytics).

Now I’ve added a sitemap (using the code provided here).

My guess is that tomorrow morning, it will be indexed, just because I submitted the sitemap.

Will let you know! :smile:

(Kane York) #11

In order to do a real experiment, you would need to do a side-by-side comparison of two nearly-identical sites; create (and submit) the sitemap for one and not the other.

(Jeff Atwood) #12

Read this statement very, very closely:

If Google can’t successfully crawl your site to find a link, but is able to find it in the sitemap it gives the sitemap link no weight and will not index it!

So to the extent that sitemap would be masking regular crawl errors, I don’t recommend that. Fix the default Google crawling first using regular HTML web pages and once you’ve done that, you can perhaps get a slight boost in new link indexing speed by using a sitemap.

It’s basically a micro-optimization, but sitemaps are in no way a substitute for making sure Google can properly and completely crawl your site naturally.
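
One way to act on that advice is to check, before leaning on a sitemap, whether any of its URLs are unreachable through ordinary links. Per the statement quoted above, those are exactly the links a sitemap cannot rescue. The Python sketch below assumes you already have the site's link graph (for example, exported from a crawling tool) as a plain dict. The function names and the graph format are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable_by_crawl(start: str, links: Dict[str, Iterable[str]]) -> Set[str]:
    """Breadth-first walk of a link graph, following links the way a crawler would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def sitemap_only_urls(sitemap_urls: Iterable[str], start: str,
                      links: Dict[str, Iterable[str]]) -> Set[str]:
    """URLs listed in the sitemap that a normal crawl from `start` never reaches."""
    return set(sitemap_urls) - reachable_by_crawl(start, links)
```

A non-empty result points to a crawl problem worth fixing first. An empty one means the sitemap is, at most, the speed-up described above.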

#13

Just to update on this:
My experiment did not work. Only 2 URLs were indexed, and neither was related to the sitemap I uploaded.

At the moment I don't know the best way to get the forum indexed. Maybe I need to add links on other sites and wait for organic growth.

P.S.: This is the site that I'm indexing, and it has many inbound links (Reddit, some from Bing, etc.). http://cryptocurrenciestalk.com

(Glenn Drake) #14

Patience :smile:

It’s very easy to get all OCD on Google Webmaster

(Khoa Nguyen) #15

Hello. I'm new to Rails. Has anyone done this? :)

(Emma Fu) #16

The newest version of Discourse no longer uses the SiteSetting.posts_per_page parameter; it uses TopicView.chunk_size instead.

Also, if you ever run "./launcher rebuild app", the code will be gone.
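
If your sitemap lists every page of a long topic, the page count is the topic's post count divided by the chunk size, rounded up. Here is a minimal Python sketch. It assumes a chunk size of 20, and it assumes pages after the first are addressed as `?page=N`. Verify both against `TopicView.chunk_size` and the crawler links on your version. `topic_page_urls` is a hypothetical helper.

```python
import math
from typing import List

# Assumption: 20 posts per page; check TopicView.chunk_size on your version.
DEFAULT_CHUNK_SIZE = 20


def topic_page_urls(base_url: str, slug: str, topic_id: int,
                    posts_count: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[str]:
    """One URL per page of a topic; page 1 has no ?page= parameter (assumed format)."""
    pages = max(1, math.ceil(posts_count / chunk_size))
    root = f"{base_url}/t/{slug}/{topic_id}"
    return [root] + [f"{root}?page={n}" for n in range(2, pages + 1)]
```

Reading the chunk size from a single parameter, rather than hard-coding it in several places, keeps the change Emma describes down to one line.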

(Emma Fu) #17

I submitted my sitemap.xml 6 days ago… Google has only indexed 40% of the pages, and I followed Google's template format.

http://www.heartemma.com/sitemap.xml

(Khoa Nguyen) #18

That is very normal. You just have to wait.

(Pugwash) #19

I originally opened this post (different user). I haven’t submitted a sitemap in over a year and our Discourse community continues to get picked up on Google. I’ve seen new topics appear on Google within 24 hours, so even without a sitemap Discourse is still highly crawlable. Maintaining a current sitemap (at least in my experience) made zero difference.

Also new domains (which appears to be the case with you - no records on web.archive.org) can take months to get any traction on Google. Just keep ploughing in the content and eventually you’ll get indexed, sitemap or no sitemap.

(Anton) #20

How do I make some pages have less weight without a sitemap?
With a sitemap, I'd use the <priority> tag.
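
For reference, this is roughly what a `<priority>` entry looks like in the sitemaps.org format. The value runs from 0.0 to 1.0, defaults to 0.5, and expresses importance relative to other URLs on your own site. The small Python sketch below uses `url_entry`, a hypothetical helper.

```python
from xml.sax.saxutils import escape


def url_entry(loc: str, priority: float = 0.5) -> str:
    """Render one sitemap <url> element; sitemaps.org allows priority 0.0-1.0 (default 0.5)."""
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    # Escape &, < and > in the URL so the XML stays well-formed.
    return f"<url><loc>{escape(loc)}</loc><priority>{priority:.1f}</priority></url>"
```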

(Anton) #21

What about the same answer in the Stack Overflow article, where it says this:

Granted, for really small, static, easily crawlable sites, using Sitemaps may be unnecessary from Google’s point of view once the site has been crawled and indexed. For anything else, I’d really recommend using them.

Discourse isn’t a really small, static website, is it?

(Jeff Atwood) #22

It actually is. View it with JavaScript disabled to see what I mean. That’s what is presented to crawlers.
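
To approximate that view without a browser, you can list the links present in the raw HTML, before any JavaScript runs. Below is a minimal Python sketch using the standard library's `html.parser`. The `LinkCollector` class and `links_without_js` function are illustrative names. For a live page, you could pass in the decoded body of `urllib.request.urlopen(url).read()`.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in raw HTML, without executing JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; script bodies are
        # treated as raw text, so markup inside them is not collected.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_without_js(html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```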

(Michael - DiscourseHosting.com) #23

In this context, "static" means the content doesn't change much, i.e. not many pages are added.
And "small" means there are not many pages.

So that does not cover Discourse forums although they are easily crawlable indeed.