Will Google be able to crawl all the forum posts? Would like to know how it works if pages are loaded just-in-time.
The effect of endless scrolling = Bad for Google / SEO
Ember and SEO challenges regarding discourse.org
Discourse infinite scroll has bad UX
I would imagine so - testing with curl returns the following:
$ curl http://meta.discourse.org/t/will-posts-show-up-on-google/1184
...
<body>
  <section id='main'>
    <section id='loading-message' style='display:none'><span class="translation_missing" title="translation missing: en.loading">Loading</span>...</section>
    <noscript data-path="/t/will-posts-show-up-on-google/1184">
      <header class="d-header">
        <div class="container">
          <div class="contents">
            <div class="row">
              <div class="title span13">
                <a href="/"><img src="/assets/logo.png" alt="Discourse" id="site-logo"></a>
              </div>
            </div>
          </div>
        </div>
      </header>
      <div id="main-outlet" class="container">
        <h2>
          <a href="/t/will-posts-show-up-on-google/1184">Will posts show up on Google</a>
        </h2>
        <div class='creator'>
          #1 By: <b>Adriaan Putter</b>, February 6th, 2013 07:14
        </div>
        <div class='post'>
          <p>Will Google be able to crawl all the forum posts? Would like to know how it works if pages are loaded just-in-time.</p>
        </div>
      </div>
    </noscript>
  </section>
...
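For anyone who wants to check what a non-JavaScript client can actually read in a response like this, here is a minimal sketch that pulls the text out of the <noscript> block. It uses only Python's standard html.parser; the class name, the function name and the shortened sample markup are my own, not anything Discourse ships.

```python
from html.parser import HTMLParser


class NoscriptText(HTMLParser):
    """Collect the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a <noscript>
        self.chunks = []  # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


def noscript_text(html):
    """Return the text fragments a script-less client would find."""
    parser = NoscriptText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Shortened version of the curl output above (sample only).
sample = """<body><section id='main'>
<noscript data-path="/t/will-posts-show-up-on-google/1184">
<h2><a href="/t/will-posts-show-up-on-google/1184">Will posts show up on Google</a></h2>
<div class='post'><p>Will Google be able to crawl all the forum posts?</p></div>
</noscript></section></body>"""

for fragment in noscript_text(sample):
    print(fragment)
```

Feeding it the saved output of the curl command instead of the sample should list the topic title and the post bodies if the fallback markup is present.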
Should try that on a topic with a lot more posts than this topic.
Viewing a topic reveals all posts in a very condensed form, just the content:
Looking at that, only the first few posts will get indexed. Is this a good or a bad thing for forums?
There are 20 posts per page, and if there are more, a “next page” link is shown (the same goes for the main page). However, I just tried this on a topic with 33 posts and it seems to be completely broken: the posts on any page other than the first are mixed up. But that’s probably easy to fix.
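To make the paging arithmetic concrete, here is a small sketch of how 20 posts per page would split a topic. The 20 comes from the post above; the function names are made up for illustration and this is not how Discourse itself computes pages.

```python
import math

POSTS_PER_PAGE = 20  # figure quoted in the post above


def page_count(total_posts, per_page=POSTS_PER_PAGE):
    """Number of crawler-facing pages needed for a topic."""
    return max(1, math.ceil(total_posts / per_page))


def page_of_post(post_number, per_page=POSTS_PER_PAGE):
    """Page on which a 1-based post number would appear."""
    return (post_number - 1) // per_page + 1


def posts_on_page(page, total_posts, per_page=POSTS_PER_PAGE):
    """Range of 1-based post numbers expected on a given page."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total_posts)
    return range(first, last + 1)


# The 33-post topic mentioned above: two pages, posts 21 to 33 on the second.
print(page_count(33))
print(list(posts_on_page(2, 33)))
```

Comparing the expected post numbers for page 2 with what the page actually shows is a quick way to confirm whether the ordering is still mixed up.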
Ah, looking at the page source I see they are utilizing <noscript> tags. Clever.
Good catch, that is indeed fixed by a contributor!
I noticed that all of the topic pages are rendered with the “Discourse” title tag in the head. The title is then changed on the client side (I think so because it is rendered as “Discourse” through curl). It appears that Google is reading the title as “Discourse” rather than “Will posts show up on Google - Discourse”. I’m not an SEO guru, but Webmaster Tools is throwing up errors regarding duplicated titles.
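A rough way to check this without Webmaster Tools is to read the <title> out of the raw HTML for a handful of topic pages and count the repeats. This is only a sketch: the helper names are mine, and the two sample pages (including the second path) are invented to mimic what curl returned.

```python
from collections import Counter
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text of the <title> element as the server sent it."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def server_title(html):
    """Title before any client-side script has had a chance to change it."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def duplicated_titles(pages):
    """pages maps URL -> raw HTML; returns titles used by more than one URL."""
    counts = Counter(server_title(html) for html in pages.values())
    return {title: n for title, n in counts.items() if n > 1}


# Invented sample: two different topics served with the same generic title.
pages = {
    "/t/will-posts-show-up-on-google/1184": "<html><head><title>Discourse</title></head></html>",
    "/t/some-other-topic/1": "<html><head><title>Discourse</title></head></html>",
}
print(duplicated_titles(pages))
```

If every topic comes back with the same title, that would match the duplicate-title errors described above.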
Anyone have any thoughts on this?
awesome sauce! Thank you.
Google's Cache fails to render topics
I would love to see an icon as in ghostbuster around google that can be clicked to make any comment (or the entire thread if set by the topic sponsor) invisible to Google, but by the same token, believe the fastest way to attract others to this cause is to ensure that the default is absolute transparency to indexing by Google and other search engines.
We have Group support that gives you most of that. Conversations in a secure category that is visible only to a group are not indexable by Google.
Wanting something to be available to anonymous yet not indexable is a bit odd. What is your use case?
Topic was closed, not sure if we need to discuss it in here: The effect of endless scrolling = Bad for Google / SEO
Thanks =) It discusses the topic, but it doesn’t mention how Discourse will be optimized for search engines, especially Google. For instance, I don’t think Discourse has properly implemented schema tags? Getting Started - schema.org
I see the use of a permalink in every post, which is helpful. But with infinite scroll, a topic with 500 replies in an ongoing conversation might be a problem for Google to crawl. I have topics like that on my own forum using VB, some with even 2k replies. We have to close topics that reach 2K.
Does Discourse come with a built-in sitemap generator? This might solve some SEO issues if implemented properly. A sitemap generator is only useful if implemented right; otherwise it is like leaving your forum completely naked for Google to see all the flaws within it and discover duplicate replies, content, meta tags, meta descriptions, etc., if there are any.
This was all already covered.
Do some science: can you search for things in Google that you know are here at meta.discourse.org and find them?
Yes or no? Try it yourself.
I’ve tried, but I see a few issues:
Google can’t cache the forum topics and take a snapshot of them: http://webcache.googleusercontent.com/search?q=cache:6L12a-s4hgMJ:https://meta.discourse.org/t/will-posts-show-up-on-google/1184%3Fpage%3D5000000+&cd=1&hl=en&ct=clnk&gl=us
Ah ok, nvm, I see it here: Will posts show up on Google - faq - Discourse Meta
Probably duplicate content ^ Not sure why they would index another page of the same topic. Not sure if it will be the same for topics with 500+ replies, or whether those will take too long to load, since crawling bots don’t wait until the entire page loads.
But for this page it appears to be indexing properly ^
I’ve tried one of the topics with more replies (177 replies), but I found a lot of duplicate content, i.e., the same topic is indexed many times because of the way the pagination works, and the cached pages didn’t work, probably due to the longer load time.
What’s the problem?
We recently changed some behavior (serving Google a version of the page without unnecessary <script> or JSON data islands), so recently cached pages – stuff Google has indexed recently – will definitely show up.
I see the topic as the first result in your search. You searched for the topic title so this seems correct to me.
It’s the same exact thing as a forum with traditionally paginated topic pages because that is what we serve Google. Again: don’t take my word for it, don’t take anyone’s word for it – just test it yourself, change your browser’s user-agent to the Googlebot user agent and load the pages.
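If you would rather script that test than change your browser settings, here is a minimal sketch using Python's standard urllib. The user-agent string is the one Google has published for Googlebot; the function names are mine, and whether a given site varies its response by user agent is something to verify by running it, not something this code guarantees.

```python
import urllib.request

# User-agent string Google has published for its crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def googlebot_request(url):
    """Build a request that identifies itself with the Googlebot user agent."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_googlebot(url, timeout=10):
    """Fetch a page and return its body as text (performs a network call)."""
    with urllib.request.urlopen(googlebot_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Building the request does not touch the network; fetching does.
req = googlebot_request("https://meta.discourse.org/t/will-posts-show-up-on-google/1184")
print(req.get_header("User-agent"))
```

Calling fetch_as_googlebot on a topic URL and comparing the result with a normal fetch shows whether the crawler gets different markup.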
Not clear what the problem is here. I don’t see any problems.
One problem I see is duplicate content: when Google realizes it is excessive, it will apply penalties, and when that happens your rank and traffic will drop. With forum software (for instance, I am using VB), the way to fix this with topic pagination is to always redirect guests and crawling bots to the first post of the topic, i.e., direct links to the thread’s first post. That way, the topic is indexed once and does not result in duplicate content.
In this case below, that is 7 links to the same topic which is bad.
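As a separate aside, one way to put a number on this kind of duplication is to take the result URLs from a search and group them by topic, ignoring the page query string. This is a sketch only: the function names are mine and the sample URLs are invented, reusing the ?page= parameter seen in the cache link earlier in the thread.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def topic_key(url):
    """Host plus path, ignoring the query string and any trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def group_by_topic(urls):
    """Return only the topics that appear under more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[topic_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Invented sample of indexed URLs for illustration.
indexed = [
    "https://meta.discourse.org/t/will-posts-show-up-on-google/1184",
    "https://meta.discourse.org/t/will-posts-show-up-on-google/1184?page=2",
    "https://meta.discourse.org/t/will-posts-show-up-on-google/1184?page=3",
    "https://meta.discourse.org/t/another-topic/42",
]
for key, found in group_by_topic(indexed).items():
    print(key, len(found))
```

Note that paginated URLs of one topic carry different posts, so a count like this shows how many entry points a topic has in the index, not that the pages are identical.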