Will posts show up on Google


(Adriaan Putter) #1

Will Google be able to crawl all the forum posts? Would like to know how it works if pages are loaded just-in-time.


(Matt Rohrer) #2

I would imagine so - testing with curl returns the following:

$ curl http://meta.discourse.org/t/will-posts-show-up-on-google/1184

...
  <body>
    
    <section id='main'>
      <section id='loading-message' style='display:none'><span class="translation_missing" title="translation missing: en.loading">Loading</span>...</section>
      <noscript data-path="/t/will-posts-show-up-on-google/1184">
        <header class="d-header">
          <div class="container">
            <div class="contents">
              <div class="row">
                <div class="title span13">
                  <a href="/"><img src="/assets/logo.png" alt="Discourse" id="site-logo"></a>
                </div>      
              </div>
            </div>
          </div>
        </header>
        <div id="main-outlet" class="container">
          <h2>
  <a href="/t/will-posts-show-up-on-google/1184">Will posts show up on Google</a>
</h2>
  <div class='creator'>
    #1 By: <b>Adriaan Putter</b>, February 6th, 2013 07:14
  </div>
  <div class='post'>
    <p>Will Google be able to crawl all the forum posts? Would like to know how it works if pages are loaded just-in-time.</p>
  </div>




        </div>
      </noscript>
    </section>
...

(Adriaan Putter) #3

Should try that on a topic with a lot more posts than this one.


(Simon) #4

Well, an even simpler method to test this is to disable JavaScript in your browser. Just did that in Safari (and also disabled CSS) and this is what the main page looks like:

Viewing a topic reveals all posts in a very condensed form, just the content:


(Adriaan Putter) #5

Looking at that, then only the first few posts will get indexed. Is this a good or a bad thing for forums?


(Simon) #6

There are 20 posts per page, and if there are more, a “next page” link is shown (the same applies to the main page). However, when I tried this on a topic with 33 posts, it seemed completely broken: the posts on every page other than the first are completely mixed up. But that’s probably easy to fix.
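Simon's numbers can be made concrete. Below is a minimal sketch of how many no-JavaScript pages a topic needs and which page a given post lands on. It assumes a fixed page size of 20 posts, which is the figure observed here rather than a documented Discourse constant, so it is kept as a parameter.

```python
import math

POSTS_PER_PAGE = 20  # assumption: the page size observed in this thread


def page_count(total_posts: int, per_page: int = POSTS_PER_PAGE) -> int:
    """Number of paginated pages needed to show every post."""
    if total_posts <= 0:
        return 0
    return math.ceil(total_posts / per_page)


def page_for_post(post_number: int, per_page: int = POSTS_PER_PAGE) -> int:
    """1-based page on which a 1-based post number appears."""
    if post_number < 1:
        raise ValueError("post numbers start at 1")
    return (post_number - 1) // per_page + 1


print(page_count(33))     # 2
print(page_for_post(20))  # 1
print(page_for_post(21))  # 2
```

By this arithmetic, the 33-post topic Simon tested spans two pages, and every post after the 20th is only reachable through the “next page” link.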


(Adriaan Putter) #7

Ah, looking at the page source, I see they are utilizing <noscript> tags. Clever.
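This observation can also be checked programmatically. The sketch below uses only Python's standard `html.parser` to pull out the text that sits inside `<noscript>` elements, which is what a client without JavaScript gets to read. The class and function names are illustrative and are not part of Discourse.

```python
from html.parser import HTMLParser


class NoscriptText(HTMLParser):
    """Collects the text content found inside <noscript> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # greater than 0 while inside a <noscript> element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a <noscript> block.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def noscript_text(html: str) -> str:
    """Return the <noscript> text of a page, joined by single spaces."""
    parser = NoscriptText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If a topic's posts show up in `noscript_text(page_html)`, a crawler that ignores JavaScript can at least see them.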


(Patrick Burrows) #8

I use NotScripts with Chrome to browse the internet. I pretty much only enable javascript on sites when I have to, and even then I am selective about what domains I enable.

Using NotScripts in Chrome, by default every Discourse post shows only a blank grey page: no text, no detection of disabled JavaScript, no graceful degradation.

I should add that I use this setup ALL THE TIME. It is exceedingly rare for a site (especially one as text-based as this) to show absolutely nothing when you browse to it with JavaScript disabled. At the very least you should see a “please enable javascript” notice.


(Jeff Atwood) #9

Good catch, that has indeed been fixed by a contributor!


(David Justice) #10

I noticed that all of the topic pages are rendered with “Discourse” as the title tag in the head. The title is then changed on the client side (I think so, because it is rendered as “Discourse” through curl). It appears that Google is reading the title as “Discourse” rather than “Will posts show up on Google - Discourse”. I’m not an SEO guru, but Webmaster Tools is throwing up errors about duplicate titles.

curl Will posts show up on Google

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Discourse</title>
<meta content="width=device-width, initial-scale=1.0" name="viewport">
<meta content="" name="description">
<meta content="" name="author">

<link href="/t/will-posts-show-up-on-google/1184" rel="canonical" />

<link rel="icon" type="image/png" href=/assets/favicon.ico>
<script src="http://cdn.discourse.org/assets/preload_store-d27819d8d1861c5ef30d8caff3edd643.js" type="text/javascript"></script>

Anyone have any thoughts on this?
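The symptom David describes can be reproduced offline. If the title is only set by client-side JavaScript, every server-rendered page carries the same `<title>`. The rough sketch below groups fetched pages by their raw `<title>`. It uses a regular expression rather than a full HTML parser, which is good enough for a quick audit but not for arbitrary markup. The `pages` mapping (URL to HTML) is assumed to come from whatever fetcher you use.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html: str) -> Optional[str]:
    """Return the server-rendered <title> text, or None if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def duplicate_titles(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each <title> shared by two or more URLs to the URLs that share it."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        title = page_title(html)
        if title is not None:
            by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```

Before the fix, a run over several topic pages would be expected to group them all under “Discourse”.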


(Régis Hanol) #11

This pull request fixes it :wink:


(David Justice) #12

awesome sauce! Thank you.


(Sebastienstettler) #15

https://www.google.co.za/search?rlz=1C1CHMO_enZA508ZA508&oq=discourse+will-posts-show-up-on-google&sugexp=chrome,mod=0&sourceid=chrome&ie=UTF-8&q=discourse+will-posts-show-up-on-google


(Robert Steele) #16

I would love to see an icon, like a Ghostbusters-style “no” symbol around Google, that can be clicked to make any comment (or the entire thread, if set by the topic sponsor) invisible to Google. By the same token, I believe the fastest way to attract others to this cause is to make absolute transparency to indexing by Google and other search engines the default.


(Sam Saffron) #17

We have group support that gives you most of that. Secure conversations in a category that is visible only to a group are not indexable by Google.

Wanting something to be available to anonymous yet not indexable is a bit odd. What is your use case?


(Katie Hunter) #18

That topic was closed, so I'm not sure whether we need to discuss it here: The effect of endless scrolling = Bad for Google / SEO

Thanks =) It discusses the topic, but it doesn’t mention how Discourse will be optimized for search engines, especially Google. For instance, I don’t think Discourse has properly implemented schema tags (see Getting Started - schema.org).
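For reference, schema.org defines a `DiscussionForumPosting` type. The sketch below emits it as JSON-LD for one topic. It illustrates the markup Katie is asking about and is not what Discourse actually outputs. Which properties a search engine uses is also not established in this thread. The function name and the chosen properties are assumptions.

```python
import json


def forum_posting_jsonld(title: str, url: str, author: str, published: str) -> str:
    """Build a <script> tag carrying schema.org DiscussionForumPosting JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": title,
        "url": url,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,  # ISO 8601 date string
    }
    # Escape "</" so user-supplied text cannot close the <script> element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


print(forum_posting_jsonld(
    "Will posts show up on Google",
    "https://meta.discourse.org/t/will-posts-show-up-on-google/1184",
    "Adriaan Putter",
    "2013-02-06",
))
```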

I see a permalink on every post, which is helpful. With infinite scroll, though, a topic with 500 replies in an ongoing conversation might be a problem for Google to crawl. I have topics like that on my own forum, which runs VB (vBulletin), even ones with 2k replies; we have to close topics once they reach 2K.

Does Discourse come with a built-in sitemap generator? This might solve some SEO issues if implemented properly, because a sitemap generator is only useful if it is implemented right. Otherwise, it is like leaving your forum completely exposed to Google, which can then see all the flaws within it and discover duplicate replies, content, meta tags, meta descriptions, etc., if there are any.
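As a point of reference, the sitemaps.org format is simple. Here is a minimal sketch, using Python's standard `xml.etree.ElementTree`, that lists each URL exactly once. Where the topic URLs come from (a database query, a crawl) is left as an assumption, and this is not Discourse's own sitemap code.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemaps.org-style XML document listing each URL once."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    seen = set()
    for url in urls:
        if url in seen:
            continue  # never list the same page twice
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, < and >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Deduplicating at generation time addresses one part of Katie's concern: a sitemap that itself lists the same page under several URLs would only add to the duplicate-content problem.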


(Jeff Atwood) #19

This was all already covered.

Do some science: can you search for things in Google that you know are here at meta.discourse.org and find them?

Yes or no? Try it yourself.


(Katie Hunter) #20

I’ve tried, but I see a few issues:

  1. Google can’t cache the forum topics and take a snapshot of them: http://webcache.googleusercontent.com/search?q=cache:6L12a-s4hgMJ:https://meta.discourse.org/t/will-posts-show-up-on-google/1184%3Fpage%3D5000000+&cd=1&hl=en&ct=clnk&gl=us

    Ah OK, never mind, I see it here: Will posts show up on Google - faq - Discourse Meta

  2. Probably duplicate content ^. I’m not sure why they would index another page of the same topic. I’m also not sure whether the same will happen for topics with 500+ replies, or whether those will take so long to load that they won’t be fully crawled, since crawling bots don’t wait until the entire page loads.

But for this page it appears to be indexing properly ^

I’ve tried one of the topics with more replies (177), but I found a lot of duplicate content: the same topic is getting indexed many times because of the way pagination works. The cached pages didn’t work either, probably because they take longer to load.

https://www.google.com/#newwindow=1&q=What+is+the+most+awesome+plugin+for+Discourse%2C+that+does+not+yet+exist%3F+%2B+discourse


(Jeff Atwood) #21

What’s the problem?

  1. We recently changed some behavior (serving Google a version of the page without unnecessary <script> or JSON data islands), so recently cached pages – stuff Google has indexed recently – will definitely show up.

  2. I see the topic as the first result in your search. You searched for the topic title so this seems correct to me.

It’s the same exact thing as a forum with traditionally paginated topic pages because that is what we serve Google. Again: don’t take my word for it, don’t take anyone’s word for it – just test it yourself, change your browser’s user-agent to the Googlebot user agent and load the pages.
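Jeff's suggested test can also be scripted. Below is a minimal standard-library sketch. The user-agent is the classic Googlebot string; check Google's crawler documentation for the current value. The sketch assumes the server picks its crawler view based on the user-agent header alone, which is how the test above is described.

```python
import urllib.request

# Classic Googlebot user-agent string; verify against Google's current docs.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def googlebot_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_googlebot(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with the Googlebot user-agent and return its HTML as text."""
    with urllib.request.urlopen(googlebot_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Comparing `fetch_as_googlebot(url)` with a fetch that uses a normal browser user-agent shows whether the crawler receives the stripped-down, paginated version of the page.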

Not clear what the problem is here. I don’t see any problems.


(Katie Hunter) #22

One problem I see is duplicate content. When Google decides it is excessive, it applies penalties, and when that happens your rank and traffic drop. In forum software such as VB, which I use, the way to handle this with topic pagination is to always redirect guests and crawling bots to the first post of the topic, i.e. “Direct Links to Thread main first post”. That way, each topic is indexed once and does not produce duplicate content.

In the case below, there are 7 links to the same topic, which is bad.

https://www.google.com/#newwindow=1&q=What+is+the+most+awesome+plugin+for+Discourse%2C+that+does+not+yet+exist%3F+%2B+discourse
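To quantify Katie's concern, you can count how many distinct indexed URLs map to the same topic. The rough sketch below assumes the `/t/<slug>/<topic id>` URL shape seen in this thread, optionally followed by a post number or a `?page=N` query. The list of URLs would have to come from search results or server logs.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit

# Assumed URL shape: /t/<slug>/<topic id>, optionally /<post number>;
# any ?page=N query string is ignored because only the path is matched.
TOPIC_PATH = re.compile(r"^/t/[^/]+/(\d+)(?:/\d+)?/?$")


def topic_id(url: str) -> Optional[int]:
    """Extract the numeric topic id from a topic URL, or None for other pages."""
    match = TOPIC_PATH.match(urlsplit(url).path)
    return int(match.group(1)) if match else None


def variants_per_topic(urls: Iterable[str]) -> Dict[int, int]:
    """Count how many distinct URLs point at each topic id."""
    ids = (topic_id(url) for url in set(urls))
    return dict(Counter(tid for tid in ids if tid is not None))
```

A count above 1 flags a topic that is reachable under several URLs. Whether that actually triggers a penalty is the point under debate here.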