Search Engine / No JavaScript version missing links


(Dean Taylor) #1

When browsing a Discourse site (meta.discourse.org) without JavaScript, which is how it is presented to Googlebot, several links are missing.

  1. Links to user profiles
  2. Link to categories list from homepage
  3. Links to users' activity from the users page
  4. Links to related topics at the bottom of a topic.
  5. Any actual activity on a user's activity/topics page: Profile - sam - Discourse Meta (yes, this looks the same as the user's page because it is; there is no different content.)

EDIT: The idea is that if a user would find it hard to navigate to or see related, interesting info, so would a search engine.
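One rough way to check which links survive in the JS-off page is to parse the static HTML and list the anchors it contains. The sketch below uses Python's standard `html.parser`; the sample markup and the `LinkCollector` / `extract_links` names are made up for illustration and are not taken from Discourse's actual output.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html):
    """Return all link targets present without running any JavaScript."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical no-JS markup: the username is plain text, not a link.
SAMPLE_HTML = """
<html><body>
  <a href="/t/some-topic/1">Some topic</a>
  <span>sam</span>
  <a href="/latest">Latest</a>
</body></html>
"""

if __name__ == "__main__":
    for href in extract_links(SAMPLE_HTML):
        print(href)
```

Feeding it the HTML a crawler receives (saved to a file, for example) would show whether profile or category links are present at all.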


(Sam Saffron) #2

That is just a random walk; I don’t see this as providing any value unless they are really related.


(Dean Taylor) #3

OK, if it is just random, then ignore that point. I was imagining the topics were actually related, i.e. StackOverflow style:


(Mittineague) #4

re #1
IMHO the links to member Profiles should NOT be there for search engines to find or follow, at least not until this happens.

There is enough of a problem with fly-by Profile SPAMmers already.


(Sam Saffron) #5

We should include incoming links, because they are not in the body of posts.


(Jeff Atwood) #6

Irrelevant. Check robots.txt: user profiles are not allowed to be indexed anyway.

This also addresses your points #1, #3, and #5 @DeanMarkTaylor.

Suggested topics is indeed random once you’ve iterated through all the stuff that requires a user login, such as “is this new?” and “am I tracking this?”. So that rules #4 out as well.

I don’t think any of the links mentioned need to appear in the JS-off page.
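For anyone who wants to check what a compliant crawler would do with a given URL, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT` and the example.com URLs are assumptions for illustration, not a copy of any real site's robots.txt, and `is_allowed` is just a helper name chosen here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not any real site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /users/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/users/sam",
        "https://example.com/t/some-topic/123",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", url))
```

This only models a crawler that chooses to respect the file; nothing in the check stops a client from fetching the page anyway.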


(Mittineague) #7

Sorry, but you seem to be confusing “allowed” with “suggested” i.e.
http://www.robotstxt.org/robotstxt.html

There are two important considerations when using /robots.txt:

  1. robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
  2. the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don’t try to use /robots.txt to hide information.

So although Google may not ignore the suggestion, other bots will, and robots.txt alone will not be enough to deter Profile SPAMmers, as indeed it hasn’t yet.


(Jeff Atwood) #8

Google is 94.3% of all incoming traffic at Stack Overflow, for example. That’s not an exaggeration, those are the actual numbers.

These “other bots” are irrelevant.