Discourse SEO overview (sitemap / robots.txt)

Discourse has many SEO features that work straight out of the box. With our sensible defaults, community managers can focus on cultivating a community instead of being distracted by search engine optimization. That said, there are some things you can change, some things you should know, and some general tips and tricks below.

Here’s a comparison of what a user sees and what a search engine sees:

Topic list:

Topic:

Meta Tags

In Discourse, the generic meta tags essential for SEO are auto-generated from the content present on the page. The title tag, for instance, is derived from the site or topic title, and the description is generated from the content of the first post. However, per-page customization of metadata is limited. To alter these values, you need to adjust the settings or content fields they are generated from, such as:

  • The Title, Description and Short site description site settings
  • The category names
  • The posts’ titles and content
  • And so on :technologist:
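
To check which title and description a page currently exposes, you can read them from its HTML. Below is a minimal Python sketch using only the standard library. The sample markup, the forum name and the `extract_meta` helper are hypothetical illustrations, not output copied from Discourse.

```python
from html.parser import HTMLParser

# Hypothetical page head, for illustration only.
SAMPLE_HTML = """
<head>
  <title>How do I reset my password? - Example Forum</title>
  <meta name="description" content="I forgot my password and cannot log in.">
</head>
"""


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits inside <title> is kept.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> tuple[str, str]:
    """Return (title, description) found in the given HTML."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.title.strip(), parser.description.strip()


title, description = extract_meta(SAMPLE_HTML)
print(title)
print(description)
```

If the values are not what you expect, change the underlying setting, category name or post content rather than the tag itself.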

URL Structure and Encoding

Non-Latin characters and URLs

Discourse, by default, strips out non-Latin characters from topic URLs when the locale is set to EN. To avoid this, you can change the locale to the primary non-Latin language or change the slug generation method setting from ASCII to encoded.
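
The difference between the two slug methods can be illustrated with a short Python sketch. This is not Discourse's actual implementation; `ascii_slug` and `encoded_slug` are hypothetical helpers that only mimic the general idea of dropping non-ASCII characters versus percent-encoding them.

```python
import re
import unicodedata
from urllib.parse import quote


def ascii_slug(title: str) -> str:
    """Keep only ASCII letters and digits, joined by hyphens."""
    # Split accented letters into base letter + accent, then drop non-ASCII.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def encoded_slug(title: str) -> str:
    """Keep non-Latin characters and percent-encode them for the URL."""
    hyphenated = re.sub(r"\s+", "-", title.strip().lower())
    return quote(hyphenated, safe="-")


for sample in ("Hello World!", "Привет мир"):
    print(repr(ascii_slug(sample)), repr(encoded_slug(sample)))
```

With these helpers, the Cyrillic title produces an empty ASCII slug, while the encoded variant keeps every character in percent-encoded form. That is the kind of loss the encoded method is meant to avoid.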

Subfolder vs. Subdomain Setup

Discourse leans towards subdomains over subfolders because a subdomain setup is technically simpler. Google doesn’t really have a preference between the two[1], but Discourse strongly recommends avoiding subfolder setups unless you have deep technical understanding.

Canonicalization

Google is keen on indexing canonical versions of pages. In Discourse, for a topic with multiple replies, the canonical link (the first post) is handed over to Google, which then makes the call on indexing. Topics longer than 20 posts will be paginated, each page being a canonical link containing up to 20 posts.
For example, the canonical tag for the last reply on this topic will be https://meta.discourse.org/t/try-out-the-new-sidebar-and-notification-menus/238821?page=12.
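
The page arithmetic behind these canonical links can be sketched in Python. The page size of 20 comes from the text above; the `canonical_url` helper, the example domain, and the assumption that the first page uses the bare topic URL without a `page` parameter are illustrative only.

```python
POSTS_PER_PAGE = 20  # page size stated above


def canonical_url(topic_url: str, position: int) -> str:
    """Return the canonical URL of the page holding the post at `position`.

    `position` is the 1-based place of the post within the topic.
    """
    if position < 1:
        raise ValueError("position starts at 1")
    page = (position - 1) // POSTS_PER_PAGE + 1
    # Assumption: page 1 is the bare topic URL, without ?page=1.
    return topic_url if page == 1 else f"{topic_url}?page={page}"


topic = "https://forum.example.com/t/some-topic/123"
print(canonical_url(topic, 5))
print(canonical_url(topic, 21))
print(canonical_url(topic, 240))
```

Under these assumptions, posts 1 to 20 share the first page, post 21 opens page 2, and any post from 221 to 240 lands on page 12.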

Schema Markup

We use schema.org markup to help search engines categorize content through breadcrumbs. The category name in a topic slug is included:

Sitemap

Discourse includes a sitemap at /sitemap.xml, which is enabled by default via the enable sitemap setting. This helps search engines index the site.
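
To see which URLs a sitemap advertises, you can list its `<loc>` entries. The Python sketch below parses a hypothetical sample in the standard sitemaps.org format; the URLs and the `list_locations` helper are made up for illustration, and a real file would first be downloaded from /sitemap.xml.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample, for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://forum.example.com/t/welcome/1</loc>
    <lastmod>2024-01-15T10:00:00Z</lastmod>
  </url>
  <url>
    <loc>https://forum.example.com/t/faq/2</loc>
    <lastmod>2024-02-01T08:30:00Z</lastmod>
  </url>
</urlset>
"""


def list_locations(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


for url in list_locations(SAMPLE_SITEMAP):
    print(url)
```

Because sitemap indexes use the same `<loc>` element, the helper also works on an index file that points to further sitemap files.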

Static view for search engines

Discourse has a static HTML view with no JavaScript to help web crawlers index your site faster. The content of the dynamic and static views is identical, and nothing is omitted or stripped out when the site is crawled by search engines.

Potential web crawler issues

Web crawlers, also known as robots, are essential for indexing web pages. Some crawlers, however, can be overly enthusiastic, hitting the forum with many requests. Discourse blocks several notorious crawlers by default but allows you to fiddle around with the blocked crawler user agents settings if needed.

robots.txt

You can view and, if needed, edit robots.txt[2]. This file tells web crawlers how to interact with the forum content. Its primary purpose is to manage crawler access, preventing crawlers from overwhelming the server, and to help maintain the site’s SEO health by avoiding the indexing of low-value or repetitive pages.

:warning: Modifying this file without careful consideration can harm your site’s indexing.
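
Before saving a change, it can help to test how a crawler would interpret the rules. The Python sketch below uses the standard library's `urllib.robotparser`. The rules, the `BadBot` user agent, the example domain and the `is_allowed` helper are hypothetical and do not reflect Discourse's actual defaults.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /search

User-agent: BadBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


base = "https://forum.example.com"
for agent, path in [
    ("Googlebot", "/t/welcome/1"),
    ("Googlebot", "/admin/users"),
    ("BadBot", "/t/welcome/1"),
]:
    print(agent, path, is_allowed(SAMPLE_ROBOTS, agent, base + path))
```

Real crawlers may interpret edge cases differently from Python's parser, so treat the result as a sanity check rather than a guarantee.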

Migrations and URL Redirections

The permalink feature redirects old URLs in order to preserve SEO, prevent “Page Not Found” errors, and give search engines the right metadata for easier indexing.
If your community site is migrated to Discourse by our team, the URL redirections are included unless there are valid reasons not to do so.

If you are using one of the existing importer scripts, you should ensure that the script handles this[3]. You can manually add permalinks from your admin panel, in Customize > Permalinks.

Discourse Page Views and Google Analytics Discrepancy

Discourse and Google Analytics have different methodologies when it comes to counting page views, often leading to a higher page view count in Discourse. This difference stems from Discourse being a single-page application, thereby counting every significant request as a page view.

On the other hand, Google uses JavaScript to count page views only on the full page load, excluding web crawlers by default. Unlike Google, Discourse counts raw requests and records a page view on the first load of a page or when transitioning between routes, so the two tracking mechanisms produce different numbers.

If you want to learn more about data reports and analytics, have a look at the dedicated category:

De-indexing methods

To get pages out of Google’s index, you can either remove content or block access to a page. Depending on your needs, you can make your whole site private[4]. You can exclude topics by deleting them or putting them in restricted categories. Hidden topics aren’t indexed by default, but they can be if there’s a public link somewhere that points to them.

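To hide pages from search results quickly, you can use the Removals tool in Google Search Console. Google describes these removals as temporary, so combine the request with one of the methods above to keep pages out of search results for good.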

Learn more at Remove information on your website from Google - Search Console Help.


  1. You can read more about it at Secure Uploads. ↩︎

  2. Look for the “allow index in robots txt” setting. ↩︎

  3. Looking for the permalink string in the import script should give you this info. ↩︎

  4. Look for the login required setting. ↩︎
