Discussion Forum Schema improvements

Hello. We are currently receiving this message in our Google Search Console. I’m not entirely sure what it signifies. Could I get more clarity on this issue? Is there a solution? Additionally, I would like to mention that we have tried using multiple themes for the platform, but the same error persists.

1 Like

Hello, hiccup!

Structured Data helps provide search engines with more context, essentially.

Google Search doesn’t find an optional url field in that topic.
You can see on validator.schema.org that it’s perfectly valid without any warnings.

There is nothing to worry about.
That said, if Google Search highlights this field, that would be a valid reason to add it in Discourse.

3 Likes

As @Arkshine explained above, this is not a bug but rather a suggestion from Google to add an optional field to the schema. I’ll look into it.

2 Likes

From the other thread:

So, yes, “url” is optional, but there are actual genuine errors here, too.

The itemprop="url" helps Google combine multiple Comment blocks on different URLs that belong to the same topic.

I tried to reproduce the errors you are seeing by testing meta topics in Google Rich Results Test, but I don’t see any errors.

Can you provide a link to the topic for which Google is showing errors?

The first thing you should note is that the test result you linked shows no Discussion Forum schema for that page. It has “Breadcrumbs” schema only, no “Discussion Forum” schema at all. That’s happening because you’re testing the link in “smartphone” mode, and not in “desktop” mode.

https://search.google.com/test/rich-results/result?id=TlLcA6saLMo3BrxbQYnFuw

When you switch the link to desktop testing, “Discussion Forum” schema appears, and it flags the “missing field url” issue.

To reproduce the critical errors, you have to test a long thread with the ?page=2 URL parameter, like this one:

1 Like

I should point out that I think it’s an important bug in Discourse that the schema does not show up in smartphone mode. Google wouldn’t know to flag it (because Google only flags errors in schema that’s present), but smartphone crawling and indexing has been Google’s default for years now, so it’s important that any schema appears in both smartphone mode and desktop mode.

2 Likes

The issue described in the first post, plus some other issues, has been fixed in this commit:

Thanks for the suggestions here @rrit! :+1:

This is happening because the first post is not included from the second page onwards in the crawler view. @sam should we include the first post on all pages in the crawler view to fix the schema issues? :thinking:

1 Like

No, I don’t think so. Duplicating content never ends well. Are there other options?

3 Likes

The other option would be to replace microdata schema with JSON-LD (which Google recommends). This would decouple rendered data from structured data and will also work on mobile (as Dan pointed out above).

We’re already using JSON-LD schema in the solved plugin.
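
For illustration only, here is a minimal sketch of what a JSON-LD equivalent could look like. It is written in Python rather than in Discourse's own templates, and the helper name `forum_posting_jsonld`, its parameters, and any sample values are hypothetical. Only the type and property names (`DiscussionForumPosting`, `headline`, `url`, `author`, `datePublished`, `text`) come from schema.org.

```python
import json


def forum_posting_jsonld(headline, url, author_name, date_published, text=None):
    """Build a schema.org DiscussionForumPosting as a JSON-LD script tag.

    Hypothetical helper for illustration; not Discourse's implementation.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    # The post body is optional here so a caller can leave it out.
    if text is not None:
        data["text"] = text
    # Escape "</" so post text cannot close the script element early.
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Because the script tag is independent of the rendered post markup, the same output could be emitted for mobile and desktop views alike, which is the decoupling mentioned above.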

3 Likes

Sure, this sounds like a much more correct solution.

1 Like

Do not include the data/text of the first post on subsequent pages, but always add itemprop="url" pointing to the first page:

See Google structured data for forums and profile pages - #9 by rrit

3 Likes

No rule without an exception: For DiscussionForumPosting Google recommends the use of Microdata and not JSON-LD.

See Discussion Forum (DiscussionForumPosting) Schema Markup | Google Search Central | Documentation | Google for Developers

Technical guidelines

  • Unlike our general structured data preference, we recommend providing the DiscussionForumPosting markup in Microdata (or RDFa) if possible. This prevents you from needing to duplicate large text blocks inside markup. However, this is just a recommendation, and JSON-LD is still fully supported.
4 Likes

Is this already live on meta.discourse.org?

Please see my comment on github:

This whole link-tag should only be defined for post.is_first_post - no need to repeat it with the identical url for each Comment-item.

On meta.discourse.org the quotation marks are mangled right now:
<link itemprop=&#39;mainEntityOfPage&#39; href="…">
See Schema Markup Validator

3 Likes

Yes, we’re doing that now per the recent commit, but even after adding that we’re still missing some required fields (author, datePublished, text) on subsequent pages (?page=2).

Great catch! Fixed in this PR:

Oh yeah. This was also confirmed by @rrlevering here:

So I guess we’ll have to improve microdata schema while making sure we do not end up duplicating content on subsequent pages.

7 Likes

Thanks for the fix on mainEntityOfPage-property.

And nice find on meta-tag vs. link-tag :+1:
<link itemprop='url' content='<%= @topic_view.absolute_url %>'>

Even better:
<link itemprop='url' href='<%= @topic_view.absolute_url %>'>

See:
– This link is old, but YouTube still uses <link itemprop='url' href='…'> today. –

“[To] provide an URL in HTML5, […] [for link-tag] use the href attribute”
“If you use an URL as value of the content attribute of a meta element, it will represent a string (looking like a URL), not an URL.”
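
The difference between the two tags can be shown with a small sketch of how a microdata consumer reads property values. This is a simplified Python model of the HTML microdata rules (a link element takes its value from `href`, a meta element from `content`), not the parser Google actually uses. The class name `ItempropCollector` and the helper `collect` are made up for illustration.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect (tag, itemprop, value) for <link> and <meta> elements only.

    Simplified model of the HTML microdata rules: a <link> takes its value
    from href (a URL), a <meta> takes its value from content (a plain string).
    """

    def __init__(self):
        super().__init__()
        self.props = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if name is None:
            return
        if tag == "link":
            # A missing href yields an empty value, even if content is set.
            self.props.append((tag, name, attributes.get("href") or ""))
        elif tag == "meta":
            self.props.append((tag, name, attributes.get("content") or ""))


def collect(html):
    """Parse an HTML fragment and return the collected properties."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return parser.props
```

Under this model, a link tag that carries its URL in `content` instead of `href` ends up with an empty value, which is why the `href` form is preferable.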


I just rechecked the docs provided by Google on DiscussionForumPosting: properties:

Required properties:

  • author
  • author.name
  • datePublished
  • Either text or image or video

Special note on: Either text or image or video

“This is not required if you are representing a post on another page (with an external url) as in later pages of forums or forum category pages.”

Recommended properties

  • url
  • […]

Special note on: url

“The canonical URL of the discussion. In multi-page threads, set this property to th[e] first page URL. For a single discussion, this is usually the current URL.”

So I conclude:

  • We do not need to add text again on page=2+ (DONE)
  • We must add the optional property url - especially to page=2+ (DONE)

Need for further investigation:

  • Are those “required properties” author, author.name and datePublished really required on page=2+ or can we go without repeating them?
    validator.schema.org does not complain about missing properties on page=2+. (DONE)
    → Wait and check “Google Search Console → Report:Enhancements → Discussion forum” for new live data after these already implemented fixes are live for some time. (TODO)
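
To make the conclusions above concrete, here is a rough Python sketch of the markup they imply for the first post of a multi-page topic. It is an assumption-laden illustration, not the actual Discourse template (which is ERB): the function name `op_microdata` and its parameters are hypothetical. It keeps author and datePublished on every page, which is the conservative choice as long as that question is open.

```python
from html import escape


def op_microdata(first_page_url, page, author_name, date_published, text):
    """Render microdata for the first post (OP) of a multi-page topic.

    Hypothetical sketch: page 1 carries the full text; later pages carry
    only a stub whose url points back at the first page.
    """
    lines = [
        "<div itemscope itemtype='http://schema.org/DiscussionForumPosting'>",
        "  <link itemprop='url' href='%s'>" % escape(first_page_url, quote=True),
        "  <span itemprop='author' itemscope itemtype='http://schema.org/Person'>",
        "    <meta itemprop='name' content='%s'>" % escape(author_name, quote=True),
        "  </span>",
        "  <meta itemprop='datePublished' content='%s'>" % escape(date_published, quote=True),
    ]
    if page == 1:
        # Only the first page repeats the post body.
        lines.append("  <div itemprop='text'>%s</div>" % escape(text))
    lines.append("</div>")
    return "\n".join(lines)
```

Calling it with `page=2` yields the stub without any `text` property, while `url` still points to the first page.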

Structured data: tools and resources

Schema

schema.org

developers.google.com

Validators

schema.org

  • General validator:
    https://validator.schema.org/
    This checks for compliance of structured-data with Schema-definitions and for the markup to be HTML/XML-compliant.
    → The checked requirements follow the Standard™ and are pretty broad and not specific.
    → I recommend fixing every detected bug.

Google Search Console

  • Report:Enhancements → Discussion forum:
    https://search.google.com/search-console/r/discussion-forum?hl=en
    This gives direct feedback on processed information by the Google crawler.
    These reports are effectively binding hard facts about Google SEO: If Google reports that something is wrong, Google treats it as wrong, even if it is not.
    → If something is flagged as “invalid” or “to improve”, I recommend first thinking about a fix. And if there are no known side-effects, then always implement a fix.

Google: Rich Results Test

  • https://search.google.com/test/rich-results?hl=en
    This gives only simulated feedback and is not the Google crawler.
    My opinion: Google marketing tool to tell site-owners “Do something about your structured data!”.
    → This tool is somewhat neglected by Google and is not always up-to-date with the latest technical recommendations provided by Google itself.
    Rich Results Test does not always provide the same result as Google Search Console. In case of doubt, trust Google Search Console.
3 Likes

Let me write some pseudocode for the current check that is displayed in Search Console. I think that will help a lot on these threads. I could send you the ShEx or SHACL but those are much less human readable.

    if not (IsDeletedContent() OR IsExternalContent())
       then if not ("text" OR "articleBody" OR "sharedContent" OR "image" OR "video")
         then Report(OneOfThreeRequired("text", "image", "video"))
    if not ("author")
       then Report(Required("author"))
    if not ("datePublished")
       then Report(Required("datePublished"))

The idea is that if the DiscussionForumPosting/OP has its content on the current page, there should be a content field of some sort.

If the DiscussionForumPosting is referencing content on a different page (like on the original page of multi-page content) it can just have a stub that holds whatever (like the OP topic title) and then references the first page URL. That’s the IsExternalContent() check which is just checking whether url != page URL.

The second example in our docs was supposed to exactly model this case (the 14th page refers to a stub post from the first page).

author and date are currently required regardless in our validation rules. That’s mostly to prevent an extra hop to find this data. You could at least see how knowing the date of the OP could be useful to understanding how stale the comment is. Can you just throw meta elements in there with that data? I wasn’t worried about those fields as much with regards to bloating the page with redundant data.
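
For anyone who prefers something runnable, the pseudocode above can be restated as a short Python sketch. This is only a re-statement under assumptions, not Google's validator: the function name `check_posting`, the dict-based input, and the `is_deleted` flag are hypothetical, and a posting without any `url` is assumed to count as on-page content.

```python
TEXT_LIKE = ("text", "articleBody", "sharedContent", "image", "video")


def check_posting(posting, page_url, is_deleted=False):
    """Return the issues the pseudocode above would report for one posting.

    `posting` is a plain dict of schema.org properties. This is a hypothetical
    re-statement of the pseudocode, not Google's validator.
    """
    issues = []
    # IsExternalContent(): the posting's url differs from the current page URL.
    # Assumption: a posting with no url at all is treated as on-page content.
    is_external = "url" in posting and posting["url"] != page_url
    if not (is_deleted or is_external):
        if not any(posting.get(key) for key in TEXT_LIKE):
            issues.append("one of text, image, video is required")
    # author and datePublished are required regardless of where the content is.
    if not posting.get("author"):
        issues.append("author is required")
    if not posting.get("datePublished"):
        issues.append("datePublished is required")
    return issues
```

With this model, a stub on `?page=2` whose `url` points to the first page passes as long as it still carries `author` and `datePublished`.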

6 Likes

Thanks for the context and tips, Ryan!

This is done. The metadata for subsequent pages (page 2 and onwards) looks good now!

3 Likes

Does adding the author URL still make sense while its path is blocked by our default robots.txt? Should we remove the block from robots.txt now that we promote those URLs?

2 Likes