Hello. We are currently receiving this message in our Google Search Console. I’m not entirely sure what it signifies. Could I get more clarity on this issue? Is there a solution? Additionally, I would like to mention that we have tried using multiple themes for the platform, but the same error persists.
The first thing to note: the page you linked does not have Discussion Forum schema at all. It has “Breadcrumbs” schema only, no “Discussion Forum” schema. That’s happening because you’re testing the URL in “smartphone” mode rather than in “desktop” mode.
I should point out that I think this is an important bug in Discourse: the schema does not show up in smartphone mode. Google wouldn’t know to flag it (because Google only flags errors in schema that is present), but smartphone crawling and indexing has been Google’s default for years now, so it’s important that any schema appear in smartphone mode as well as in desktop mode.
This is happening because the first post is not included from the second page onwards in the crawler view. @sam, should we include the first post on all pages in the crawler view to fix the schema issues?
The other option would be to replace the microdata schema with JSON-LD (which Google generally recommends). This would decouple the rendered data from the structured data and would also work on mobile (as Dan pointed out above).
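For illustration, a minimal JSON-LD sketch of a DiscussionForumPosting — all names and values here are placeholders, not Discourse’s actual output:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "Example topic title",
  "author": { "@type": "Person", "name": "example_user" },
  "datePublished": "2024-01-15T08:00:00Z",
  "text": "Body of the first post, duplicated here as a string.",
  "url": "https://forum.example.com/t/example-topic/123"
}
</script>
```

Note the downside Google mentions in the quote below: the post body has to be duplicated inside the script block.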
We’re already using JSON-LD schema in the solved plugin.
Unlike our general structured data preference, we recommend providing the DiscussionForumPosting markup in Microdata (or RDFa) if possible. This prevents you from needing to duplicate large text blocks inside markup. However, this is just a recommendation, and JSON-LD is still fully supported.
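To illustrate why Microdata avoids that duplication: it annotates the HTML that is already rendered, so the post body appears only once. A rough sketch — attribute placement is illustrative, not Discourse’s actual template:

```html
<div itemscope itemtype="https://schema.org/DiscussionForumPosting">
  <h1 itemprop="headline">Example topic title</h1>
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">example_user</span>
  </div>
  <time itemprop="datePublished" datetime="2024-01-15T08:00:00Z">Jan 15, 2024</time>
  <!-- The visible post body doubles as the structured-data text: -->
  <div itemprop="text">The post body, annotated in place rather than repeated.</div>
</div>
```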
Yes, we’re doing that now per the recent commit, but even after adding that we’re missing some required fields (author, datePublished, text) on subsequent pages (?page=2).
Great catch! Fixed in this PR:
Oh yeah. This was also confirmed by @rrlevering here:
So I guess we’ll have to improve the microdata schema while making sure we do not end up duplicating content on subsequent pages.
And nice find on meta-tag vs. link-tag <link itemprop='url' content='<%= @topic_view.absolute_url %>'>
Even better: <link itemprop='url' href='<%= @topic_view.absolute_url %>'>
See:
(This link is old, but YouTube still uses <link itemprop='url' href='…'> today.)
“[To] provide an URL in HTML5, […] [for link-tag] use the href attribute”
“If you use an URL as value of the content attribute of a meta element, it will represent a string (looking like a URL), not an URL.”
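In other words, a contrived sketch with a placeholder URL:

```html
<!-- Microdata reads the href of a link element as an actual URL: -->
<link itemprop="url" href="https://forum.example.com/t/example-topic/123">

<!-- The content attribute of a meta element is only a string that looks like a URL: -->
<meta itemprop="url" content="https://forum.example.com/t/example-topic/123">
```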
→ “This is not required if you are representing a post on another page (with an external url) as in later pages of forums or forum category pages.” ←
Recommended properties
url
[…]
Special note on: url
“The canonical URL of the discussion. In multi-page threads, set this property to th[e] first page URL. For a single discussion, this is usually the current URL.”
So I conclude:
We do not need to add text again on page=2+ (DONE)
We must add the optional property url - especially on page=2+ (see the sketch below) (DONE)
Need for further investigation:
Are those “required properties” author, author.name and datePublished really required on page=2+ or can we go without repeating them?
→ validator.schema.org does not complain about missing properties on page=2+. (DONE)
→ Wait and check “Google Search Console → Report:Enhancements → Discussion forum” for new live data after these already implemented fixes are live for some time. (TODO)
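A sketch of that url property as it could appear on ?page=2, with a placeholder topic URL — the href points at the first page, per the quoted guidance:

```html
<link itemprop="url" href="https://forum.example.com/t/example-topic/123">
```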
General validator: https://validator.schema.org/
This checks structured data for compliance with the Schema.org definitions and checks that the markup is HTML/XML-compliant.
→ The checked requirements follow the Standard™ and are pretty broad, not Google-specific.
→ I recommend fixing every detected bug.
Google Search Console
Report:Enhancements → Discussion forum: https://search.google.com/search-console/r/discussion-forum?hl=en
This gives direct feedback on the information processed by the Google crawler.
→ These reports are, in effect, binding hard facts about Google SEO: if Google announces that something is wrong, Google also thinks it is wrong - even if it is not.
→ If something is flagged as “invalid” or “to improve”, I recommend first thinking about a fix - and, if there are no known side effects, always implementing it.
Google: Rich Results Test
https://search.google.com/test/rich-results?hl=en
This gives only simulated feedback; it is not the Google crawler.
My opinion: a Google marketing tool to tell site owners “Do something about your structured data!”.
→ This tool is somewhat neglected by Google and is not always up to date with the latest technical recommendations provided by Google itself.
→ The Rich Results Test does not always produce the same result as Google Search Console - when in doubt, trust Google Search Console.
Let me write some pseudocode for the current check that is displayed in Search Console. I think that will help a lot on these threads. I could send you the ShEx or SHACL, but those are much less human-readable.
if not (IsDeletedContent() OR IsExternalContent())
then if not ("text" OR "articleBody" OR "sharedContent" OR "image" OR "video")
     then Report(OneOfThreeRequired("text", "image", "video"))
if not ("author")
then Report(Required("author"))
if not ("datePublished")
then Report(Required("datePublished"))
The idea is that if the DiscussionForumPosting/OP has its content on the current page, there should be a content field of some sort.
If the DiscussionForumPosting is referencing content on a different page (like on the original page of multi-page content), it can just have a stub that holds whatever (like the OP topic title) and then references the first page URL. That’s the IsExternalContent() check, which just checks whether url != page URL.
The second example in our docs was supposed to exactly model this case (the 14th page refers to a stub post from the first page).
author and date are currently required regardless in our validation rules. That’s mostly to prevent an extra hop to find this data. You can at least see how knowing the date of the OP could be useful for understanding how stale the comment is. Can you just throw meta elements in there with that data? I wasn’t as worried about those fields with regard to bloating the page with redundant data.
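A sketch of such a stub for later pages, with the still-required fields supplied as lightweight meta/link elements — values and structure are placeholders, not Discourse’s actual template:

```html
<div itemscope itemtype="https://schema.org/DiscussionForumPosting">
  <!-- url points at the first page, so IsExternalContent() applies and
       no text/image/video is required here: -->
  <link itemprop="url" href="https://forum.example.com/t/example-topic/123">
  <meta itemprop="headline" content="Example topic title">
  <!-- author and datePublished stay required per the pseudocode above: -->
  <meta itemprop="datePublished" content="2024-01-15T08:00:00Z">
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <meta itemprop="name" content="example_user">
  </div>
</div>
```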
Does adding the author URL still make sense while its path is blocked by our default robots.txt? Should we remove the block from robots.txt now that we promote those URLs?