Hello. We are currently receiving this message in our Google Search Console. I’m not entirely sure what it signifies. Could I get more clarity on this issue? Is there a solution? Additionally, I would like to mention that we have tried using multiple themes for the platform, but the same error persists.
The first thing to note: the page you linked does not have Discussion Forum schema at all. It has “Breadcrumbs” schema only, no “Discussion Forum” schema. That’s happening because you’re testing the URL in “smartphone” mode rather than in “desktop” mode.
I should point out that I think this is an important bug in Discourse: the schema does not show up in smartphone mode. Google wouldn’t know to flag it (because Google only flags errors in schema that is present), but smartphone crawling and indexing has been Google’s default for years now, so it’s important that any schema appear in smartphone mode as well as in desktop mode.
This is happening because the first post is not included from the second page onwards in the crawler view. @sam, should we include the first post on all pages in the crawler view to fix the schema issues?
The other option would be to replace the microdata schema with JSON-LD (which Google generally recommends). This would decouple the rendered data from the structured data and would also work on mobile (as Dan pointed out above).
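For illustration, a minimal JSON-LD sketch of a DiscussionForumPosting — all names and values here are placeholders, not Discourse’s actual output:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "Example topic title",
  "author": { "@type": "Person", "name": "example_user" },
  "datePublished": "2024-01-15T08:00:00Z",
  "text": "Body of the first post, duplicated here as a string.",
  "url": "https://forum.example.com/t/example-topic/123"
}
</script>
```

Note the downside Google mentions in the quote below: the post body has to be duplicated inside the script block.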
We’re already using JSON-LD schema in the solved plugin.
Unlike our general structured data preference, we recommend providing the DiscussionForumPosting markup in Microdata (or RDFa) if possible. This prevents you from needing to duplicate large text blocks inside markup. However, this is just a recommendation, and JSON-LD is still fully supported.
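To illustrate why Microdata avoids that duplication: it annotates the HTML that is already rendered, so the post body appears only once. A rough sketch — attribute placement is illustrative, not Discourse’s actual template:

```html
<div itemscope itemtype="https://schema.org/DiscussionForumPosting">
  <h1 itemprop="headline">Example topic title</h1>
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">example_user</span>
  </div>
  <time itemprop="datePublished" datetime="2024-01-15T08:00:00Z">Jan 15, 2024</time>
  <!-- The visible post body doubles as the structured-data text: -->
  <div itemprop="text">The post body, annotated in place rather than repeated.</div>
</div>
```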
Yes, we’re doing that now per the recent commit, but even after adding that we’re missing some required fields (author, datePublished, text) on subsequent pages (?page=2).
Great catch! Fixed in this PR:
Oh yeah. This was also confirmed by @rrlevering here:
So I guess we’ll have to improve the microdata schema while making sure we do not end up duplicating content on subsequent pages.
And nice find on meta-tag vs. link-tag <link itemprop='url' content='<%= @topic_view.absolute_url %>'>
Even better: <link itemprop='url' href='<%= @topic_view.absolute_url %>'>
See:
(This link is old, but YouTube still uses <link itemprop='url' href='…'> today.)
“[To] provide an URL in HTML5, […] [for link-tag] use the href attribute”
“If you use an URL as value of the content attribute of a meta element, it will represent a string (looking like a URL), not an URL.”
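In other words, a contrived sketch with a placeholder URL:

```html
<!-- Microdata reads the href of a link element as an actual URL: -->
<link itemprop="url" href="https://forum.example.com/t/example-topic/123">

<!-- The content attribute of a meta element is only a string that looks like a URL: -->
<meta itemprop="url" content="https://forum.example.com/t/example-topic/123">
```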
→ “This is not required if you are representing a post on another page (with an external url) as in later pages of forums or forum category pages.” ←
Recommended properties
url
[…]
Special note on: url
“The canonical URL of the discussion. In multi-page threads, set this property to th[e] first page URL. For a single discussion, this is usually the current URL.”
So I conclude:
We do not need to add text again on page=2+ (DONE)
We must add the optional property url - especially on page=2+ (see the sketch below) (DONE)
Need for further investigation:
Are those “required properties” author, author.name and datePublished really required on page=2+ or can we go without repeating them?
→ validator.schema.org does not complain about missing properties on page=2+. (DONE)
→ Wait and check “Google Search Console → Report:Enhancements → Discussion forum” for new live data after these already implemented fixes are live for some time. (TODO)
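A sketch of that url property as it could appear on ?page=2, with a placeholder topic URL — the href points at the first page, per the quoted guidance:

```html
<link itemprop="url" href="https://forum.example.com/t/example-topic/123">
```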
General validator: https://validator.schema.org/
This checks structured data for compliance with the Schema.org definitions and checks that the markup is HTML/XML-compliant.
→ The checked requirements follow the Standard™ and are pretty broad, not Google-specific.
→ I recommend fixing every detected bug.
Google Search Console
Report:Enhancements → Discussion forum: https://search.google.com/search-console/r/discussion-forum?hl=en
This gives direct feedback on the information processed by the Google crawler.
→ These reports are, in effect, binding hard facts about Google SEO: if Google announces that something is wrong, Google also thinks it is wrong - even if it is not.
→ If something is flagged as “invalid” or “to improve”, I recommend first thinking about a fix - and, if there are no known side effects, always implementing it.
Google: Rich Results Test
https://search.google.com/test/rich-results?hl=en
This gives only simulated feedback; it is not the Google crawler.
My opinion: a Google marketing tool to tell site owners “Do something about your structured data!”.
→ This tool is somewhat neglected by Google and is not always up to date with the latest technical recommendations provided by Google itself.
→ The Rich Results Test does not always produce the same result as Google Search Console - when in doubt, trust Google Search Console.
Let me write some pseudocode for the current check that is displayed in Search Console. I think that will help a lot on these threads. I could send you the ShEx or SHACL, but those are much less human-readable.
if not (IsDeletedContent() OR IsExternalContent())
then if not ("text" OR "articleBody" OR "sharedContent" OR "image" OR "video")
     then Report(OneOfThreeRequired("text", "image", "video"))
if not ("author")
then Report(Required("author"))
if not ("datePublished")
then Report(Required("datePublished"))
The idea is that if the DiscussionForumPosting/OP has its content on the current page, there should be a content field of some sort.
If the DiscussionForumPosting is referencing content on a different page (like on the original page of multi-page content), it can just have a stub that holds whatever (like the OP topic title) and then references the first page URL. That’s the IsExternalContent() check, which just checks whether url != page URL.
The second example in our docs was supposed to exactly model this case (the 14th page refers to a stub post from the first page).
author and date are currently required regardless in our validation rules. That’s mostly to prevent an extra hop to find this data. You can at least see how knowing the date of the OP could be useful for understanding how stale the comment is. Can you just throw meta elements in there with that data? I wasn’t as worried about those fields with regard to bloating the page with redundant data.
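A sketch of such a stub for later pages, with the still-required fields supplied as lightweight meta/link elements — values and structure are placeholders, not Discourse’s actual template:

```html
<div itemscope itemtype="https://schema.org/DiscussionForumPosting">
  <!-- url points at the first page, so IsExternalContent() applies and
       no text/image/video is required here: -->
  <link itemprop="url" href="https://forum.example.com/t/example-topic/123">
  <meta itemprop="headline" content="Example topic title">
  <!-- author and datePublished stay required per the pseudocode above: -->
  <meta itemprop="datePublished" content="2024-01-15T08:00:00Z">
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <meta itemprop="name" content="example_user">
  </div>
</div>
```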
Does adding the author URL still make sense while its path is blocked by our default robots.txt? Should we remove the block from robots.txt now that we promote those URLs?