Discussion Forum Schema improvements

Please leave the URL there even if it’s blocked. You can discuss whether that makes sense or not for your forum use cases but even with the crawl blocked it can help with disambiguation.

6 Likes

That is still just a polite request, and not even Google respects it every time. For example, links in Gmail let Googlebot in right away, and enough visits lead to indexing and search results.

Plus… we/you don’t know how situations will change in the future. If it is fixed now, there is no need to worry about it afterwards. Sure, it takes work time, but so does investing in and discussing it :smirk:

1 Like

Now the attribute datePublished for DiscussionForumPosting on the first page differs from datePublished on page=2+!

  • first page:
    2015-07-05T22:02:58Z
  • page=2+:
    2015-07-05T22:02:57Z

I don’t think Google trusts diverging data, so it might decide that those two URLs contain different DiscussionForumPosting items which cannot be combined.

Better to use the same data source on the first page and page=2+.
E.g. always use the datePublished from the topic and never from the first post?

search.google.com/test/rich-results for first page
datePublished: 2015-07-05T22:02:58Z

search.google.com/test/rich-results for page=2
datePublished: 2015-07-05T22:02:57Z


PR:

Always use datePublished from topic and never from first_post. This ensures datePublished is consistent on the first page and page=2+.
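
A minimal Ruby sketch of that idea, not the actual PR: the structs and the helper name date_published are hypothetical, and which of the two timestamps reported above belongs to the topic rather than the first post is an assumption made for illustration.

```ruby
require "time"

# Hypothetical stand-ins for the real models; only created_at matters here.
SketchTopic = Struct.new(:created_at)
SketchPost  = Struct.new(:created_at)

# Always derive datePublished from the topic, whatever page is rendered.
def date_published(topic)
  topic.created_at.utc.iso8601
end

# Assumed mapping of the two timestamps seen in the Rich Results Test.
topic      = SketchTopic.new(Time.utc(2015, 7, 5, 22, 2, 57))
first_post = SketchPost.new(Time.utc(2015, 7, 5, 22, 2, 58))

page_1 = date_published(topic) # first page
page_2 = date_published(topic) # page=2+

puts page_1
puts page_2
```

With a single data source both pages emit the same value, and the one-second offset of the first post never reaches the markup.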

Initially: no need to repeat text on page=2+, especially if it is only an abstract and thereby not 100% consistent with text on the first page.
Revised after unexpected results in Google Search Console: keep the text attribute on follow-up pages (page=2+).

3 Likes

Hide post “Closed x days ago” from crawler view

If a topic is closed there is a special post added to the topic:
E.g. see Google structured data for forums and profile pages - #15


Of course this post has an empty text attribute. See validator.schema.org for …/t/-/286762 → last comment:

Report in Google Search Console

Conclusion

So this special kind of system/announcement posts should be excluded from the crawler view.

PR

Special kind of system/announcement posts are excluded from the crawler view as they do not have any content.

Empty content triggers a non-critical issue ‘Missing field “text” (in “comment”)’ in Google Search Console.
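
As a rough sketch of the filtering described here (not the code from the PR), posts whose text is empty could be dropped before the crawler view is rendered. CrawlerPost and crawler_visible_posts are hypothetical names, and the post numbers are made up.

```ruby
# Hypothetical post record; `text` stands for whatever would end up in the
# schema.org "text" attribute of a comment.
CrawlerPost = Struct.new(:post_number, :text)

# Keep only posts that actually have content to expose to crawlers.
def crawler_visible_posts(posts)
  posts.reject { |post| post.text.to_s.strip.empty? }
end

posts = [
  CrawlerPost.new(14, "A regular reply"),
  CrawlerPost.new(15, ""),  # e.g. a "Closed x days ago" system post
  CrawlerPost.new(16, nil), # no text at all
]

visible = crawler_visible_posts(posts)
puts visible.map(&:post_number).inspect
```

Filtering on empty content rather than on a post type is only one option; the real implementation may well select the system posts by type instead.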

Would it make more sense for the author name metadata to be set to the full name profile field when available? At least on forums with prioritize username in ux disabled (but I’d argue either way, the URL field disambiguates anyways).

Is there anything that can be done to sort this out, or does the Discourse team have to update the core?

@rrlevering On this “no need for text-attribute on follow-up pages” / IsExternalContent()-check:

I have this test-case on a live-domain:

Discourse implements DiscussionForumPosting on …

  • first page - page URL: https://example.org/t/-/12345
    • attribute url: https://example.org/t/-/12345
    • attribute text: – set –
    • attribute author: – set –
  • page=2 - page URL: https://example.org/t/-/12345?page=2
    • attribute url: https://example.org/t/-/12345
    • attribute text: – not set at all –
    • attribute author: – set –

Result: Google Search Console (Live test)

  • first page:
    DiscussionForumPosting valid
  • page=2:
    DiscussionForumPosting invalid
    • 1 critical issue: Either "text", "image", or "video" should be specified

So either there is no check on IsExternalContent() here or the check assumes page URL equals attribute url for

  • page URL:
    https://example.org/t/-/12345?page=2
  • attribute url:
    https://example.org/t/-/12345

So for now we have to repeat the attribute text on follow-up pages to get a valid DiscussionForumPosting on Google Search Console.
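
To make the two URLs from the test case concrete, here is a small Ruby sketch of two ways such a comparison could behave. It is purely illustrative and says nothing about how Google implements IsExternalContent(); both helper names are hypothetical.

```ruby
require "uri"

# Strict comparison: any difference, including the query string, counts.
def same_url_strict?(page_url, attribute_url)
  page_url == attribute_url
end

# Lenient comparison: scheme, host, port and path must match, query is ignored.
def same_url_ignoring_query?(page_url, attribute_url)
  a = URI.parse(page_url)
  b = URI.parse(attribute_url)
  [a.scheme, a.host, a.port, a.path] == [b.scheme, b.host, b.port, b.path]
end

page_url      = "https://example.org/t/-/12345?page=2"
attribute_url = "https://example.org/t/-/12345"

puts same_url_strict?(page_url, attribute_url)
puts same_url_ignoring_query?(page_url, attribute_url)
```

The two URLs differ only in the query string, so the outcome depends entirely on whether the comparison takes ?page=2 into account.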

Invalid schema markup for DiscussionForumPosting - specific topic/post URLs only

Affected topics: topics with more than a total of 20 posts
Affected URLs: …/t/-/NNN/7 to …/t/-/NNN/20

Report in ‘Google Rich Result Test’

URL …/t/-/NNN/11: different topics with different total posts

– All example topics are ‘closed’ to ensure the total of posts does not change. The bug itself also affects ‘open’ topics! –

URLs …/t/-/16968/1 to …/t/-/16968/38: One topic with currently 38 posts

Valid schema markup:
DiscussionForumPosting itself still has an unnecessary attribute position: 1.

Invalid schema markup: author/datePublished missing

Valid schema markup again: (here: @page > 1 is true):

Technical considerations

1. `@topic_view.prev_page` might not be the best solution to decide whether to display `author`/`datePublished` or not.

app/views/topics/show.html.erb#L53-L60

      <% if @topic_view.prev_page %>
        <meta itemprop='datePublished' content='<%= @topic_view.topic.created_at.to_formatted_s(:iso8601) %>'>
        <span itemprop='author' itemscope itemtype="http://schema.org/Person">
          <meta itemprop='name' content='<%= @topic_view.topic.user.username %>'>
          <link itemprop='url' href='<%= Discourse.base_url %>/u/<%= @topic_view.topic.user.username %>'>
        </span>
        <meta itemprop='text' content='<%= @topic_view.topic.excerpt %>'>
      <% end %>
2. The implementation of `@topic_view.prev_page` might be buggy by itself.

lib/topic_view.rb#L113-L115
lib/topic_view.rb#L128-L130
lib/topic_view.rb#L193-L195

    @post_number = [@post_number.to_i, 1].max
# ---
    @page = @page.to_i > 1 ? @page.to_i : calculate_page
# ---
  def prev_page
    @page > 1 && posts.size > 0 ? @page - 1 : nil
  end

Is there a bug here?
lib/topic_view.rb#L751-L755

  def calculate_page
    posts_count =
      is_mega_topic? ? @post_number : unfiltered_posts.where("post_number <= ?", @post_number).count
    ((posts_count - 1) / @limit) + 1
  end
  • Could calculate_page give unexpected results, as it uses the current @post_number and somehow fails for values 7 to 20?
  • ((posts_count - 1) / @limit) + 1 results in something like:
    ((7 - 1) / 20) + 1 = 1.3, truncated to 1
  • What is the expected page number? Maybe calculate with non-integer values, then round the number as intended via floor/ceil and typecast to integer:
    (((posts_count - 1.0) / (@limit + 0.0)) + 1.0).floor.to_i
  • Maybe check unfiltered_posts.where("post_number <= ?", @post_number) as @topic.posts might not contain all the posts starting with post_1 as intended.
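
The arithmetic in these bullets can be checked in isolation. The sketch below copies the formulas from the excerpts into standalone methods (the constant LIMIT and the method float_page are names introduced here) and assumes posts_count equals the post number, i.e. no deleted or hidden posts. Ruby's Integer#/ already floors the quotient, so for these inputs the integer and the float variants should agree.

```ruby
LIMIT = 20 # mirrors TopicView.chunk_size from the excerpts

# Formula as quoted from calculate_page, with posts_count passed in directly.
def calculate_page(posts_count, limit = LIMIT)
  ((posts_count - 1) / limit) + 1
end

# The float-based alternative proposed in the bullet list.
def float_page(posts_count, limit = LIMIT)
  (((posts_count - 1.0) / (limit + 0.0)) + 1.0).floor.to_i
end

# prev_page as quoted, with @page and posts.size passed in as arguments.
def prev_page(page, posts_size)
  page > 1 && posts_size > 0 ? page - 1 : nil
end

(1..40).each do |n|
  puts "post #{n}: page #{calculate_page(n)} (float variant: #{float_page(n)})"
end

puts prev_page(calculate_page(11), 20).inspect
```

If both variants return page 1 for posts 7 to 20, the rounding is unlikely to be the culprit; the point would rather be that prev_page is nil on page 1 even when the first post is not among the loaded posts.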

lib/topic_view.rb#L53-L55
lib/topic_view.rb#L119-L127
lib/topic_view.rb#L835-L841

  def self.chunk_size
    20
  end
# ---
    @chunk_size =
      case
      when @print
        TopicView.print_chunk_size
      else
        TopicView.chunk_size
      end

    @limit ||= @chunk_size
# ---
  def unfiltered_posts
    result = filter_post_types(@topic.posts)
    result = result.with_deleted if @guardian.can_see_deleted_posts?(@topic.category)
    result = result.where("user_id IS NOT NULL") if @exclude_deleted_users
    result = result.where(hidden: false) if @exclude_hidden
    result
  end

Conclusion

In this edge case …

  • topics with more than a total of 20 posts
  • …/t/-/NNN/7 to …/t/-/NNN/20

… the first post was not part of the current view and @topic_view.prev_page did not trigger as the view was still on the first page.

So all attributes of the microdata schema DiscussionForumPosting which were only rendered in either the context of the first post or on @topic_view.prev_page == true were missing.

PR

Some attributes of the microdata schema DiscussionForumPosting are rendered in the context of the first post. Ensure these attributes are also set if the first post is not part of the current view.
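
One way to express that condition, sketched in Ruby with a hypothetical helper name and made-up post ranges (this is not the code from the PR): decide by whether post 1 is among the posts in the current view instead of by the page number.

```ruby
# True when the topic-level attributes (author, datePublished, text) must be
# rendered separately because the first post is not in the current view.
def topic_level_attributes_needed?(post_numbers_in_view)
  !post_numbers_in_view.include?(1)
end

first_page_from_top = (1..20).to_a  # first post is rendered, carries the data
view_near_post_11   = (2..21).to_a  # hypothetical window without post 1
second_page         = (21..40).to_a

puts topic_level_attributes_needed?(first_page_from_top)
puts topic_level_attributes_needed?(view_near_post_11)
puts topic_level_attributes_needed?(second_page)
```

Tying the decision to the presence of the first post covers both the page=2+ case and the …/t/-/NNN/7 to …/t/-/NNN/20 edge case with one check.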

3 Likes

Hmmm… That’s unexpected. I’m sorry for the trouble, I think that URL comparison check is dropping the query parameters in the comparison. Let me get a fix rolled out.

3 Likes

Any update here on this fix?

I believe the fix rolled out this week to consider query parameters in the “is this an external URL” check. So forums that refer to OPs from a different URL by query parameter (foo vs. foo?page=2) will not have errors reported on them in GSC.

3 Likes
