Please leave the URL there even if it’s blocked. You can discuss whether that makes sense or not for your forum use cases but even with the crawl blocked it can help with disambiguation.
That is still just a polite request, and not even Google respects it every time. For example, links in Gmail let Googlebot in right away, and enough visits lead to indexing and search results.
Plus… we/you don’t know how situations will change in the future. If it is fixed now, there is no need to worry about it afterwards. Sure, it takes work time, but so does investing time in it and discussing it.
Now the attribute `datePublished` for `DiscussionForumPosting` on `first page` differs from `datePublished` on `page=2+`!
- `first page`: 2015-07-05T22:02:58Z
- `page=2+`: 2015-07-05T22:02:57Z
I don’t think Google trusts diverging data, so it might decide that those two URLs contain different `DiscussionForumPosting` items which cannot be combined.
Better to use the same data source on `first page` and `page=2+`.
E.g. always use the `datePublished` from the topic and never from the first post?
search.google.com/test/rich-results for `first page`: `datePublished`: 2015-07-05T22:02:58Z
search.google.com/test/rich-results for `page=2`: `datePublished`: 2015-07-05T22:02:57Z
PR:
- Always use `datePublished` from topic and never from `first_post`. This ensures `datePublished` is consistent on `first page` and `page=2+`.
- No need to repeat `text` on `page=2+`. In particular, do not set `text` on `page=2+` if it is only an abstract and thereby not 100% consistent with `text` on `first page`.
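A minimal sketch of the "single data source" idea from the PR note, not Discourse's actual code: `Topic`, `Post` and `date_published` are hypothetical stand-ins, and the timestamps are the two values reported above (the topic's one second before the first post's).

```ruby
require "time"

# Hypothetical stand-ins for a topic and its first post (not Discourse's models).
Topic = Struct.new(:created_at)
Post = Struct.new(:created_at)

# Always derive datePublished from the topic, whichever page is rendered.
def date_published(topic)
  topic.created_at.utc.iso8601
end

topic = Topic.new(Time.utc(2015, 7, 5, 22, 2, 57))
first_post = Post.new(Time.utc(2015, 7, 5, 22, 2, 58))

page_1 = date_published(topic) # value emitted on the first page
page_2 = date_published(topic) # value emitted on page=2+
puts page_1 == page_2
```

Because both pages read the same field, the one-second gap between topic and first post can no longer show up in the markup.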
Unexpected results in Google Search Console: keep `text` attribute on follow-up pages (`page=2+`).
Hide post “Closed x days ago” from crawler view
If a topic is closed there is a special post added to the topic:
E.g. see Google structured data for forums and profile pages - #15
Of course this post has an empty `text` attribute. See validator.schema.org for …/t/-/286762 → last comment:
Report in Google Search Console
Conclusion
So this special kind of system/announcement post should be excluded from the crawler view.
PR
Special system/announcement posts are excluded from the crawler view as they do not have any content.
Empty content triggers a non-critical issue ‘Missing field “text” (in “comment”)’ in Google Search Console.
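As a rough sketch of that filter (an assumption about how it could look, not the actual PR): `CrawlerPost` and `posts_for_crawler` are hypothetical names, and `text` stands for whatever would end up in the comment's `text` attribute.

```ruby
# Hypothetical post record; `text` is the content that would feed the
# comment's `text` attribute in the crawler view.
CrawlerPost = Struct.new(:post_number, :text)

# Keep only posts that would yield a non-empty `text` attribute.
def posts_for_crawler(posts)
  posts.reject { |post| post.text.nil? || post.text.strip.empty? }
end

posts = [
  CrawlerPost.new(1, "Original question"),
  CrawlerPost.new(2, "A reply"),
  CrawlerPost.new(3, ""), # e.g. the automatic "Closed x days ago" post
]

visible = posts_for_crawler(posts)
puts visible.map(&:post_number).inspect
```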
Would it make more sense for the author name metadata to be set to the full name profile field when available? At least on forums with `prioritize username in ux` disabled (but I’d argue either way, the URL field disambiguates anyway).
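One way this suggestion could look, purely as an illustration: `ForumUser` and `author_name` are hypothetical, and the `prioritize_username` flag only mirrors the `prioritize username in ux` setting named above.

```ruby
# Hypothetical user record with a username and an optional full name.
ForumUser = Struct.new(:username, :name)

# Prefer the full name unless usernames are prioritized or no name is set.
def author_name(user, prioritize_username:)
  full_name = user.name.to_s.strip
  return user.username if prioritize_username || full_name.empty?

  full_name
end

with_name = ForumUser.new("jdoe", "Jane Doe")
without_name = ForumUser.new("anon42", nil)

puts author_name(with_name, prioritize_username: false)
puts author_name(without_name, prioritize_username: false)
```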
Is there anything that can be done to sort this out, or does the Discourse team have to update the core?
@rrlevering On this “no need for `text`-attribute on follow-up pages” / `IsExternalContent()`-check:
I have this test-case on a live-domain:
Discourse implements `DiscussionForumPosting` on …
- `first page`
  - page URL: `https://example.org/t/-/12345`
  - attribute `url`: `https://example.org/t/-/12345`
  - attribute `text`: – set –
  - attribute `author`: – set –
- `page=2`
  - page URL: `https://example.org/t/-/12345?page=2`
  - attribute `url`: `https://example.org/t/-/12345`
  - attribute `text`: – not set at all –
  - attribute `author`: – set –
Result: Google Search Console (Live test)
- `first page`: `DiscussionForumPosting` valid
- `page=2`: `DiscussionForumPosting` invalid, 1 critical issue – Either "text", "image", or "video" should be specified
So either there is no check on `IsExternalContent()` here or the check assumes `page URL` equals attribute `url` for
- page URL: `https://example.org/t/-/12345?page=2`
- attribute `url`: `https://example.org/t/-/12345`

So for now we have to repeat the attribute `text` on follow-up pages to get a valid `DiscussionForumPosting` on Google Search Console.
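To illustrate the second possibility, here is a sketch of a URL comparison that ignores the query string next to one that does not. Both helpers are hypothetical and say nothing about how Google's check is actually implemented; they only show why `?page=2` could be treated as the same page as the `url` attribute.

```ruby
require "uri"

# Hypothetical comparison that ignores the query string.
# This is NOT Google's implementation, only a model of the suspected behaviour.
def same_url_ignoring_query?(page_url, attribute_url)
  a = URI.parse(page_url)
  b = URI.parse(attribute_url)
  [a.scheme, a.host, a.path] == [b.scheme, b.host, b.path]
end

# Hypothetical stricter comparison that also looks at the query string.
def same_url_with_query?(page_url, attribute_url)
  same_url_ignoring_query?(page_url, attribute_url) &&
    URI.parse(page_url).query == URI.parse(attribute_url).query
end

page_url = "https://example.org/t/-/12345?page=2"
attribute_url = "https://example.org/t/-/12345"

ignoring = same_url_ignoring_query?(page_url, attribute_url)
considering = same_url_with_query?(page_url, attribute_url)
puts "ignoring query: #{ignoring}, considering query: #{considering}"
```

Under the first comparison the follow-up page looks like the posting's own page, so a missing `text` would be reported; under the second it looks like a reference to content hosted elsewhere.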
Invalid schema markup for `DiscussionForumPosting` - specific topic/post URLs only
Affected topics: topics with more than a total of 20 posts
Affected URLs: `…/t/-/NNN/7` to `…/t/-/NNN/20`
Report in ‘Google Rich Result Test’
URL …/t/-/NNN/11: different topics with different total posts
- Topic with total of 18 posts: result for …/t/-/283678/11 valid
- Topic with total of 19 posts: result for …/t/-/235984/11 valid
- Topic with total of 20 posts: result for …/t/-/264899/11 invalid
- Topic with total of 21 posts: result for …/t/-/282382/11 invalid
– All example topics are ‘closed’ to ensure the total of posts does not change. The bug itself also affects ‘open’ topics! –
URLs …/t/-/16968/1 to …/t/-/16968/38: One topic with currently 38 posts
Valid schema markup:
– `DiscussionForumPosting` itself still has an unnecessary attribute `position: 1`. –
- result for …/t/-/16968: `Comment`-positions 2 to 20
- result for …/t/-/16968/1: `Comment`-positions 2 to 20
- …
- result for …/t/-/16968/6: `Comment`-positions 2 to 20

Invalid schema markup: `author`/`datePublished` missing
- result for …/t/-/16968/7: `Comment`-positions 2 to 21
- result for …/t/-/16968/8: `Comment`-positions 3 to 22
- …
- result for …/t/-/16968/20: `Comment`-positions 15 to 34

Valid schema markup again (here: `@page > 1` is `true`):
- result for …/t/-/16968/21: `Comment`-positions 16 to 35
- result for …/t/-/16968/22: `Comment`-positions 17 to 36
- …
- result for …/t/-/16968/24: `Comment`-positions 19 to 38
- result for …/t/-/16968/25: currently includes `Comment`-positions 19 to 38
- …
- result for …/t/-/16968/38 – current last post: currently includes `Comment`-positions 19 to 38
- …
- result for …/t/-/16968/999 – inexistent high post: currently includes `Comment`-positions 19 to 38
Technical considerations
1. `@topic_view.prev_page` might not be the best solution to decide whether to display `author`/`datePublished` or not.
app/views/topics/show.html.erb#L53-L60
```erb
<% if @topic_view.prev_page %>
  <meta itemprop='datePublished' content='<%= @topic_view.topic.created_at.to_formatted_s(:iso8601) %>'>
  <span itemprop='author' itemscope itemtype="http://schema.org/Person">
    <meta itemprop='name' content='<%= @topic_view.topic.user.username %>'>
    <link itemprop='url' href='<%= Discourse.base_url %>/u/<%= @topic_view.topic.user.username %>'>
  </span>
  <meta itemprop='text' content='<%= @topic_view.topic.excerpt %>'>
<% end %>
```
2. The implementation of `@topic_view.prev_page` might be buggy by itself.
lib/topic_view.rb#L113-L115
lib/topic_view.rb#L128-L130
lib/topic_view.rb#L193-L195
```ruby
@post_number = [@post_number.to_i, 1].max
# ---
@page = @page.to_i > 1 ? @page.to_i : calculate_page
# ---
def prev_page
  @page > 1 && posts.size > 0 ? @page - 1 : nil
end
```
Is there a bug here?
lib/topic_view.rb#L751-L755
```ruby
def calculate_page
  posts_count =
    is_mega_topic? ? @post_number : unfiltered_posts.where("post_number <= ?", @post_number).count
  ((posts_count - 1) / @limit) + 1
end
```
- May `calculate_page` give unexpected results as it uses the current `@post_number` and somehow fails for values 7 to 20? `((posts_count - 1) / @limit) + 1` results in something like: `((7 - 1) / 20) + 1` = 1.3, i.e. page 1.
- What is the expected page number? Maybe calculate with non-integer values, then round the number as intended via `floor`/`ceil` and typecast to integer: `(((posts_count - 1.0) / (@limit + 0.0)) + 1.0).floor.to_i`
- Maybe check `unfiltered_posts.where("post_number <= ?", @post_number)` as `@topic.posts` might not contain all the posts starting with post_1 as intended.
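A quick way to check the first two points: in Ruby, `/` on two Integers already rounds down, so the integer formula and the float-then-`floor` variant should agree for every positive post count. The helper names below are made up for this comparison, and the limit of 20 is assumed from the default chunk size.

```ruby
# Integer arithmetic, as in calculate_page (limit assumed to be 20).
def page_integer(posts_count, limit = 20)
  ((posts_count - 1) / limit) + 1
end

# The float-then-floor variant suggested in the list above.
def page_float_floor(posts_count, limit = 20)
  (((posts_count - 1.0) / (limit + 0.0)) + 1.0).floor.to_i
end

# Compare both variants for post counts 1 to 40.
pages = (1..40).map { |n| [n, page_integer(n), page_float_floor(n)] }
mismatches = pages.reject { |_, int_page, float_page| int_page == float_page }
puts "mismatches: #{mismatches.length}"
```

So posts 7 to 20 landing on page 1 looks like the intended result of the formula; the arithmetic itself is probably not where the markup goes missing.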
lib/topic_view.rb#L53-L55
lib/topic_view.rb#L119-L127
lib/topic_view.rb#L835-L841
```ruby
def self.chunk_size
  20
end
# ---
@chunk_size =
  case
  when @print
    TopicView.print_chunk_size
  else
    TopicView.chunk_size
  end
@limit ||= @chunk_size
# ---
def unfiltered_posts
  result = filter_post_types(@topic.posts)
  result = result.with_deleted if @guardian.can_see_deleted_posts?(@topic.category)
  result = result.where("user_id IS NOT NULL") if @exclude_deleted_users
  result = result.where(hidden: false) if @exclude_hidden
  result
end
```
Conclusion
In this edge case …
- topics with more than a total of 20 posts
- `…/t/-/NNN/7` to `…/t/-/NNN/20`

… the first post was not part of the current view and `@topic_view.prev_page` did not trigger as the view was still on the first page.
So all attributes of the microdata schema `DiscussionForumPosting` which were only rendered in either the context of the first post or on `@topic_view.prev_page == true` were missing.
PR
Some attributes of the microdata schema `DiscussionForumPosting` are rendered in the context of the first post. Ensure these attributes are also set if the first post is not part of the current view.
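The edge case and the proposed change can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Discourse's code: the rendered window is assumed to start five posts before the requested post and to hold 20 posts (inferred from the ranges reported for the 38-post topic), post numbers are assumed to be gapless, and all helper names are hypothetical.

```ruby
CHUNK_SIZE = 20 # default chunk size from the quoted code
LEAD_IN = 5     # assumption, inferred from the reported ranges (/7 -> 2 to 21)

# Page number as computed by calculate_page (integer division).
def page_for(post_number)
  ((post_number - 1) / CHUNK_SIZE) + 1
end

# Assumed range of post numbers rendered for .../t/-/NNN/<post_number>.
def window_for(post_number, total_posts)
  latest_start = [total_posts - CHUNK_SIZE + 1, 1].max
  first = [[post_number - LEAD_IN, 1].max, latest_start].min
  (first..[first + CHUNK_SIZE - 1, total_posts].min)
end

# Topic-level attributes appear either with post 1 or via a fallback block.
# fixed: false models the prev_page condition, fixed: true the PR idea.
def attributes_present?(post_number, total_posts, fixed:)
  first_post_in_view = window_for(post_number, total_posts).include?(1)
  fallback = fixed ? !first_post_in_view : page_for(post_number) > 1
  first_post_in_view || fallback
end

affected = (1..38).reject { |n| attributes_present?(n, 38, fixed: false) }
still_affected = (1..38).reject { |n| attributes_present?(n, 38, fixed: true) }
puts "before: #{affected.inspect}"
puts "after: #{still_affected.inspect}"
```

Under these assumptions the model flags exactly the post URLs 7 to 20 for the 38-post topic, and none once the fallback depends on whether post 1 is in view rather than on the page number.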
Hmmm… That’s unexpected. I’m sorry for the trouble, I think that URL comparison check is dropping the query parameters in the comparison. Let me get a fix rolled out.
Any update here on this fix?
I believe the fix rolled out this week to consider query parameters in the “is this an external URL” check. So forums that refer to OPs from a different URL by query parameter (foo vs. foo?page=2) will not have errors reported on them in GSC.