I’ve been trying to get Discourse to fully serve the JS app to Googlebot, and I’m getting very close.
Courtesy of @pfaffman, executing the below code in the rails console got the JS app to show up when using Chrome and spoofing the user agent to Googlebot or Googlebot Smartphone.
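It was something along these lines; this is a sketch from memory rather than the verbatim snippet, and it assumes the hidden `crawler_user_agents` site setting is what drives Discourse’s crawler detection:

```ruby
# Sketch, not the exact snippet: Discourse serves the crawler view to any
# user agent matching the hidden `crawler_user_agents` site setting
# (a pipe-delimited list). Dropping the googlebot entries should mean
# Googlebot gets the full JS app instead.

# In the rails console (./launcher enter app, then `rails c`):
puts SiteSetting.crawler_user_agents
# => e.g. "googlebot|mediapartners|adsbot|..." (default list, abridged)

SiteSetting.crawler_user_agents =
  SiteSetting.crawler_user_agents
    .split("|")
    .reject { |ua| ua.match?(/googlebot/i) }
    .join("|")
```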
However, when I test with the Google mobile-friendly tool (or URL Inspection in Google Search Console), it gives me a blank screenshot with the below HTML.
My theory is that since Googlebot is notorious for not downloading every resource when accessing a page, that specific JS file (the one with the feature detect that causes the revert) doesn’t get loaded, and hence the page looks good.
So in conclusion, how would someone go about disabling the feature detect for Googlebot (or, if it’s easier, for all crawlers/bots)?
Edit: Just in case I’m off with the terminology, when “feature detect” is mentioned on meta, is that referring to browser detection? (Perhaps with files like browser-detect.js and other dependencies.)
Or is “feature detect” a broad phrase for what Discourse does when it tries to understand the technology that’s trying to access the app?
Is there a reason why you want to serve the JS version to Googlebot? Google probably won’t be able to find paginated list views, including the paginated home page and topics that have more than a certain number of posts. In the bot view, the topic lists are crawlable, but Googlebot probably isn’t going to trigger the endless scrolling.
We had a very sloppy site update around Sept/Oct 2019, and the main site tanked right then and there.
We never recovered, even though the site has never been in better shape as far as SEO goes. Sure, it’s not perfect, but we’re light years ahead of some of the competition. Sites that use our many-years-old images and text outrank us by pages: we’re on the 3rd page and they’re perhaps at the top of the 2nd.
I’ve been through countless SEO blogs, videos, and posts, and even had some back-and-forth with John Mueller (on Reddit).
The most I got out of him was that it could be “quality issues”. We’ve improved the main site dramatically since Jan 1 of this year, and still not even a blip in organic traffic.
Discourse: I had it installed back in 2013 and forgot about it. I barely checked its traffic.
If you look at the main site analytics, you’ll see a sharp drop towards the end of the chart. This is when I started working on Discourse.
When trying prerender.io on Discourse, the main site’s rank was all over the place, sometimes jumping 10-15 spots overnight and then back. (I’ve since stopped using Prerender, as it couldn’t render the main menu, login, etc.)
From what I read online, this is a sign Google doesn’t know where to place you. They say just a little “more” and you’re on the good side of the algorithm.
Nothing we’ve done in the last 3 years has triggered these fluctuations in the SERPs.
(Messing with Google disavow tool, cleaning up code, clean URLs, site structure, internal linking, social, content, etc.)
You might make the argument: why didn’t Google penalize you in 2018? (You had Discourse on the subdomain then too.)
Well, I think it was a multitude of factors unique to the site (its history, its link profile) that caused it to tank in late 2019. It seems Google reshuffled the site’s rank and perhaps gave the Discourse URLs more weight than it had previously.
And the thing is… I love Discourse. Especially now that I’ve been on meta more, there are all these cool plugins and features I had no idea existed: wiki, subscription payments, table of contents, and now the chat!
So moving away from Discourse is not really an option; too much is invested at this point.
I did consider this, and I’m willing to take my chances. I know it won’t be perfect, but from what I read and watch, Google has gotten really good at understanding JS as of late.
Before I ever brought up sending the JS version to Google, I was tinkering with it.
I tested sending the JS version to Google around the beginning of April or so, using the Google mobile tool. I remember it returning a result most of the time (even if it looked broken).
I thought it might be this commit, so I made the code edits and rebooted: same behavior.
Perhaps someone remembers a PR or commit in the past couple months that may have altered browser and/or crawler detection?
Edit: Sorry for all the updates; the more info the better, amirite?
While trying Prerender last month, Google ended up adding 2,000 URLs to the forum’s coverage (mostly these URLs).
They were all served in 0.005 seconds; Prerender had the URLs cached and ready for Googlebot to access, so it took them all quickly.
Point is, perhaps the crawler got “very used” to the no-JS version and committed resources to grab those 2k pages.
So now it’s accessing the site in this manner until it figures things out (and needs to access it with JS more). Just a theory.
One candidate for a sudden penalty is that URLs from the 2019 site have 6 redirects, but Google says to keep it “less than 5” or they might not follow the redirects. That might have made it appear to Google that the old pages disappeared from the Web.
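If you want to count the hops on one of those old URLs yourself, here’s a minimal Ruby sketch (the URL below is a placeholder) that follows each `Location` header and prints the chain:

```ruby
require "net/http"
require "uri"

# Follow Location headers and collect each hop; Google's guidance is to
# keep chains under five hops or the crawler may give up.
def redirect_chain(url, limit = 10)
  chain = []
  limit.times do
    response = Net::HTTP.get_response(URI(url))
    break unless response.is_a?(Net::HTTPRedirection)
    url = URI.join(url, response["location"]).to_s
    chain << url
  end
  chain
end

# Placeholder URL; swap in one of the 2019-era URLs.
hops = redirect_chain("https://example.com/old-2019-page")
puts "#{hops.length} redirect(s)"
hops.each { |u| puts "-> #{u}" }
```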
Nice catch. And while it was poorly executed, taking many months, Google seems to have figured out which pages mean what.
In other words, I eventually started to see more and more of the pages I 301’ed to ranking for the keywords used on the old pages.
This makes a lot of sense, and I’ll see how I can get that implemented. Presently, Search Console doesn’t show the crawler hitting 301s too often. It seems when the rank gets better, they follow more 301s. Correlation without causation, perhaps.
It’s totally not a knock on Discourse; I’m just not easily convinced by “thousands of Discourse users have great organic traffic”.
Google is not really going to tell us either.
We must always remember Google is an algorithm; it’s not looking at this through a human’s eyes.
While both versions share similar content, and Google knows it’s not malicious cloaking, they still have to adjust rank.
One version looks way better, works better, and gives some sense of internal link structure. The other is a glorified RSS feed.
Google has no idea I have this slick forum that works on all [modern] devices, truly encourages discourse, and is one of the coolest things the internet has ever created.
I always like to use the “Powered by Discourse” do-follow link in the crawler version as an example (just because it’s easy).
Again, I know it’s not malicious, but you must look at it through Google’s eyes: you, FlyNumber (not https://community.cloudflare.com/), are giving us this crawler version with an external link you aren’t showing regular browsers.
I could totally see the algorithm picking up on what’s going on and ignoring the external link for the Cloudflare domain (as it’s such an authority).
It’s not like what Google applies to Cloudflare will apply to me.
“Did someone pay you for this external link you show bots (but don’t want to show regular users)?” is more like how they may look at the site. I’m not saying it’s this, but it’s a possibility you’ll want to eliminate.
In simplest terms, the crawler version doesn’t have a menu or any real structure.
That’s the content the algorithm thinks you want to serve to end-users.
From a very general perspective, I can’t see the algorithm rewarding that.
Maybe it’s time we start considering a real overhaul of the crawler version: at least add the main menu, and suggested topics at the bottom.
Interesting update: Google has added “JSON” to the “file types” in crawl stats for my Discourse instance. “JavaScript” is a separate “file type”.
I’m starting to think my logic was flawed from the beginning. It would explain why no one responded - perhaps nothing is wrong.
Here’s a fresh article on how it’s normal for Google to show a white page in the screenshot.
I can see the “crawled” HTML for the home page now. This is the indexed version, not from a “live test”, and it shows the full page. Keep in mind, Google figured this out while being served the full JS app.
What’s interesting is they went down to about the 27th post on the home page as far as indexing. So the endless scroll thing is something Google understands.
Not sure if it helped, but I unchecked the AJAX setting in the admin settings. When checked, it caused Google to find URLs like the below (and serve the crawler version); with it unchecked, that URL now shows the JS version.
So after at least a month or so of serving the SPA (full JS version) of Discourse, I’ve gone back to the crawler version.
You can refer to my post history, but I was making the argument that Google might understand the JS version and rank it better than the crawler version. I was wrong.
Hey @j127, you were correct! (Will be PMing you, good sir.)
It seems Google did figure out the site, but it ranked it about the same (if not slightly lower).
The crawler version was also updated back in April/May as far as link colors, formatting, etc., so that’s a nice help.
IMHO, if we were to add a simple menu and “suggested topics” to the crawler version, it would make a nice difference to everyone’s SEO.
Other than that, I just wanted to put this out there just in case anyone was curious.