I need "window.prerenderReady" implemented on my instance [Paid job]

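stance455 · April 25, 2022 15:30

I'm using prerender.io to serve crawlers the "application/JS" version of the site (my instance serves the JS version to crawlers through a hidden setting).

It works well, but it seems Discourse may fall into this category:

However, some web pages use custom loading flows or continuous polling, which can trick Prerender's logic, so it can't determine when the page is ready.

The first time Prerender visits any Discourse URL, it times out (after the 20 seconds configured in Prerender).

The page renders fine. Prerender.io just doesn't know that the page has finished loading, so it keeps "trying" to render it until the 20 seconds are up, and only then serves the HTML version.

If the crawler requests the page again, it is served in about 1 second (or slightly more), because the HTML version of that URL is now cached.

...but that isn't practical: there are thousands of URLs, and 20 seconds per URL on the first visit just doesn't work.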
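To put those numbers in perspective, here is a back-of-the-envelope estimate. It assumes a hypothetical 5,000 URLs, every uncached URL hitting the 20-second timeout, cached URLs served in about 1 second, and pages rendered one after another. Prerender may render several pages in parallel, so read the results as total render time, not wall-clock time.

```javascript
// Rough estimate of crawler wait time on a cold vs. warm Prerender cache.
// Assumptions: 20 s per uncached URL (the timeout), ~1 s per cached URL,
// sequential rendering, and a hypothetical URL_COUNT standing in for "thousands".
const TIMEOUT_SECONDS = 20;
const CACHED_SECONDS = 1;
const URL_COUNT = 5000; // hypothetical

function totalCrawlSeconds(urlCount, secondsPerUrl) {
  return urlCount * secondsPerUrl;
}

const cold = totalCrawlSeconds(URL_COUNT, TIMEOUT_SECONDS);
const warm = totalCrawlSeconds(URL_COUNT, CACHED_SECONDS);

console.log(`cold cache: ${cold} s (~${(cold / 3600).toFixed(1)} h)`); // cold cache: 100000 s (~27.8 h)
console.log(`warm cache: ${warm} s (~${(warm / 3600).toFixed(1)} h)`); // warm cache: 5000 s (~1.4 h)
```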

So I need the following added right after the <head> tag (with the variable then set to true once the page has finished):

<script> window.prerenderReady = false; </script>
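A minimal sketch of the handshake being asked for: the flag starts as false as early as possible and is flipped to true once rendering is done. The helper names markPrerenderPending and markPrerenderReady are hypothetical, and the fallback to globalThis exists only so the sketch can be tried outside a browser.

```javascript
// Minimal sketch of the window.prerenderReady handshake (hypothetical helpers).
// In a browser `target` defaults to `window`; the globalThis fallback is only
// there so the sketch also runs in Node.
function getTarget(target) {
  if (target) return target;
  return typeof window !== "undefined" ? window : globalThis;
}

// Run as early as possible (ideally in <head>), before the app starts rendering.
function markPrerenderPending(target) {
  getTarget(target).prerenderReady = false;
}

// Run once the page content has finished rendering.
function markPrerenderReady(target) {
  getTarget(target).prerenderReady = true;
}

// Usage with a stand-in object instead of the real window:
const fakeWindow = {};
markPrerenderPending(fakeWindow);
console.log(fakeWindow.prerenderReady); // false
markPrerenderReady(fakeWindow);
console.log(fakeWindow.prerenderReady); // true
```

In the page itself, the "pending" half is just the one-line script quoted above; the hard part is deciding where to call the "ready" half.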

I'd like this to apply across the whole site. Hopefully that makes the job easier.

I'm not sure what this would take, so let me know if I'm off: $300? $400?

stance455 · April 26, 2022 13:30

Any feedback on this?

Maybe I could edit a core file.

pfaffman · April 26, 2022 13:45

Do you have code somewhere that does this?

stance455 · April 26, 2022 14:06

Code for serving the JavaScript version?

It's the hidden site setting "crawler_user_agents" that you (@pfaffman) helped me enable and adjust.

Edit: I have removed "bots", "crawlers" and "spiders" from that list.
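To illustrate why removing generic words like "bots", "crawlers" and "spiders" changes which requests are treated as crawlers, here is a rough sketch. It assumes the list is pipe-separated and that each entry is matched as a case-insensitive substring of the user agent. This approximates the idea only; it is not Discourse's actual detection code, and the user agent below is made up.

```javascript
// Rough approximation of matching a pipe-separated user-agent list
// (such as the crawler_user_agents setting) against a request's user agent.
// NOT Discourse's implementation; list values and user agent are examples.
function matchesCrawlerList(userAgent, pipeSeparatedList) {
  const lowered = userAgent.toLowerCase();
  return pipeSeparatedList
    .split("|")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0)
    .some((entry) => lowered.includes(entry));
}

const before = "googlebot|bingbot|bots|crawlers|spiders";
const after = "googlebot|bingbot";
const ua = "Mozilla/5.0 (compatible; SomeSiteCrawlers/1.0)";

console.log(matchesCrawlerList(ua, before)); // true: "crawlers" is a substring
console.log(matchesCrawlerList(ua, after)); // false
```

Short generic entries match far more user agents than the named bots do, so trimming them shrinks the set of requests that get the crawler view.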

pfaffman · April 26, 2022 14:07

How is prerender.io involved? How would Discourse know when to include the <head> tag?

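stance455 · April 26, 2022 14:11

Oh, I think Prerender means the existing <head> tag, right?

We need to add <script> window.prerenderReady = false; </script> directly below the existing <head> tag.

Edit: I'm also not sure whether they need the code inside the head tag or after the closing head tag.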

pfaffman · April 26, 2022 14:17

How did you install Prerender so that it serves prerendered pages? How to Install Prerender in 3 Easy Steps lists three methods. Did you use one of them?

stance455 · April 26, 2022 14:51

Yes, I used the Cloudflare middleware.

So Cloudflare sends any request from a bot to Prerender.
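Roughly, a bot-routing middleware like this decides per request whether to fetch from the origin or from Prerender. The sketch below is simplified and assumes Prerender's service URL https://service.prerender.io/ and the X-Prerender-Token header used by its published middlewares. BOT_PATTERN and buildUpstreamRequest are hypothetical and much smaller than the real integrations, which also skip static assets and recognize many more bots.

```javascript
// Simplified sketch of bot routing in a Prerender middleware (hypothetical names).
// Assumptions: Prerender is reached by appending the full page URL to
// https://service.prerender.io/ and authenticating with an X-Prerender-Token header.
const PRERENDER_SERVICE = "https://service.prerender.io/";
// Tiny example list; real integrations match many more bots.
const BOT_PATTERN = /googlebot|bingbot|yandex|duckduckbot|baiduspider/i;

function buildUpstreamRequest(pageUrl, userAgent, token) {
  if (!BOT_PATTERN.test(userAgent)) {
    // Regular visitors go straight to the Discourse origin.
    return { url: pageUrl, headers: {} };
  }
  // Bots get the copy of the page rendered by Prerender.
  return {
    url: PRERENDER_SERVICE + pageUrl,
    headers: { "X-Prerender-Token": token },
  };
}

const req = buildUpstreamRequest(
  "https://forum.example.com/t/some-topic/123",
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "YOUR_TOKEN"
);
console.log(req.url);
// https://service.prerender.io/https://forum.example.com/t/some-topic/123
```

The routing only decides who gets the prerendered copy. When Prerender considers that copy finished is a separate question, and that is what window.prerenderReady controls.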

pfaffman · April 26, 2022 14:56

So can you point to a Prerender API call that returns the true/false value you want?

 <script> window.prerenderReady = false; </script>

stance455 · April 26, 2022 15:34

I see. I read the API docs, and this may be a bit beyond my abilities (but hopefully it makes the task easier).

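angus · April 26, 2022 16:29

The budget is a bit low for what this would basically involve. Changing what gets served to crawlers can be trickier than it looks, and all sorts of issues can come up.

Personally, I'm a bit skeptical about the wisdom of doing this in the first place, but I'm sure you have your reasons.

I think Jay is referring to the Discourse client API. You can use it in a theme component to tell when Discourse has finished rendering.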
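As a sketch of what that could look like, assume a theme component initializer and assume onPageChange fires once the route has rendered (including on the first page load). The hypothetical helper below flips the flag from that callback. In a real theme it would be called from inside withPluginApi, for example withPluginApi("0.8.30", (api) => registerPrerenderReady(api, window)). Here a fake api object stands in so the logic can be run on its own.

```javascript
// Hypothetical helper: set prerenderReady when Discourse reports a page change.
// Assumption: `api` is the object passed to withPluginApi callbacks, and
// api.onPageChange(callback) calls `callback` after each route transition.
function registerPrerenderReady(api, win) {
  api.onPageChange(() => {
    // Assumed to run once the route has rendered enough to be captured.
    win.prerenderReady = true;
  });
}

// Stand-in for the real plugin API so the sketch runs outside Discourse.
const registered = [];
const fakeApi = {
  onPageChange(callback) {
    registered.push(callback);
  },
};
const fakeWindow = { prerenderReady: false };

registerPrerenderReady(fakeApi, fakeWindow);
console.log(fakeWindow.prerenderReady); // false: no page change reported yet
registered.forEach((cb) => cb("/t/some-topic/123", "Some topic"));
console.log(fakeWindow.prerenderReady); // true
```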

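You seem to know a bit about software development. Last year I made an introductory course on Discourse theme development, which covers how to use the API in a theme. It's free and open source. You can start here: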

github.com/pavilionedu/discourse-theme-introduction

START


# learning_course
# key:         discourse_theme_development
# title:       Discourse Theme Development
# description: This course introduces you to Discourse theme development. It's 
#              made up of a series of units, each of which have a topic in a
#              category for this course on [Pavilion Education](https://education.thepavilion.io),
#              and their own repository on the [Pavilion Education Github](https://github.com/pavilionedu). 
#              Each unit has a series of steps which appear both as posts in the 
#              unit's topic and as comments in the code of the unit's repository. 
#              The content of the steps is the same in both places.
#
#              The best way to take this course is by working through each step in
#              each unit, step by step. It's easier to read the steps on 
#              [Pavilion Education](https://education.thepavilion.io), where you'll
#              also get the chance to have your work reviewed, watch videos of 
#              teachers completing the steps, chat with other students and track
#              your progress.
#
#              This course requires some knowledge of HTML, CSS and Javascript 
#              and a minimum degree of comfort with using your computer's terminal


You'll probably want to use the frontend events that fire when a page renders. There are some examples in the course's first JavaScript unit:

github.com/pavilionedu/discourse-theme-javascript-one

javascripts/discourse/initializers/theme-javascript.js



      
export default {
  name: "theme-javascript-initializer",
  initialize() {
    withPluginApi("0.8.30", api => {

    });
  }
};
/* /learning_step */

/* learning_step
 * unit:        discourse_theme_development.6
 * number:      6
 * title:       Using the client-side event bus.
 * description: Now that we're loading our external javascript when Discourse is
 *              initialized, and we've made it interact with page changes, the
 *              next step is to make it interact with other events in the Discourse
 *              client. Here we can use the Discourse client-side event bus.
 *
 *              In the plugin-api.js file, find the api method ``onAppEvent``.
 *              This is the wrapper we can use to register callbacks when specific
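stance455 · April 26, 2022 19:02

Thanks for the reply, Angus.

I'm not sure that's needed. The crawlers are already getting the HTML version of my Discourse instance.

It's too early to say, but I'm fairly optimistic. It's just a lot of SEO cleanup: Google is crawling what amounts to a brand-new site. I can't imagine Google ranking the non-JS crawler version the same as the experience real users get.

The first thing I need done is to put that code in the head.

Then, as prerender.io requires, implement this part:

Then make sure you set this variable to true only once the page has finished rendering, at which point Prerender can safely capture the content. This can be done with an asynchronous call that runs late in the page. Prerender.io will then wait a short while to make sure all calls have finished, and save your page.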
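The "asynchronous call late in the page" idea can be sketched like this. The sketch assumes the page keeps a list of promises for the loads it cares about; pendingLoads and markReadyWhenSettled are hypothetical names. Promise.allSettled is used instead of Promise.all so that one failed request doesn't leave the flag stuck at false, which would bring back the 20-second timeout.

```javascript
// Sketch: flip prerenderReady only after every tracked async load has settled.
// Assumption: the page collects the promises it cares about in `pendingLoads`.
async function markReadyWhenSettled(pendingLoads, win) {
  // allSettled waits for fulfilled AND rejected promises, so a failed
  // optional request cannot keep the page "not ready" forever.
  await Promise.allSettled(pendingLoads);
  win.prerenderReady = true;
  return win.prerenderReady;
}

const win = { prerenderReady: false };
const loads = [
  new Promise((resolve) => setTimeout(() => resolve("posts"), 10)),
  Promise.reject(new Error("optional widget failed")),
];
markReadyWhenSettled(loads, win).then((ready) => console.log(ready)); // true
```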

I'll read through the docs you linked. Thanks for those.

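justin · April 26, 2022 22:14

I'm not sure what you're seeing, but in our experience the crawler view ranks quite well. We have customers reporting that their community outranks their main website.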

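stance455 · April 27, 2022 14:44

That may simply be because the site has very valuable content and the algorithm overlooks the "bad parts". Every case is different.

From Google's point of view, it wouldn't make sense (broadly speaking) for the crawler version to rank as well as the JavaScript version.

There's no menu, no suggested topics, no sidebar links, user profile/badge pages are noindex, and plenty of other features are missing from the crawler version.

I'll post an update in a new topic once the results are in. So far, positions in the SERPs have been very unstable.

Edit: user profile/badge pages are set to noindex via a header.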
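For anyone checking the header-based noindex mentioned in the edit: the directive is sent in the X-Robots-Tag response header. Below is a simplified check. It assumes comma-separated directives with an optional "botname:" prefix. isNoindex is a hypothetical helper, and it ignores which crawler a prefixed directive targets.

```javascript
// Simplified check of an X-Robots-Tag header value for noindex.
// Assumptions: directives are comma-separated and may carry a "botname:" prefix;
// the bot the prefix names is ignored. "none" implies noindex, nofollow.
function isNoindex(headerValue) {
  if (!headerValue) return false;
  return headerValue
    .split(",")
    .map((part) => part.trim().toLowerCase())
    // Drop an optional "googlebot:"-style prefix (no-op when there is no colon).
    .map((part) => part.slice(part.lastIndexOf(":") + 1).trim())
    .some((directive) => directive === "noindex" || directive === "none");
}

console.log(isNoindex("noindex")); // true
console.log(isNoindex("noindex, nofollow")); // true
console.log(isNoindex("googlebot: nofollow")); // false
console.log(isNoindex(undefined)); // false
```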
