Improving Discourse static HTML archive

The recommended way to archive a Discourse site is to use HTTrack to take a static HTML dump and host that as a static website. However, the layout served to crawlers is not very pretty when hosted this way. I will be working on improving that layout and adding the necessary data to the static pages. You can see the current crawler layout at https://meta.discourse.org/?escaped_fragment which I will try to improve.

This topic is just a placeholder that I can link from the changes I make, so that anyone reviewing them can get more context.

Let me know if you have any suggestions on this topic.

Thanks

6 Likes

I have created a few pull requests related to this and added screenshots to them:
https://github.com/discourse/discourse/pull/7250
https://github.com/discourse/discourse/pull/7270
https://github.com/discourse/discourse/pull/7286

Let me know if you have any suggestions.

5 Likes

Sorry in advance for my question, since I’m not very familiar with HTTrack. Why do we need to use HTTrack to take a dump of the static HTML pages and host that as a static archived website?

5 Likes

Hey,
You can go through these links to get more context related to this:

HTTrack will basically just crawl your website and create a static HTML dump which you can host as a static website.
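To make the idea of "crawl and dump" a bit more concrete, here is a rough Python sketch of the two core steps any such spider performs: collecting same-site links from a page and mapping each URL to a local file. This is not how HTTrack itself is implemented; the names `LinkCollector`, `same_site_links`, and `local_path`, and the host `forum.example`, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(page_url, href).split("#")[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def local_path(url):
    """Map a URL to a relative file path inside the static dump."""
    path = urlparse(url).path.strip("/")
    return (path or "index") + ".html"
```

A real tool additionally downloads images and stylesheets and rewrites the links inside each saved page so they point at the local files.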

Quoting from the link above on why people want it.

Let me know if you have any other questions.

You do not “need” to use the HTTrack tool; you can use recursive wget or other similar Linuxy command-line spidering tools as well.
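For anyone who wants to try the recursive wget route, here is a small Python sketch that assembles one plausible invocation. The wget options used are standard ones; the function name `wget_mirror_command`, the `archive` output directory, and the host `forum.example` are only assumptions for the example, and a given site may need extra options beyond these.

```python
import shlex


def wget_mirror_command(site_url, output_dir="archive"):
    """Build an argument list for mirroring a site with wget (hypothetical helper)."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so they work locally
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--page-requisites",   # also fetch CSS, images, and similar assets
        "--no-parent",         # never ascend above the starting URL
        "--directory-prefix", output_dir,
        site_url,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in a shell.
    print(shlex.join(wget_mirror_command("https://forum.example/")))
```

Printing the command instead of running it directly makes it easy to tweak the options by hand first.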

3 Likes

Just an update regarding this.

All 3 pull requests have been merged. I’m adding screenshots of the new static archive look below. Let me know if you have any suggestions on things to improve.

7 Likes