Improving Discourse static HTML archive

It is recommended to use HTTrack to take a dump of static HTML and host that as a static archived website. But the layout Discourse serves to crawlers is not very pretty when hosted as a static site. I will be working on improving that layout and adding the necessary data to the static website. You can see the crawler layout at https://meta.discourse.org/?escaped_fragment which I will try to improve.

This is just a placeholder to link with changes I make so that someone reviewing it can get more context.

Let me know if you have any suggestions on this topic.

Thanks

6 Likes

I have created a few pull requests related to this and added screenshots in them:
https://github.com/discourse/discourse/pull/7250
https://github.com/discourse/discourse/pull/7270
https://github.com/discourse/discourse/pull/7286

Let me know if you have any suggestions.

5 Likes

Sorry in advance for my question, since I’m not very familiar with HTTrack. Why do we need to use HTTrack to take a dump of the static HTML page and host that as a static archived website?

5 Likes

Hey,
You can go through these links to get more context related to this:

HTTrack will basically just crawl your website and create a static HTML dump which you can host as a static website.
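As a rough sketch, a minimal HTTrack invocation looks something like this. The URL and output directory are placeholders, not taken from the thread; check `httrack --help` for the full option list.

```shell
# Hypothetical example: crawl a forum and write a browsable static copy
# into ./archive. The filter (+forum.example.com/*) keeps the crawl on
# the same host; -v prints progress.
httrack "https://forum.example.com/" -O ./archive "+forum.example.com/*" -v
```

Once it finishes, the `./archive` directory can be served as-is by any static web server.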

Quoting from the link above on why people want it.

Let me know if you have any other questions.

1 Like

You do not “need” to use the httrack tool; you can use recursive wget and other similar command-line Linuxy spidering tools as well.
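For example, a recursive wget mirror might look like the sketch below. The URL is a placeholder; tune the depth, filters, and wait time for your own site so you don’t hammer the server.

```shell
# Hypothetical example: mirror a site into a local directory suitable
# for static hosting. --convert-links rewrites links to work offline,
# --adjust-extension adds .html where needed, and --page-requisites
# pulls in CSS/JS/images; --wait=1 throttles the crawl.
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-parent \
     --wait=1 \
     https://forum.example.com/
```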

3 Likes

Just an update regarding this.

All 3 pull requests have been merged. I’m adding screenshots of the new static archive look below. Let me know if any of you have suggestions on things to improve.

7 Likes