A basic Discourse archival tool

adrelanos · May 26, 2020, 1:53pm

httrack does not fully work for me. I am running:

httrack https://my-forums.org --user-agent "Googlebot"

httrack is quite promising, but long forum threads that span multiple pages come out incomplete. Once I click on “page 2” in the local copy, it does not work. For example:

file:///home/user/My%20Web%20Sites/my-forums/my-forum.org/t/forum-thread-title/83394658.html looks really good (does not fetch from external resources), but
file:///home/user/My%20Web%20Sites/my-forums/my-forum.org/t/forum-thread-title/83394658.html?page=2 is broken.

Any suggestions?

Perhaps httrack can be told somehow to “use print mode”?

Example: standard forum discussion view
Example: print forum discussion view (same URL, with /print appended at the end)

Perhaps httrack can be told to “append /print at the end”?
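
To illustrate the “append /print” idea, here is a minimal Python sketch that rewrites a topic URL into its print-view URL. It assumes the print view lives at the topic URL plus /print, as described above, and that the input has the form /t/<slug>/<id> (a trailing post number is not handled). The function name `to_print_url` is hypothetical, not part of httrack or Discourse.

```python
from urllib.parse import urlsplit, urlunsplit


def to_print_url(topic_url: str) -> str:
    """Return the print-view URL for a Discourse topic URL.

    Assumption: the print view is the topic URL with "/print" appended,
    and the input looks like https://host/t/<slug>/<id>.
    """
    parts = urlsplit(topic_url)
    # Drop any trailing slash so we do not produce "//print".
    path = parts.path.rstrip("/")
    # Avoid appending twice if the URL is already a print URL.
    if not path.endswith("/print"):
        path += "/print"
    # Discard query (?page=2) and fragment; the print view is a single page.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(to_print_url("https://my-forums.org/t/forum-thread-title/83394658?page=2"))
```

The resulting URLs could then be handed to a crawler as its list of start pages, one per topic, instead of relying on the crawler to follow the paginated links.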

Is there a user agent setting which shows the whole forum thread on a single page? If not, could you please add this feature? Print mode is already implemented, so most of the work is done. What’s left is a user agent that causes the “print mode” content to be served to the crawler. Alternatively, if you don’t like the idea of a custom user agent for this purpose, what about an HTTP header or cookie that could be used instead?

ArchiveDiscourse, as improved/forked by @kitsandkats, is also broken for me.

Could you please also consider implementing /print for the front page and category pages?

Quoting myself from https://meta.discourse.org/t/i-dont-like-infinite-scrolling-and-want-to-disable-it/104660/3:

(Temporarily) disabling infinite scroll (for some user agents) would make it possible to archive Discourse with the httrack web archive tool.
