httrack does not work for me. Using:
httrack https://my-forums.org --user-agent "Googlebot"
httrack is quite promising, but long forum threads with multiple pages are incomplete. Once I click on “page 2”, it no longer works. For example,
file:///home/user/My%20Web%20Sites/my-forums/my-forum.org/t/forum-thread-title/83394658.html looks really good (it does not fetch from external resources), but
file:///home/user/My%20Web%20Sites/my-forums/my-forum.org/t/forum-thread-title/83394658.html?page=2 is broken.
Can httrack somehow be told to “use print mode”?
Can httrack be told to “append /print” to the end of each thread URL?
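As a stopgap, assuming the print view of a thread really lives at the thread URL plus /print, the print URLs could be generated up front and handed to httrack as a list. A minimal sketch under that assumption; the function names `to_print_url` and `write_url_list` and the file name `print-urls.txt` are hypothetical, and it assumes thread URLs have the form https://host/t/<slug>/<id>:

```python
from urllib.parse import urlsplit, urlunsplit


def to_print_url(topic_url: str) -> str:
    """Turn a topic URL into its (assumed) /print URL.

    Assumes paths of the form /t/<slug>/<id>[/<post number>]. Any post
    number and any ?page=N query are dropped, since the print view is
    assumed to contain the whole thread on a single page.
    """
    parts = urlsplit(topic_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[0] != "t":
        raise ValueError(f"not a /t/<slug>/<id> topic URL: {topic_url}")
    # Keep only /t/<slug>/<id> and append "print".
    path = "/" + "/".join(segments[:3] + ["print"])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def write_url_list(topic_urls, filename="print-urls.txt"):
    """Write one print URL per line to a plain-text file."""
    with open(filename, "w", encoding="utf-8") as f:
        for url in topic_urls:
            f.write(to_print_url(url) + "\n")


if __name__ == "__main__":
    write_url_list([
        "https://my-forum.org/t/forum-thread-title/83394658?page=2",
    ])
```

If I read the httrack documentation correctly, its `-%L` (`--list`) option reads URLs from such a file, one per line, e.g. `httrack -%L print-urls.txt`. Whether this yields complete offline threads is untested, and it does nothing for the front page or category pages.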
Is there a user agent setting that shows the whole forum thread on a single page? If not, could you please add this feature? You already implemented print mode, so most of the work is already done. What’s left is a user agent that causes the crawler to be served the content generated for “print mode”. Alternatively, if you don’t like the idea of a custom user agent for this purpose, what about an HTTP header or cookie that could be used instead?
ArchiveDiscourse, improved/forked by @kitsandkats, is also broken for me.
Could you please also consider implementing /print for the front page and category pages?
Quoting myself from “I don't like infinite scrolling and want to disable it”:
(Temporarily) disabling infinite scroll (for some user agents) would make it possible to archive Discourse with the httrack web archiving tool.