Endless loading behind Cloudflare

We moved our server from one VPS provider to another and I upgraded the instance via launcher rebuild to the latest version as well, from 3.5.0.beta3 to 3.5.0.beta4.

The instance always ran fine behind Cloudflare, but now trying to access it leads to an endless five-dot loading animation.

I have a hosts file entry on my local system to bypass Cloudflare, since my ISP (Deutsche Telekom AG) has shitty peering policies, so access through Cloudflare is very slow at times. Because access without Cloudflare works fine, I did not notice the issue at first. In the meantime I upgraded the instance, and hence am now not sure whether the new VPS or the Discourse upgrade was the relevant change. I verified via VPN and mobile network that the issue really is Cloudflare itself now, not the bad peering of my ISP, and other users face the same issue. Both the old and new VPS have IPv6 available, and the whole system is exactly the same, transferred as a raw image file.

There are zero error messages, neither in the browser (console), nor from the host system’s proxy, nor from Nginx within the container, nor from Rails or anywhere else. The HTML documents and several scripts load fine, and comparing them with those served when bypassing Cloudflare shows that everything I checked is identical. The response headers also look mostly the same, aside from a few Cloudflare-specific ones, of course. The last thing I see being loaded is the mini profiler:

Of course, clearing the browser cache, using private windows etc. did not change anything. Clearing or disabling the Cloudflare cache does not help either, so the cache is not the issue. I temporarily disabled the CF cache completely for the whole forum.

It is worth noting that the forum runs on a sub path behind an Apache proxy on the host, following these instructions: Serve Discourse from a subfolder (path prefix) instead of a subdomain
Previously, we just created an ln -s . forum symlink instead of the uploads/backups symlinks and doubled rewrites from the instructions, which worked well for years (and still does without Cloudflare), but as part of my debugging efforts I switched to those instructions to ensure the internal proxy applies all rules as intended. The trusted header is CF-Connecting-IP, though I enabled cloudflare.template.yml as well, even though it duplicates things somewhat. I also tried changing various parts of these templates and the above instructions back and forth, partly to check whether the proxy IP headers make any difference, since the missing CF-Connecting-IP header is one thing that differs when bypassing Cloudflare.

At this point I am completely out of ideas. I have not a single trace of where the issue might be coming from, and not a single related log entry or output anywhere. Through Cloudflare, Discourse just hangs in the loading animation without any further trace.

I hope someone has an idea how to debug this, or whether there was a change between 3.5.0.beta3 and 3.5.0.beta4 which could be related. I guess a downgrade is problematic?

This is the instance: https://dietpi.com/forum/
EDIT: I disabled Cloudflare for now. But there is a CNAME which is still passed through Cloudflare, so those two can be compared: https://www.dietpi.com/forum/

Interesting problem.

It’s simply https://www.dietpi.com/forum/ that is hanging forever.

$ wget https://www.dietpi.com/forum/
--2025-05-03 10:52:18--  https://www.dietpi.com/forum/
Resolving www.dietpi.com (www.dietpi.com)... 104.21.12.65, 172.67.193.183, 2606:4700:3035::6815:c41, ...
Connecting to www.dietpi.com (www.dietpi.com)|104.21.12.65|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.1’

    [<=>                            

The interesting thing is that calls like https://www.dietpi.com/forum/site.json do succeed.

https://www.dietpi.com/forum/t/why-there-are-two-kernals-in-my-raspberry-pi4/23355 does not work and hangs forever, but
https://www.dietpi.com/forum/t/why-there-are-two-kernals-in-my-raspberry-pi4/23355.json does.

Interesting indeed. I only noticed just now that the HTML documents are not loading completely but hang at some point. I compared /forum/ in both cases and thought they were identical, but I was probably focusing too much on the head, while parts of the body at the bottom are missing.

This is the last line received when loading via Cloudflare:

      <discourse-assets-json>
        <div class="hidden" id="data-preloaded" data-preloaded="{&quot;topic_list&quot;:&quot;{\&quot;users\&quot;...\&quot;:false,\&quot;allowed_iframes\&quot;:\&quot;https://dietpi.com/forum/discobot/certificate.svg\

I needed to truncate it, as it vastly exceeds the character limit for a post. The document normally continues like this:

      <discourse-assets-json>
        <div class="hidden" id="data-preloaded" data-preloaded="{&quot;topic_list&quot;:&quot;{\&quot;users\&quot;...\&quot;:false,\&quot;allowed_iframes\&quot;:\&quot;https://dietpi.com/forum/discobot/certificate.svg\&quot;,\&quot;can_permanently_delete...

The other pages hang at the very same point. I think we are on to something here.

EDIT: Ah wait, I checked wrong, other pages hang elsewhere. So it is not this particular HTML element/attribute.
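
To take the guesswork out of comparing a truncated copy with a complete one, a small script can report where the two stop matching. This is only a sketch: the file names `direct.html` and `cloudflare.html` are placeholders for copies saved while bypassing and while going through Cloudflare, and dynamic pages can legitimately differ between two requests (tokens, nonces), so an early divergence does not necessarily mean truncation.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def describe_truncation(full: bytes, partial: bytes) -> str:
    """Say whether `partial` is a clean prefix of `full` or diverges from it."""
    n = common_prefix_len(full, partial)
    if n == len(partial) == len(full):
        return "identical"
    if n == len(partial) and len(partial) < len(full):
        return f"partial copy is a clean prefix, cut after {n} bytes"
    return f"copies diverge at byte {n}"


# Example use (file names are placeholders, not from the thread):
# full = open("direct.html", "rb").read()
# partial = open("cloudflare.html", "rb").read()
# print(describe_truncation(full, partial))
```

A "clean prefix" result points at the transfer being cut short, while a divergence well before the end suggests the two responses were simply rendered differently.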

Yeah, each page/HTML document hangs at the very same character when loaded via browser again and again, in a private window etc. But a different page hangs at a different point. When loading those pages via curl, they also always hang at the very same point, but a different one than in the browser, and wget again always hangs at the same point, but a slightly different one. Very weird.

Do you have some optimization enabled?

Nope, no (content) optimizations. I did have the 103 Early Hints feature enabled, but already disabled it in an attempt to solve things. I tried the same with the protocol settings, but that did not change anything either:

Btw, there is no content-length response header. Could that cause an issue? It does not exist when bypassing Cloudflare either, but maybe Cloudflare has some issue with that? EDIT: No, that seems to be normal for dynamic pages. It is the same with our WordPress and Matomo pages, which however do not cause any issues.
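
For context on why a missing content-length is normal: in HTTP/1.1 the end of a response body can be determined in three ways, and content-length is only one of them. The sketch below classifies a response by its headers; the function name and the returned labels are made up for illustration. It applies to HTTP/1.1 only, since HTTP/2 marks the end of a response with an end-of-stream flag on the last frame and does not use chunked transfer encoding at all.

```python
def body_framing(headers: dict) -> str:
    """Classify how an HTTP/1.1 response body is delimited.

    `headers` maps header names to values; names are matched
    case-insensitively. Labels returned are illustrative only.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    transfer_encoding = lowered.get("transfer-encoding", "").lower()
    if "chunked" in transfer_encoding:
        # Body ends with a zero-length chunk; takes precedence over Content-Length.
        return "chunked"
    if "content-length" in lowered:
        # Body ends after exactly that many bytes.
        return "content-length"
    # Otherwise the body ends when the server closes the connection.
    return "close-delimited"
```

Dynamic pages are commonly sent chunked because the server starts sending before it knows the final size.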

And another finding when playing with curl: printing to STDOUT results in the full HTML document being displayed, but it still hangs at the end:

  <p class='powered-by-link'>Powered by <a href="https://www.discourse.org">Discourse</a>, best viewed with JavaScript enabled</p>
</footer>



  </body>

</html>

But when trying to save it via -o or a simple redirect, or even just piping it into grep, it hangs at a different place:

            <div class="link-bottom-line">
                <a href='/forum/c/general-discussion/7' class='badge-wrapper bullet'>
                  <span class='badge-category-bg' style='background-color: #F7941D'></span>
                  <span class='badge-category clear-badge'>
                    <span class='category-name'>General Discussion</span>
                  </span>
                </a>

And I can 100% replicate this at the very same 73728 bytes when accessing https://www.dietpi.com/forum/ with curl, as long as the output is not just printed to the console right away. This is so weird 🧐.


So:

  • All clients hang while loading any HTML document from our Discourse instance.
  • Each client hangs at the very same byte when loading the same page.
  • Different clients hang at different points, but each at the very same byte when repeating the request with the same client.
  • Each page hangs at a different point in the document and at a different downloaded size.
  • The same tool, curl, hangs at different points when just printing to STDOUT vs. piping or storing the document somewhere.
  • wget is able to download the full document (at least https://www.dietpi.com/forum/) to a file, but still hangs at the end. Likewise, curl https://www.dietpi.com/forum/ prints the full document to the console, but hangs at the end.
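
The byte offsets in this list can be measured rather than read off a progress meter. A rough sketch, assuming Python's standard `urllib` (which speaks HTTP/1.1 only, so the numbers may differ from curl or a browser using HTTP/2) and an arbitrary 15-second read timeout to decide that the transfer has stalled:

```python
import socket
import urllib.request


def bytes_before_stall(read, chunk_size=4096):
    """Call read(chunk_size) until EOF or a timeout.

    Returns (total_bytes, finished_cleanly). A connection that is
    closed early raises from read() and is not handled here.
    """
    total = 0
    while True:
        try:
            chunk = read(chunk_size)
        except (socket.timeout, TimeoutError):
            return total, False  # no data within the timeout: stalled
        if not chunk:
            return total, True  # empty read: the response ended properly
        total += len(chunk)


def probe(url, timeout=15):
    """Fetch url and report how many body bytes arrived before a stall."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return bytes_before_stall(resp.read)


# Example use:
# print(probe("https://www.dietpi.com/forum/"))
```

As a side note, 73,728 bytes is exactly 18 × 4096, i.e. a whole number of 4 KiB blocks, which would fit a buffer boundary somewhere along the path rather than anything in the HTML itself.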

I think that might be buffering. But when investigating, I noticed something else.

wget -O - https://www.dietpi.com/forum/latest

Ends with

  </body>
</html>

But the connection is never closed.

Theory: there is a configuration issue somewhere where there is a mismatch in HTTP versions or headers (like keep alive connection) and this only becomes a problem when a document is larger than X (I suspect 64KB).

Yes, wget always downloads the whole document, and curl does so when printing directly to the console, but the connection is not closed. The same happens with much smaller documents: I tested a 14k one of a topic with only 2 posts. But even the smaller ones are usually not fully downloaded by curl when piping or storing to a file, nor in the browser.

Both tools always show HTTP/2, and in Cloudflare I have HTTP/2 origin requests enabled. But it is worth testing other HTTP versions explicitly. Yesterday I disabled all protocol settings in Cloudflare seen on the screenshot above, and it did not help. But I’ll try again. I can also enable access logs on the server to see the actual incoming requests from Cloudflare.

I tried all combinations of supported HTTP (1.1-3) and TLS (1.2-1.3) versions, but that does not make a difference. I also disabled HTTP/3 support, HTTP/2 origin requests, and 0-RTT connection resumption again. No difference: curl keeps hanging at exactly the same 73,728 bytes of https://www.dietpi.com/forum/.

Regarding the theory of too large document sizes: https://www.dietpi.com/dietpi-software.html has 199,475 bytes and loads perfectly fine. I should mention that the same server (same webserver) also hosts a static website, an MkDocs instance, WordPress, and Matomo, which all work perfectly fine. There is also a Grafana instance, for which the front webserver acts as a proxy via UNIX socket.

But I agree it seems to be related to buffers or chunk sizes or something like that. It is just weird that the downloaded size until the hang varies so much between clients and pages, while it remains exactly the same despite changing protocol versions, and that the connection is not even closed when the document has been fully downloaded. It is as if the stop signal is missing, though I lack insight into HTTP at this point. Hence I thought about the content-length header, but that one is obviously not mandatory.
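
On the "stop signal": in HTTP/1.1, a response without content-length is typically sent with chunked transfer encoding, and the end of the body is marked by a final zero-length chunk. If that last chunk never reaches the client, the client keeps waiting even though all of the HTML has already arrived. The sketch below decodes a raw chunked body and reports whether the terminating chunk was seen. It only illustrates the framing and makes no claim about what Cloudflare or Apache actually send here; on HTTP/2 connections the end of a response is signalled by an end-of-stream flag instead.

```python
def parse_chunked(raw: bytes):
    """Decode an HTTP/1.1 chunked body.

    Returns (payload, complete). `complete` is True only if the
    zero-length last chunk was found; trailers after it are ignored.
    """
    payload = bytearray()
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            return bytes(payload), False  # next chunk-size line never arrived
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(payload), True  # last chunk: the "stop signal"
        chunk = raw[pos:pos + size]
        payload += chunk
        if len(chunk) < size:
            return bytes(payload), False  # chunk data cut short
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

Fed with a body that contains all chunks but lacks the final `0\r\n\r\n`, this returns the full payload with `complete` set to False, which corresponds to a full document on screen and a client that never finishes.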

The webserver also acts as proxy for the Discourse container via UNIX socket. I could enable the TCP listener to make the Discourse instance additionally available without the proxy (leaving the Nginx within the container, of course).

Could you try KeepAlive Off in Apache?

I guess that would at least possibly rule out the webserver, so that would be worth a try.

No change. Also, from the Apache docs:

In addition, a Keep-Alive connection with an HTTP/1.0 client can only be used when the length of the content is known in advance.

Hence, given the missing content-length, it probably makes sense that keep-alive is not used for this request anyway.

Since enabling the TCP listener requires a rebuild, I’ll do that a little later when our common website activity is at a minimum. Um, I am just thinking about HTTPS … it looks like I need to make some custom adjustments to the internal Nginx config to keep the UNIX socket as well as plain HTTP connections functional, while listening on an additional port for HTTPS with the TLS certificates from the host, but without HTTPS redirect/enforcement. … An additional plain HTTP TCP port would also be interesting, for clients which can ignore HSTS.

Are you by chance using Rocket Loader in Cloudflare? I know it causes issues with some other scripts.
Also, did you clear the CF cache?
Are you using inbound rules on CF that may have been tied to your old VPS IP address and not updated to the new one?

No Rocket Loader. Note that in the above tests with curl and wget, which do not interpret any syntax and hence do not load any JavaScript, styles or anything else, the download of the raw HTML document itself always hangs.

Cloudflare cache is not active for the forum, the raw HTML documents were never cached anyway.

No VPS-specific rules. Generally there are no rules for the forum, aside from one to bypass the cache. The issue appears with and without it, so the cache isn’t the issue either.
