Endless loading behind Cloudflare

We moved our server from one VPS provider to another, and I also upgraded the instance to the latest version via launcher rebuild, from 3.5.0.beta3 to 3.5.0.beta4.

The instance was always running fine behind Cloudflare, but now trying to access it leads to the endless five-dots loading animation.

I have a hosts file entry on my local system to bypass Cloudflare, since my ISP (Deutsche Telekom AG) has poor peering policies, so access through Cloudflare is very slow at times. So at first I did not recognize the issue, as access without Cloudflare works fine. I then upgraded the instance, and hence am now not sure whether the VPS switch or the Discourse upgrade was the relevant change. I verified via VPN and mobile network that the issue really is Cloudflare itself now, not the bad peering of my ISP, and other users face the same issue as well. Old and new VPS have IPv6 available, and the whole system is exactly the same, transferred as a raw image file.
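The bypass is nothing more than a plain hosts entry pointing the domain at the origin directly; the IP below is a documentation placeholder, not our real server address:

# /etc/hosts on my local system (203.0.113.10 stands in for the actual origin IP)
203.0.113.10    dietpi.com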

There are zero error messages, neither in the browser (console), nor from the host system’s proxy, nor from Nginx within the container, nor from Rails or anywhere else. The HTML documents and several scripts load fine, and comparing them with those served when bypassing Cloudflare shows that everything I checked is identical. The response headers also look mostly the same, aside from a few Cloudflare-specific ones, of course. The last thing I see being loaded is the mini profiler.

Of course, clearing the browser cache, using private windows etc. did not change anything. Clearing or disabling the Cloudflare cache does not help either, so the cache is not the issue. I temporarily disabled the CF cache completely for the whole forum.

Worth noting: the forum runs on a sub-path behind an Apache proxy on the host, following these instructions: Serve Discourse from a subfolder (path prefix) instead of a subdomain
Previously, we created just a ln -s . forum symlink instead of the uploads/backups symlinks and doubled rewrites from those instructions, which worked well for years (and also works now without Cloudflare), but as part of my debugging efforts I switched to the instructions to make sure the internal proxy applies all rules as intended. The trusted header is CF-Connecting-IP, though I enabled cloudflare.template.yml as well, even though it somewhat duplicates things. I also tried changing various parts of these templates and the above instructions back and forth, among other things to check whether the proxy IP headers make any difference, since a missing CF-Connecting-IP is one difference when bypassing Cloudflare.
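For context, the host-side Apache part boils down to proxying /forum/ to the Nginx inside the container. A stripped-down sketch only; the socket path and exact directives here are assumptions, the real setup follows the linked guide:

# Host Apache vhost sketch (needs mod_proxy, mod_proxy_http, mod_headers):
# forward /forum/ to the container's Nginx via its UNIX socket
ProxyPreserveHost On
RequestHeader set X-Forwarded-Proto "https"
ProxyPass "/forum/" "unix:/var/discourse/shared/standalone/nginx.http.sock|http://127.0.0.1/forum/"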

At this point I am completely out of ideas. There is not a single trace of where the issue might be coming from, not a single related log or output anywhere. Through Cloudflare, Discourse just hangs in the loading animation without further trace.

I hope someone has an idea how to debug this, or whether there was a change between 3.5.0.beta3 and 3.5.0.beta4 which could be related. I guess a downgrade is problematic?

This is the instance: https://dietpi.com/forum/
EDIT: I disabled Cloudflare for now. But there is a CNAME which is still passed through Cloudflare, so those two can be compared: https://www.dietpi.com/forum/

Interesting problem.

It’s simply https://www.dietpi.com/forum/ that is hanging forever.

$ wget https://www.dietpi.com/forum/
--2025-05-03 10:52:18--  https://www.dietpi.com/forum/
Resolving www.dietpi.com (www.dietpi.com)... 104.21.12.65, 172.67.193.183, 2606:4700:3035::6815:c41, ...
Connecting to www.dietpi.com (www.dietpi.com)|104.21.12.65|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html.1'

    [<=>                            

The interesting thing is that calls like https://www.dietpi.com/forum/site.json do succeed.

https://www.dietpi.com/forum/t/why-there-are-two-kernals-in-my-raspberry-pi4/23355 does not work and hangs forever, but
https://www.dietpi.com/forum/t/why-there-are-two-kernals-in-my-raspberry-pi4/23355.json does.

Interesting indeed. I only now realize that the HTML documents are not loading completely but hang at some point. I compared /forum/ in both cases and thought they were identical, but I was probably focusing too much on the head, while parts of the body at the bottom are missing.

That last line when loading via Cloudflare:

      <discourse-assets-json>
        <div class="hidden" id="data-preloaded" data-preloaded="{&quot;topic_list&quot;:&quot;{\&quot;users\&quot;...\&quot;:false,\&quot;allowed_iframes\&quot;:\&quot;https://dietpi.com/forum/discobot/certificate.svg\

Needed to truncate it as it vastly exceeds the character limit for a post. The document usually continues like this:

      <discourse-assets-json>
        <div class="hidden" id="data-preloaded" data-preloaded="{&quot;topic_list&quot;:&quot;{\&quot;users\&quot;...\&quot;:false,\&quot;allowed_iframes\&quot;:\&quot;https://dietpi.com/forum/discobot/certificate.svg\&quot;,\&quot;can_permanently_delete...

The other pages hang at the very same point. I think we are on to something here.

EDIT: Ah wait, I checked wrong, other pages hang elsewhere. So it is not this particular HTML element/attribute.

Yeah, each page/HTML document hangs at the very same character when loaded via browser again and again, in a private window etc. But a different page hangs at a different point. When loading those pages via curl, they also always hang at the very same point, but a different one, and wget again always hangs at one consistent point, but a slightly different one again. Very weird.
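A simple way to pin down how many bytes arrive before each client stalls (the --max-time cap is only there so the command returns instead of hanging forever):

# Count the bytes received before the transfer stalls; repeat per page and per client
curl -s --max-time 10 https://www.dietpi.com/forum/ | wc -c

# curl can also report the downloaded size itself
curl -s --max-time 10 -o /dev/null -w '%{size_download}\n' https://www.dietpi.com/forum/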

Do you have some optimization enabled?

Nope, no (content) optimizations. I did have the 103 Early Hints feature enabled, but already disabled it as an attempt to solve things. I tried the same with the protocol settings, but that did not change anything either.

Btw, there is no content-length response header, could that cause an issue? I mean, it does not exist when bypassing Cloudflare either, but perhaps Cloudflare then has some issue with that? EDIT: No, this seems to be normal for dynamic pages, same with our WordPress and Matomo pages, which however do not cause any issues.
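One way to compare the relevant response headers on a real GET (rather than a HEAD, which may be answered differently):

# Dump the response headers of a full GET; the body goes to /dev/null
curl -s -D - -o /dev/null --max-time 10 https://www.dietpi.com/forum/ | grep -iE 'content-length|transfer-encoding|content-encoding'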

And another find when playing with curl: printing to STDOUT results in the full HTML document being displayed, but it still hangs, right at the end:

  <p class='powered-by-link'>Powered by <a href="https://www.discourse.org">Discourse</a>, best viewed with JavaScript enabled</p>
</footer>



  </body>

</html>

But when trying to save it via -o or a simple redirect, or even just piping into grep, it hangs at a different place:

            <div class="link-bottom-line">
                <a href='/forum/c/general-discussion/7' class='badge-wrapper bullet'>
                  <span class='badge-category-bg' style='background-color: #F7941D'></span>
                  <span class='badge-category clear-badge'>
                    <span class='category-name'>General Discussion</span>
                  </span>
                </a>

And I can replicate these very same 73728 bytes with 100% consistency when accessing https://www.dietpi.com/forum/ with curl in any way other than just printing to the console right away. This is so weird :face_with_monocle:.


So:

  • All clients hang at loading any HTML document from our Discourse instance.
  • Each client hangs at the very same byte when loading the same page.
  • Different clients hang at different points, but at the very same byte when repeating with the same client.
  • Each page hangs at a different point in the document and at different downloaded size.
  • The same tool curl hangs at different points when just printing to STDOUT vs piping or storing the document somewhere.
  • wget is able to download the full document (at least https://www.dietpi.com/forum/) to a file but still hangs at the end; the same happens when curl prints the full document for https://www.dietpi.com/forum/ to the console and then hangs at the end.

I think that might be buffering. But when investigating, I noticed something else.

wget -O - https://www.dietpi.com/forum/latest

Ends with

  </body>
</html>

But the connection is never closed.

Theory: there is a configuration issue somewhere with a mismatch in HTTP versions or headers (like keep-alive connections), and this only becomes a problem when a document is larger than X (I suspect 64 KB).
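One way to test that from the client side would be to force specific protocol versions per request and see whether the stall point moves, e.g.:

# Force specific HTTP versions towards Cloudflare (output discarded, capped so the command returns)
curl -sv --http1.1 --max-time 10 -o /dev/null https://www.dietpi.com/forum/
curl -sv --http2   --max-time 10 -o /dev/null https://www.dietpi.com/forum/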

Yes, wget always downloads the whole document, and curl does so when printing directly to the console, but the connection is not closed. The same happens with much smaller documents; for example, I tested a 14K one from a topic with only 2 posts. But even the smaller ones are usually not fully downloaded by curl when piping or storing to a file, nor in the browser.

Both tools always show HTTP/2, and in Cloudflare I have HTTP/2 origin requests enabled. But it is worth testing other HTTP versions explicitly. Yesterday I disabled all the protocol settings in Cloudflare mentioned above, and it did not help, but I’ll try again. I can also enable access logs on the server to see the requests actually coming in from Cloudflare.

I tried all combinations of the supported HTTP (1.1-3) and TLS (1.2-1.3) versions, but that does not make a difference. I also disabled HTTP/3 support, HTTP/2 origin requests and 0-RTT connection resumption again. No difference, curl keeps hanging at exactly the same 73,728 bytes of https://www.dietpi.com/forum/.

Regarding the theory of too-large document sizes: https://www.dietpi.com/dietpi-software.html has 199,475 bytes and loads perfectly fine. I should mention that the server (same webserver) also hosts a static website, a MkDocs instance, WordPress and Matomo, which all work perfectly fine. There is also a Grafana instance where the front webserver acts as a proxy via UNIX socket.

But I agree it seems to be related to buffers or chunk sizes or something like that. It is just weird that the downloaded size until the hang varies so much between clients and pages, while it remains exactly the same despite changing protocol versions, and that the connection is not even closed when the document has been fully downloaded. It is as if the stop signal is missing, though I lack deeper insight into HTTP at this point. Hence I thought about the content-length header, but that one is obviously not mandatory.

The webserver also acts as a proxy for the Discourse container via UNIX socket. I could enable the TCP listener to make the Discourse instance additionally available without the proxy (keeping the Nginx within the container, of course).

Could you try KeepAlive Off in Apache?
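It is a single directive in the main server config or the relevant vhost:

# Apache: disable persistent (keep-alive) connections for testing
KeepAlive Off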

I guess that would at least possibly rule out the webserver, so that would be worth a try.

No change. Also from Apache docs:

In addition, a Keep-Alive connection with an HTTP/1.0 client can only be used when the length of the content is known in advance.

Hence, given the missing content-length, it probably makes sense that keep-alive is not used for this request anyway.

Since it requires a rebuild, I’ll do it a little later when our usual website activity is at a minimum. Hmm, I am just thinking about HTTPS … it looks like I need to make some custom adjustments to the internal Nginx config to keep the UNIX socket functional as well as plain HTTP connections, while listening on an additional port for HTTPS with the TLS certificates from the host, but without HTTPS redirect/enforcement. … An additional plain HTTP TCP port would also be interesting, for clients which can ignore HSTS.

Are you by chance using Rocket Loader in Cloudflare? I know it causes issues with some other scripts.
Also, did you clear the CF cache?
Are you using inbound rules on CF that may have been tied to your old VPS IP address and not updated to the new one?

No Rocket Loader. Note that, as per the above tests with curl and wget, which do not interpret any markup and hence do not load any JavaScript, styles or anything else, the problem is that the download of the raw HTML document itself always hangs.

The Cloudflare cache is not active for the forum; the raw HTML documents were never cached anyway.

No VPS-specific rules. Generally no rules for the forum at all, aside from one to bypass the cache. The issue appears in both cases, so the cache isn’t the issue either.

While testing to bypass the Apache2 proxy on the Discourse container host, and disabling forced HTTPS redirects at Cloudflare to test plain HTTP connections via curl as well, I finally found the culprit at Cloudflare: the “Automatic HTTPS Rewrites” feature.

I am not sure what changed with our VPS switch and/or the Discourse 3.5.0.beta3 to 3.5.0.beta4 upgrade and/or coincidentally something else at the same time, but it seems that something in the Discourse HTML, CSS or JavaScript documents causes Cloudflare’s automatic HTTPS rewriting of embedded URLs to choke. It looks like the partial and hanging curl requests were not really related, or maybe they are. It is weird that in the browser network tab one can see the partial content of the HTML document, as if the HTTPS rewrite feature alters it while streaming through the document.

Does maybe someone else have an instance and a Cloudflare account to test this with, to see whether it is a general issue or related to our particular instance/setup?

Btw, to test bypassing the proxy as well as plain HTTP, while keeping the connection via the proxy active, manually adjusting the Nginx config within the container like this works perfectly fine:

root@dietpi-discourse:/var/www/discourse# cat /etc/nginx/conf.d/outlets/server/10-http.conf
listen unix:/shared/nginx.http.sock;
set_real_ip_from unix:;
listen 8080;
listen [::]:8080;
listen 8443 ssl;
listen [::]:8443 ssl;
http2 on;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;

ssl_certificate /shared/fullchain.cer;
ssl_certificate_key /shared/dietpi.com.key;

ssl_session_tickets off;
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:1m;

Important to remove HTTPS redirects and the HSTS header, of course, and to expose the added ports.
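Exposing the extra ports is the usual expose list in app.yml plus a rebuild, something along these lines (matching the ports used above):

## app.yml sketch: publish the additional test ports of the container
expose:
  - "8080:8080"   # plain HTTP test listener
  - "8443:8443"   # HTTPS test listener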

And another find: we use mod_sed to add our Matomo tracker code to all text/html responses, right in front of the </head> end tag. Disabling it for Discourse (or bypassing the Apache2 proxy) solves things as well, even with Cloudflare Automatic HTTPS Rewrites still active. So disabling either of the two solves things. On all other pages the combination works fine, including very large pages we have which are larger than the failing forum pages. So maybe the two filters, first mod_sed on our proxy and then the embedded URL rewrites by Cloudflare, cause something to break, related to document or chunk sizes or whatever.
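For reference, such a mod_sed injection looks roughly like this (simplified sketch; the location and script URL are placeholders, not our literal config):

# Host Apache config sketch: run text/html responses through mod_sed
# and splice a tracker snippet in right before </head>
<Location "/">
    AddOutputFilterByType Sed text/html
    OutputSed "s|</head>|<script async src='https://example.org/matomo.js'></script></head>|"
</Location>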

We now embed the tracker via a Discourse theme edit instead, and I additionally disabled the Cloudflare Automatic HTTPS Rewrites. There is no mixed content on our whole website, and if there were, it would be better to see and fix it instead of having Cloudflare mask it forever.

I’m pretty sure that can’t work.

I’m not quite sure what problem you’re trying to solve, but you probably need to enable force_https in your app.yml.

I imagine that, just from its name, “Cloudflare Automatic HTTPS Rewrites” can be misunderstood. Cloudflare has two features:

  • “Always Use HTTPS” redirects all plain HTTP requests to HTTPS, just like force_https in Discourse does. Both were previously enabled, and I disabled both to test whether HTTPS has anything to do with the issue of endlessly loading Discourse pages and hanging curl requests. This worked perfectly fine, even solving the whole issue for HTTPS requests as well, but only because I disabled “Cloudflare Automatic HTTPS Rewrites” at the same time.
  • “Cloudflare Automatic HTTPS Rewrites” alters HTML, CSS and JavaScript documents to replace all embedded plain HTTP URLs with HTTPS variants where Cloudflare thinks the host is reachable via HTTPS (based on the HSTS preload list and such). This is to avoid mixed content warnings.

Enforcing or not enforcing HTTPS at Cloudflare, at the host proxy or at Discourse does not matter. What did cause the issue is the combination of the mod_sed filter at the host proxy and the embedded-URL edits by Cloudflare, so two stages at which the content of the documents was passed through a filter. The problem was not any actual content change (there is no mixed content on our site, so “Cloudflare Automatic HTTPS Rewrites” does not actually change the document body), but probably something related to chunks, buffering or timing.