An ordeal migrating from Scaleway to a Raspberry Pi 4

Here is the documentation of my mishaps and eventual success getting a Discourse instance migrated from Scaleway to a Raspberry Pi 4, with Cloudflare in front.

Created a backup from the Scaleway Discourse instance, ran ./launcher stop app, and shut down the machine.

Installed Ubuntu Server 23.10 on a USB-connected SATA SSD that the Raspberry Pi 4 boots from.
Installed LXD and created a 100GiB btrfs loopback storage pool.
Updated the default profile to:

config:
  cloud-init.user-data: |
    #cloud-config
    ssh_pwauth: false
    package_update: true
    package_upgrade: true
    packages:
      - openssh-server
      - vim
      - git
      - rsync
    users:
      - name: root
        lock_passwd: true
        ssh_import_id: gh:balupton
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default

Added a discourse profile with:

config:
  limits.memory: 1GiB
  limits.memory.enforce: soft
  security.nesting: 'true'
description: Configuration for Discourse instances
devices: {}
name: discourse

Created an Ubuntu 23.10 Minimal Server instance with those profiles. Accessed it via the following in my ~/.ssh/config:

	ProxyJump LXD_HOST
	IdentityFile ~/.ssh/
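
The LXD steps above can be sketched roughly as the commands below. This is only an illustration under assumptions, not the exact commands I ran: the `run` dry-run wrapper and the `setup_lxd_instance` function are hypothetical helpers, the instance name `discourse` is a placeholder, the `ubuntu-minimal:23.10` image alias is assumed to exist on your LXD remote, and it assumes no storage pool named `default` exists yet. The profile YAML shown earlier still has to be pasted in with `lxc profile edit default` and `lxc profile edit discourse`.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# setup_lxd_instance [NAME] [IMAGE]
# NAME and IMAGE are placeholders; adjust them to your setup.
setup_lxd_instance() {
  name=${1:-discourse}
  image=${2:-ubuntu-minimal:23.10}

  # Without a "source" option LXD backs the pool with a loop file of this size.
  run lxc storage create default btrfs size=100GiB || return 1

  # Empty profile; its contents are filled in afterwards with "lxc profile edit".
  run lxc profile create discourse || return 1

  # Later profiles override earlier ones, so "discourse" is listed last.
  run lxc launch "$image" "$name" --profile default --profile discourse
}
```

Running `DRY_RUN=1 setup_lxd_instance` prints the commands without touching LXD, which is handy for checking them before committing to a 100GiB loop file.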

Followed the Discourse cloud install instructions and restored my Discourse configuration from the Scaleway instance:

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/cloudflare.template.yml"
  - "templates/web.ssl.template.yml"  # https
  - "templates/web.letsencrypt.ssl.template.yml"  # https
  # - "templates/web.ratelimited.template.yml" # not needed with cloudflare

expose:
  - "80:80"
  - "443:443"  # https

params:
  db_default_text_search_config: "pg_catalog.english"

env:
  LANG: en_US.UTF-8

  ## HTTPS configuration for: templates/web.letsencrypt.ssl.template.yml
  LETSENCRYPT_ACCOUNT_EMAIL: "redacted"  # https

  ## The domain name this Discourse instance will respond to

  ## List of comma delimited emails that will be made admin and developer
  ## on initial signup example ','

  ## The mailserver this Discourse instance will use
  #DISCOURSE_SMTP_DOMAIN:    # (required by some providers)
  #DISCOURSE_NOTIFICATION_EMAIL:    # (address to send notifications from)

## Any custom commands to run after building
run:
  - exec: rails r "SiteSetting.contact_email='redacted'"
  - exec: rails r "SiteSetting.notification_email='redacted'"

## The Docker container is stateless; all data is stored in /shared
volumes:
  - volume:
      host: /var/discourse/shared/standalone
      guest: /shared
  - volume:
      host: /var/discourse/shared/standalone/log/var-log
      guest: /var/log

## Plugins
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone
          - git clone
          - git clone
          - git clone
          - git clone
          - git clone
          - git clone
          - git clone
          - git clone
          # - git clone
          # - git clone
          # - git clone

Before I could restore the backup, however, I needed to rebuild the app. Unfortunately, with the btrfs storage pool the rebuild would hang on the yarn install step for hours and eventually time out, with next to zero load on the machine.

After some reading, I decided to try a zfs storage pool instead. This got further, but the rebuild would still hang indefinitely after the log line "Background saving terminated with success", again with next to zero load on the machine.

(I have screenshots, however uploading them here fails.)

I then decided to abandon LXD and try it directly on the Ubuntu Server instance on the Raspberry Pi 4.
For the first time I had a successful rebuild, however all attempts to access the site would redirect to itself, in a redirect loop.

The redirect loop had two causes…

If I had the following in my discourse configuration:

  - "8080:80"
  - "8081:443"  # https

It would redirect endlessly, always wanting to redirect to https://hostname.
The fix was going back to:

  - "80:80"
  - "443:443"  # https

Secondly, anything accessed via the Cloudflare tunnel would also redirect endlessly to itself. The cause, it turns out, was having both a tunnel for HTTP and a tunnel for HTTPS. Changing the tunnel to HTTPS only solved it.

Other things I did, though at this point I am unsure whether they mattered:

  1. I removed letsencrypt and used a Cloudflare Origin Certificate instead.
  2. I configured the Origin Server Name in the HTTPS tunnel to be the intended hostname.
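
For anyone else swapping letsencrypt for their own certificate, here is a minimal sketch of putting the files where the SSL template looks for them. The `install_origin_cert` function and the input file names are hypothetical. The `ssl.key` path is the one mentioned in the feedback further down; the matching `ssl.crt` name is my understanding of what `templates/web.ssl.template.yml` expects, so check it against your copy of the template.

```shell
#!/bin/sh
# install_origin_cert CERT KEY [SSL_DIR]
# Copies a certificate and private key into the directory Discourse mounts
# at /shared/ssl. SSL_DIR defaults to the standalone container's location.
install_origin_cert() {
  cert=$1
  key=$2
  ssl_dir=${3:-/var/discourse/shared/standalone/ssl}

  if [ ! -r "$cert" ] || [ ! -r "$key" ]; then
    echo "certificate or key is not readable" >&2
    return 1
  fi

  mkdir -p "$ssl_dir" || return 1
  cp "$cert" "$ssl_dir/ssl.crt" || return 1
  cp "$key" "$ssl_dir/ssl.key" || return 1

  # Keep the private key readable by its owner only.
  chmod 600 "$ssl_dir/ssl.key"
}
```

After something like `sudo sh -c '. ./install_origin_cert.sh; install_origin_cert origin.pem origin.key'` (file names are placeholders), the letsencrypt template line is removed from the configuration and the app rebuilt with ./launcher rebuild app.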

Things that could be improved:

  1. HTTPS from the origin to Cloudflare could be avoided if I lock down the machine to only allow connections from Cloudflare, and set up an SSH tunnel. However, I’m not sure if Discourse runs better having HTTPS on itself (e.g. HTTP/2, etc).
  2. Testing whether or not letsencrypt works with the Cloudflare tunnel (I was unable to test it, as when I was using letsencrypt I was getting the redirect loops).

How I debugged the redirect loops:

  • For debugging the Discourse redirect loop: I set /etc/hosts to point my Discourse hostname directly to the IP address, then used curl -k --head 'https://hostname:8081' etc. to test it.
  • For debugging the Cloudflare tunnel redirect loop: I removed that entry from /etc/hosts so the hostname resolves via DNS, then used curl -k --head 'https://hostname' etc. to test it.
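
The manual curl checks above can be wrapped in a small loop that follows one redirect at a time and stops as soon as a URL repeats. This is a sketch, not what I actually ran: the function names are made up, it assumes a POSIX shell with curl and grep, and like the commands above it uses -k to skip certificate verification.

```shell
#!/bin/sh
# next_hop URL: prints "<status> <redirect target>" for one HEAD request.
# The target is empty when the response is not a redirect.
next_hop() {
  curl -sk --head -o /dev/null -w '%{http_code} %{redirect_url}' "$1"
}

# seen_before URL LIST: succeeds if URL is a line in the newline-separated LIST.
seen_before() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# trace_redirects URL [MAX_HOPS]
# Prints each hop; returns 1 on a loop or when MAX_HOPS is exhausted.
trace_redirects() {
  url=$1
  max=${2:-10}
  visited=""
  hops=0
  while [ "$hops" -lt "$max" ]; do
    visited=$(printf '%s\n%s' "$visited" "$url")
    hop=$(next_hop "$url")
    status=${hop%% *}
    location=${hop#* }
    echo "$status $url -> ${location:-(no redirect)}"
    [ -z "$location" ] && return 0
    if seen_before "$location" "$visited"; then
      echo "redirect loop: $location was already visited"
      return 1
    fi
    url=$location
    hops=$((hops + 1))
  done
  echo "gave up after $max hops"
  return 1
}
```

Calling `trace_redirects 'https://hostname:8081'` with the /etc/hosts override in place tests Discourse directly, and `trace_redirects 'https://hostname'` without the override tests the path through the tunnel. curl's `--resolve` option is an alternative to editing /etc/hosts, if you prefer not to touch it.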

There are a bunch of other nifty things and learnings along the way, however those can wait.

Feedback for Discourse:

  • Rebuilding needs to be clearer about what it is doing. Too often there are long delays with no obvious action being performed.
  • Find out why exposing different ports would cause a redirect loop.
  • Since letsencrypt became a thing, documentation on how to specify one’s own SSL certificate is tedious to uncover. Also, it seems only one certificate can be used, as it is fixed to /var/discourse/shared/standalone/ssl/ssl.key instead of say /var/discourse/shared/standalone/ssl/CONTAINER_ID.key, e.g. /var/discourse/shared/standalone/ssl/app.key. Cloudflare provides origin certs, so this is a good option for Cloudflare users.
  • Publishing a comprehensive, step-by-step, all-inclusive guide for Cloudflare + Raspberry Pi 4 would have helped enormously. Currently such guides delegate too much information to third parties that have no awareness of each other, and all the complexity and debugging is in how the different parts work together, not how they work alone.

Other someday todos:

  • Find out why it would hang in LXD.
  • See if it works in LXD on a Raspberry Pi 5, or on Multipass on macOS, or in LXD with the storage pool being a partition/drive instead of a loopback file, as then I don’t have to waste an entire machine on it.
  • See if I can have docker and launcher not need sudo.