Discourse update keeps failing

The core dump and the invalid instruction indicate that something is going wrong at a low level (CPU, memory).

I am not a hardware expert, but this CPU came on the market 12 years ago and I suspect it might be too old (i.e. it is trying to run compiled code that assumes a newer CPU).

We did think about this, but given that it has been working fine for the last three years, what would have been updated within the stack that suddenly requires a newer instruction? (Also, which instruction?)

Would FEATURE: Add support for clear_every parameter in Redis backend (#309) · discourse/message_bus@1baa1ea · GitHub be triggering some different behaviour within Redis? :thinking:

I also want to add that last Friday the major version upgrade was performed seamlessly and it ran the entire weekend without a hitch. I even performed a successful update on Sunday. If the CPU is the cause, which is understandable, then it would have shown this error with the major version upgrade.

But, perhaps there has been a change since Monday…


That could very well be; it’s crashing in a JSON parse routine in the message bus code, although the change you mentioned is over 4 months old.

Yeah… so it should already have been present on Sunday. :pensive:

Looking through the logs, it seems like there is already another instance of Redis running when it tries to start it.
Can that be the issue?

This is pretty normal for a launcher rebuild app - it doesn’t affect anything (as far as I know, at least…).


Code paths can also trigger on certain data being present or absent. Maybe the offending code was present but it was not being executed.


I’m going to try some quasi-bisecting on the latest set of commits and see if I can narrow it down to a specific recent change. This will take “some time”… :sweat_smile:
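For anyone wanting to repeat this, the search can be sketched roughly as below. This is only an illustration of bisecting, not the procedure actually used here: `first_bad` is a made-up helper, and the block you pass in stands for whatever test you run per commit (for example, rebuild and see whether the illegal instruction appears). It assumes the list is ordered oldest to newest, that the newest entry is known to be bad, and that every commit after the first bad one is also bad.

```ruby
# Find the first "bad" entry in a list ordered oldest to newest.
# Assumes: everything before the first bad entry is good, everything from
# it onwards is bad, and the last entry is known to be bad.
# The block must return true when the given commit is bad.
def first_bad(commits)
  lo = 0
  hi = commits.length - 1
  while lo < hi
    mid = (lo + hi) / 2
    if yield(commits[mid])
      hi = mid       # bad: the first bad commit is here or earlier
    else
      lo = mid + 1   # good: the first bad commit is later
    end
  end
  commits[lo]
end
```

Commits that fail to build for unrelated reasons break the good/bad assumption, so they would need to be skipped by hand.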


OK, so the first bad commit with the illegal instruction is Build(deps): Bump oj from 3.13.14 to 3.13.15 (#17309) · discourse/discourse@4c69619 · GitHub which is linked to Fix NaN object dump issue · ohler55/oj@f0122cf · GitHub

Some previous commits also fail to build but with a different issue (which also seems like it could be transient…):

Good find, it’s indeed crashing in the oj gem.

Version 3.13.15 also contains this commit, which switches to using SSE4.2 instructions for performance. Those are not supported on AMD Opteron 41xx processors.
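On Linux, whether a CPU advertises SSE4.2 can be read from the flags line of /proc/cpuinfo. A minimal Ruby sketch of that check follows; the helper names `cpu_flags` and `sse42?` are made up for illustration, and it assumes an x86 Linux host where the flag is spelled `sse4_2`.

```ruby
# Return the CPU feature flags from /proc/cpuinfo-style text
# (uses the first "flags" line; returns [] if there is none).
def cpu_flags(cpuinfo_text)
  line = cpuinfo_text.each_line.find { |l| l.start_with?('flags') }
  return [] unless line

  line.split(':', 2).last.split
end

# True if the flags include SSE4.2.
def sse42?(cpuinfo_text)
  cpu_flags(cpuinfo_text).include?('sse4_2')
end

# Only reports when run on a machine that actually has /proc/cpuinfo.
if __FILE__ == $PROGRAM_NAME && File.readable?('/proc/cpuinfo')
  puts "SSE4.2 supported: #{sse42?(File.read('/proc/cpuinfo'))}"
end
```

If this reports false, any native code compiled with SSE4.2 enabled can be expected to die with an illegal instruction on that machine.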

So we’re back to

IMHO it sucks that the gem author chose to make this a compile-time decision.


Lovely. An additional change not mentioned in the oj changelog… :grin:

So, unless the gem does its native compilation during installation (in which case we could potentially prod it into working via OJ_USE_SSE4_2), it looks like it’s going to need a server move… :expressionless:

Edit: the gem doesn’t distribute any pre-compiled objects so this should be workable - so the next question is why it’s compiling with SSE4.2 on a system that doesn’t support it.


Our current base image ships 3.13.14 so it is being compiled on your system.

Can you try reproducing the error with the benchmark script from the commit:

○ → docker run --rm -it -u discourse discourse/base:2.0.20220621-0049 bash
discourse@313d7af3be39:/$ cd
discourse@313d7af3be39:~$ gem install --user pry benchmark-ips oj
Successfully installed oj-3.13.15
5 gems installed
discourse@313d7af3be39:~$ /home/discourse/.local/share/gem/ruby/2.7.0/bin/pry
[1] pry(main)> require 'benchmark/ips'
require 'oj'

def json(string)
  # The body is missing from this post; assumed here to serialise the string to JSON.
  Oj.dump(string)
end

Benchmark.ips do |x|
  x.warmup = 5
  x.time = 20

  json_0   = json('a' *   0)
  json_64  = json('a' *  64)
  json_128 = json('a' * 128)

  x.report('Oj.load   [0]') { Oj.load(json_0) }
  x.report('Oj.load  [64]') { Oj.load(json_64) }
  x.report('Oj.load [128]') { Oj.load(json_128) }
end

You can also check whether or not it was compiled using the problematic instruction with:

If so, this is probably something to report to the oj gem’s project.
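As a rough sketch of such a check (not necessarily the command originally posted), the compiled extension can be disassembled and searched for SSE4.2 mnemonics. The Ruby below assumes `objdump` is installed and that you pass the path to the gem's compiled shared object yourself. The helper names are made up, and the pattern only covers the SSE4.2 string-compare instructions and `pcmpgtq`; which of these oj actually emits is not confirmed here.

```ruby
require 'open3'

# SSE4.2 string-compare instructions plus pcmpgtq, with an optional "v"
# prefix for their VEX-encoded forms.
SSE42_PATTERN = /\bv?(pcmp[ei]str[im]|pcmpgtq)\b/

# Return the distinct SSE4.2 mnemonics that appear in disassembly text.
def sse42_instructions(disassembly)
  disassembly.scan(SSE42_PATTERN).flatten.uniq.sort
end

# Disassemble a binary with objdump and return the text.
def disassemble(path)
  out, status = Open3.capture2('objdump', '-d', path)
  raise "objdump failed for #{path}" unless status.success?

  out
end

# Usage: ruby check_sse42.rb /path/to/compiled/extension.so
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  found = sse42_instructions(disassemble(ARGV[0]))
  puts found.empty? ? 'no SSE4.2 instructions found' : "found: #{found.join(', ')}"
end
```

A hit on a CPU without the `sse4_2` flag would back up the theory that the build enabled SSE4.2 where it should not have.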


I do want to look into this some more, but 1) I want to avoid more downtime (for a while at least; I know the above doesn’t involve downtime but I might be tempted to try other things) and 2) when this changes:

to 3.13.15 and the Discourse base image inherits that same minimum CPU microarchitecture requirement, then the current server isn’t going to be sustainable anyway (unless there’s a way of working around it, like (re)installing the gem separately, e.g. as part of a pre-code hook, but I’d also guess that’s a bit of a faff for most people).

It also raises the question of what a reasonable cut-off date for hardware support should be anyway; it’s not reasonable to expect 32-bit CPU support, so perhaps SSE4.2 is a reasonable “new minimum” for modern software.


Indeed, I’ve already raised this internally.



Hey!

Thank you for looking into this. I am having the same issue on an Intel Atom N2800 (from the end of 2011).
Do you think there might be a way around this issue, or is migrating to newer hardware the only thing I can do for now?

Thank you,

I’m dead in the water with my forum after the update I was prompted to do today. I never saw any warnings about CPUs being made obsolete, and to have this happen suddenly is … bad. The available servers all have the same configuration for consistency, and all use the same CPU.

AMD Athlon™ II X2 B22 Processor

Not practical to run out and buy a new server, configure, etc. in this economy, even given the time.

How can I back out of this update until this situation is better understood? I can’t even contact my users right now with the forum down. Thanks.

If you’re using the Docker deployment method, you may have an older container which you can restart (check e.g. docker images and/or docker ps -a).

You can also override the commit used to build the Discourse instance by editing app.yml and setting the version to the commit prior to the change, then rebuilding:

  version: adb7fa5e2fc51308efc9fc4ee57ecb1c15a85cfa

Discourse will break again if you update after this, which is not ideal given the security update that has been released since (although exploitation potential seems pretty limited for most instances).


One option (which I haven’t tried yet) is to install the oj gem separately and hope to trigger compilation with the correct CPU features (or lack thereof).

I had planned to try this in app.yml:

    - exec:
          - gem install oj

but I haven’t got the scope for more forum downtime.


That specific security update doesn't appear relevant to me since I'm not in a shared hosting environment. I'm unsure how to interpret the docker info. Here's the ps:

37c258b23221 local_discourse/app "/sbin/boot" 3 months ago Exited (7) 3 hours ago


Can you try a ./launcher start app?