Discourse update keeps failing

Our current base image ships oj 3.13.14, so the newer version is being compiled on your system.

Can you try reproducing the error with the benchmark script from the commit:

○ → docker run --rm -it -u discourse discourse/base:2.0.20220621-0049 bash
discourse@313d7af3be39:/$ cd
discourse@313d7af3be39:~$ gem install --user pry benchmark-ips oj
…
Successfully installed oj-3.13.15
5 gems installed
discourse@313d7af3be39:~$ /home/discourse/.local/share/gem/ruby/2.7.0/bin/pry
[1] pry(main)> require 'benchmark/ips'
require 'oj'

def json(string)
  "\"#{string}\""
end

Benchmark.ips do |x|
  x.warmup = 5
  x.time = 20

  json_0   = json('a' *   0)
  json_64  = json('a' *  64)
  json_128 = json('a' * 128)

  x.report('Oj.load   [0]') { Oj.load(json_0) }
  x.report('Oj.load  [64]') { Oj.load(json_64) }
  x.report('Oj.load [128]') { Oj.load(json_128) }
end;

You can also check whether or not it was compiled using the problematic instruction with:

discourse@313d7af3be39:~$ objdump -d /home/discourse/.local/share/gem/ruby/2.7.0/gems/oj-3.13.15/lib/oj/oj.so | grep -C3 pcmpestri
   2e32b:	0f 82 b5 03 00 00    	jb     2e6e6 <oj_parse2+0x8a6>
   2e331:	66 0f 6f 05 77 d6 01 	movdqa 0x1d677(%rip),%xmm0        # 4b9b0 <exp_plus+0x330>
   2e338:	00 
   2e339:	66 0f 3a 61 07 00    	pcmpestri $0x0,(%rdi),%xmm0
   2e33f:	83 f9 10             	cmp    $0x10,%ecx
   2e342:	74 dc                	je     2e320 <oj_parse2+0x4e0>
   2e344:	48 63 c9             	movslq %ecx,%rcx

If so, this is probably something to report to the oj gem’s project.
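
To check the CPU side of this as well, here is a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo lists feature flags on a `flags` line. `has_cpu_flag` is a hypothetical helper name, not part of any tool mentioned here. pcmpestri belongs to the SSE4.2 instruction set, which shows up in that list as `sse4_2`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the CPU advertises the given feature flag.
# Usage: has_cpu_flag FLAG [CPUINFO_FILE]
has_cpu_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # Take the first "flags" line and look for the flag as a whole word.
  # A missing file yields no input, so the check simply fails.
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

if has_cpu_flag sse4_2; then
  echo "sse4_2: supported"
else
  echo "sse4_2: not reported"
fi
```

If the flag is not reported, a binary containing pcmpestri would be expected to crash with an illegal-instruction error on that machine.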

I do want to look into this some more, but 1) I want to avoid more downtime (for a while at least; I know the above doesn’t involve downtime but I might be tempted to try other things) and 2) when this changes:

to 3.13.15 and the Discourse base image inherits that same minimum CPU microarchitecture requirement, then the current server isn’t going to be sustainable anyway (unless there’s a way of working around it, like (re)installing the gem separately e.g. as part of a pre-code hook, but I’d also guess that’s a bit of a faff for most people).

It also raises the question of what a reasonable cut-off date for hardware support should be anyway; it’s not reasonable to expect 32-bit CPU support, so perhaps SSE4.2 is a reasonable “new minimum” for modern software.

Indeed, I’ve already raised this internally.

:+1:

Hey!

Thank you for looking into this. I am having the same issue on an Intel Atom N2800 (from the end of 2011).
Do you think there might be a way around this issue, or is migrating to newer hardware the only thing I can do for now?

Thank you.

I’m dead in the water now with my forum with the update I was prompted to do today. I never saw any warnings about upcoming obsoleting of any CPUs, and to have this happen suddenly is … bad. The available servers all are the same configuration for consistency, and all use the same CPU.

AMD Athlon™ II X2 B22 Processor

Not practical to run out and buy a new server, configure, etc. in this economy, even given the time.

How can I back out of this update until this situation is better understood? I can’t even contact my users right now with the forum down. Thanks.

If you’re using the Docker deployment method, you may have an older container which you can restart (check e.g. docker images and/or docker ps -a).

You can also override the commit used to build the Discourse instance by editing app.yml and setting the version to the commit prior to the change, then rebuilding:

params:
  version: adb7fa5e2fc51308efc9fc4ee57ecb1c15a85cfa

Discourse will break again if you update after this, which is not ideal given the security update that has been released since (although exploitation potential seems pretty limited for most instances).

One option (which I haven’t tried yet) is to install the oj gem separately and hope to trigger compilation with the correct CPU features (or lack thereof).

I had planned to try this in app.yml:

hooks:
  before_code:
    - exec:
        cmd:
          - gem install oj

but I haven’t got the scope for more forum downtime.

That specific security update doesn’t appear relevant to me since I’m not in a shared hosting environment. I’m unsure how to interpret the docker info. Here’s the ps:

37c258b23221 local_discourse/app "/sbin/boot" 3 months ago Exited (7) 3 hours ago

Here’s the image list:

REPOSITORY            TAG                 IMAGE ID       CREATED         SIZE
discourse/base        2.0.20220621-0049   a44ca4f67972   3 weeks ago     2.65GB
local_discourse/app   latest              b5f2a8a39709   3 months ago    3.53GB
discourse/base        2.0.20220413-0411   ab71a5d97460   3 months ago    2.81GB
<none>                <none>              58ba7d1c8d7a   3 months ago    3.74GB
discourse/base        2.0.20220224-2005   cd112601450a   4 months ago    2.84GB
<none>                <none>              d9cf1feb92fd   6 months ago    3.19GB
<none>                <none>              d53ee33f6fe1   6 months ago    3.19GB
<none>                <none>              14f79500c49c   6 months ago    3.19GB
<none>                <none>              edff9b614f46   6 months ago    3.19GB
<none>                <none>              e2348b41f937   6 months ago    3.19GB
<none>                <none>              42f6511b414c   6 months ago    3.19GB
<none>                <none>              3086f92af2fe   6 months ago    3.19GB
<none>                <none>              6ada029723ba   6 months ago    3.19GB
<none>                <none>              ca61149580d4   6 months ago    3.19GB
<none>                <none>              ce5ae3bb62ac   6 months ago    3.19GB
<none>                <none>              e9a5c1b1aed4   6 months ago    3.19GB
<none>                <none>              6bb94ce1e01f   6 months ago    3.19GB
<none>                <none>              e1df4acbd927   6 months ago    3.19GB
<none>                <none>              7e05a0b160c5   6 months ago    3.19GB
<none>                <none>              979926f28a73   6 months ago    3.19GB
<none>                <none>              d055f9b01556   6 months ago    3.19GB
<none>                <none>              aa0c779093dc   6 months ago    3.19GB
discourse/base        2.0.20211118-0105   b6cc7cf8974a   7 months ago    2.58GB
discourse/base        2.0.20210528-1735   482386bf57af   13 months ago   2.36GB
<none>                <none>              e6011d2b206c   14 months ago   2.69GB
discourse/base        2.0.20210415-1332   30e4746e631e   15 months ago   2.23GB
<none>                <none>              8066ac13b8c3   17 months ago   2.45GB
discourse/base        2.0.20201221-2020   c0704d4ce2b4   18 months ago   2.11GB
<none>                <none>              043da6b3335d   2 years ago     2.4GB
discourse/base        2.0.20200429-2110   dc919e1dae2c   2 years ago     2.13GB
<none>                <none>              ff15472f4794   2 years ago     2.79GB
discourse/base        2.0.20191013-2320   09725007dc9e   2 years ago     2.3GB
<none>                <none>              f65391a062f0   2 years ago     2.62GB
discourse/base        2.0.20190901-2315   10f636afbeaf   2 years ago     2.29GB
<none>                <none>              6944d06786b4   2 years ago     2.31GB
discourse/base        2.0.20190625-0946   2b3a5b47565f   3 years ago     1.93GB
<none>                <none>              60b39deba7d2   3 years ago     2.3GB
discourse/base        2.0.20190505-2322   ed87227f60d2   3 years ago     1.91GB
<none>                <none>              cc5c0e56298c   3 years ago     2.38GB
discourse/base        2.0.20190321-0122   7db99586b5b5   3 years ago     1.97GB
<none>                <none>              b19f9a483788   3 years ago     2.27GB
discourse/base        2.0.20190217        9c24db193c37   3 years ago     1.92GB
hello-world           latest              fce289e99eb9   3 years ago     1.84kB
<none>                <none>              614db6988e9c   3 years ago     2.25GB
<none>                <none>              729b196da862   3 years ago     2.25GB
<none>                <none>              80584ec5ec01   3 years ago     2.25GB
<none>                <none>              0e2481aefed8   3 years ago     2.25GB
<none>                <none>              725d0c17a6bb   3 years ago     2.25GB
<none>                <none>              220bed95d236   3 years ago     2.25GB
<none>                <none>              fca469dba597   3 years ago     2.25GB
<none>                <none>              edab31d0ffce   3 years ago     2.25GB
<none>                <none>              dbacaff2d35e   3 years ago     2.25GB
<none>                <none>              3d6a0453da1d   3 years ago     2.25GB
<none>                <none>              fbf0529eb303   3 years ago     2.25GB
<none>                <none>              7a45443ae44c   3 years ago     2.25GB
<none>                <none>              ad90d7f42416   3 years ago     2.25GB
<none>                <none>              d61ea07d6084   3 years ago     2.25GB
<none>                <none>              d393fd8b4de0   3 years ago     2.25GB
discourse/base        2.0.20181031        ea31cd77735a   3 years ago     1.88GB


Can you try ./launcher start app?

This one is likely to work. You can start that image with e.g. docker start b5f2a8a39709 .

(You might also want to trim some of those older images - there’s potentially a large chunk of disk space that can be recovered!)
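
A minimal sketch of how the reclaimable images could be counted before removing anything, assuming the untagged `<none>` rows in the listing above are dangling images that nothing still needs. `count_dangling` is a hypothetical helper; `docker image prune` is the standard Docker command for removing dangling images.

```shell
#!/bin/sh
# Hypothetical helper: count untagged (<none> / <none>) rows in
# `docker images` output read from stdin. The header row is skipped.
count_dangling() {
  awk 'NR > 1 && $1 == "<none>" && $2 == "<none>" { n++ } END { print n + 0 }'
}

if command -v docker >/dev/null 2>&1; then
  # Print how many dangling images the local Docker daemon reports.
  docker images 2>/dev/null | count_dangling
  # To actually delete them (Docker asks for confirmation first):
  #   docker image prune
fi
```

Counting first is a deliberately cautious choice: it shows the scale of the cleanup without deleting an image that might still be wanted for a rollback.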

Getting: Error response from daemon: No such container: b5f2a8a39709

Thanks. Also, my backup procedures copy ALL files from the system. There are likely more recent images there if I knew where to look and where to copy them.
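
A note on the error above: `docker start` takes a container ID or name rather than an image ID, and b5f2a8a39709 is the image ID from the image list. The stopped container in the `docker ps -a` output is 37c258b23221. A minimal sketch for looking that ID up, assuming the container was created from the `local_discourse/app` image; `container_for_image` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: given lines of "<container id> <image>" on stdin,
# print the ID of the first container created from the named image.
container_for_image() {
  awk -v img="$1" '$2 == img { print $1; exit }'
}

if command -v docker >/dev/null 2>&1; then
  # -a includes stopped containers; --format selects the ID and image columns.
  id=$(docker ps -a --format '{{.ID}} {{.Image}}' 2>/dev/null | container_for_image local_discourse/app)
  # Print the command instead of running it, so nothing starts by accident.
  if [ -n "$id" ]; then
    echo "docker start $id"
  fi
fi
```

The sketch only prints the `docker start` command; whether the old container still boots cleanly is a separate question.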

My apologies for interrupting the workaround, but we are going to migrate to another server, which was a challenge on its own because it was a dedicated server and we just renewed the contract for an entire year last June.

Perhaps it would be nice if the Discourse team issued a warning for people who run it on servers that aren’t supported anymore. Finding out the way we did is VERY unpleasant. (Three users have the same issue, and we’re talking about servers, which don’t get replaced at the same pace as laptops.)

I want it to be clear this was not an intentional change.

We also do not have direct access to hardware this old and need to rely on some community assistance here to help determine what exactly is going wrong.

Once we know for sure this is a compilation problem with the gem itself, we can take action.

@here

Adding a top-level key to the app.yml file:

base_image: discourse/base:2.0.20220621-0049-slim

should work around the issue, although it will slow down rebuilds a bit.

That’s fair, but such servers are still being offered by providers around the world as entry-level servers.
For a lot of smaller open-source projects, such servers are ideal price-wise, and often they cannot afford an Intel Xeon or AMD Ryzen with 32 GB RAM.

I completely understand that you don’t have the hardware to test the software on, but judging from the communication in this thread, we established the problem ourselves and then there wasn’t any reaction at all.
A simple “sorry, we are going to look into this” would have been sufficient in this case; instead, you left us hanging.

Testing now with this change.

Build appears to fail the same way.

This was with the change to containers/app.yml, adding:

base_image: discourse/base:2.0.20220621-0049-slim

near the top.

That means the issue is not that we ship a pre-compiled version of the gem, but that the upstream gem can’t compile on those old CPUs.

We have raised issue #789 against the oj gem.

Understood. I’d like to restore one of my recent docker images – from my rsync backups. Is there a procedure you can point me at to locate these and restore/start one? Thanks!

Have you tried ./launcher start app?

If this one doesn’t work, try the other method I detailed for rebuilding from the last-working commit.
