AWS installation stuck in Read only mode

Hello,

We’ve had our internal Discourse site for around 3 months now and suddenly it has gone into read only mode.

Based on other topics (Stuck in 'Read Only' Mode) I have already tried the following:

  • Login to docker instance: ./launcher enter app
  • Login to rails: rails c
  • Disable read only: Discourse.disable_readonly_mode(Discourse::USER_READONLY_MODE_KEY)
  • Quit rails: quit
  • Exit container: exit

After doing this, our application seems to come out of read only mode, then goes back to read only mode.

I tried to rebuild a container but now I am getting the following error:

"Caused by:
PG::ReadOnlySqlTransaction: ERROR:  cannot execute ALTER TABLE in a read-only transaction
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/rack-mini-profiler-1.0.0/lib/patches/db/pg.rb:92:in `async_exec'
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/rack-mini-profiler-1.0.0/lib/patches/db/pg.rb:92:in `async_exec'"

I guess because it’s stuck in read only mode for some reason?

Current setup is:

  • 4 containers (2 cannot rebuild based on above, 2 are still running from 2 weeks ago)
  • AWS
  • Elasticache Redis
  • RDS PostgreSQL

Regards,
amoncadot

Hello,

I just followed the steps in the topic “Inheriting discourse install - need some assistance” and still had no luck.

I have two containers still running while the other two are offline.

My worry is that if I stop those two containers then I won’t be able to log back into Discourse.

I’d like to fix this while those two containers are still running.

Kind regards,
amoncadot

Can you check whether you’re out of disk space?

Hi,

The web servers are not out of disk space and RDS is fine also.

Regards,
amoncadot


It says that Redis is in a read only state:

I, [2018-06-19T18:01:55.777804 #13]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'

No connection to db, unable to retrieve site settings! (normal when running db:create)
WARN: Redis is in a readonly state.' Performed a noop
Failed to report error: Connection lost (ECONNRESET) 2 Dropping undeliverable message: ERR Error running script (call to f_b06356ba4628144e123b652c99605b873107c9be): @user_script:14: @user_script: 14: -READONLY You can't write against a read only slave.

I have rebooted my ElastiCache cluster and failed over… yet Redis remains in a read-only state. Any ideas?

Is your ElastiCache setup Multi-AZ? That message suggests you are connecting to a secondary node in a cluster. Double check that the hostname you are using is the Primary Endpoint of the cluster.
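A rough way to confirm which role an endpoint has is to ask Redis itself. The sketch below assumes redis-cli is installed on the web server; REDIS_HOST is a placeholder for the hostname in app.yml, and the helper name redis_role is made up for illustration. It only picks out the role: line that INFO replication returns.

```shell
# Print the replication role ("master" or "slave") found in the
# output of "INFO replication", read from stdin.
redis_role() {
  # INFO output uses CRLF line endings, so strip carriage returns first.
  tr -d '\r' | awk -F: '$1 == "role" { print $2; exit }'
}

# Usage sketch (REDIS_HOST is a placeholder, 6379 is the default Redis port):
# redis-cli -h "$REDIS_HOST" -p 6379 info replication | redis_role
```

If this prints slave for the hostname in app.yml, writes to that endpoint will fail with the READONLY error shown above.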


Yes ElastiCache is Multi-AZ. Setup is:

I’ve just tried to use an entirely different Redis cache and again the build failed.

My app.yml specifies the primary endpoint of the cluster.

So far I have no containers running, just PostgreSQL and the Redis cache.

I am going to try and use a snapshot of RDS this morning with a new Redis cache but if that fails I am not sure what else I can do since I have no containers running to access the UI.

Why did Discourse suddenly go into read only mode without manual intervention?

The site can switch to read-only mode when databases return errors like “Redis is in a readonly state”.

There are multiple types of READONLY depending on the trigger. Discourse.disable_readonly_mode(Discourse::USER_READONLY_MODE_KEY) will only turn off one of them; you have to pass the other keys to turn off the other types.


Ah okay.

When I had containers running, I disabled all three modes/keys you listed. That lifted read-only mode briefly, but it came back almost immediately, which is why I have now moved on to trying a rebuild with a different cache.

I had seen that the keys solution had worked for other people but for some reason it did not work for our application.

Note that this is wildly on the enterprise side of complex setups, so there’s a limited amount we can help here.


Hi Jeff,

Thanks for the compliment. It’s nice to hear that the co-founder of Discourse/Stackoverflow considers our environment as enterprise even before it has been released :wink:

Problem solved. The issue was that Amazon Aurora was being used as RDS and by default this creates a cluster with two database instances inside - one primary and one replica.

Sometime yesterday an automatic failover occurred. In our app.yml, I had set the DISCOURSE_DB_HOST: parameter to the DATABASE (instance) endpoint, not the CLUSTER endpoint. The failover turned the instance behind that endpoint into a read-only replica, which locked Discourse into read-only mode.
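A quick way to tell whether the endpoint in app.yml currently points at a writer or a reader is to ask PostgreSQL for its transaction_read_only setting. This is only a sketch: DB_HOST, DB_USER, and DB_NAME are placeholders, the helper name pg_endpoint_mode is invented for illustration, and it assumes psql is available and that a reader instance reports "on" for this setting.

```shell
# Translate the output of "SHOW transaction_read_only;" (read from stdin)
# into a human-readable mode.
pg_endpoint_mode() {
  value=""
  read -r value || true
  case "$value" in
    on)  echo "read-only" ;;
    off) echo "read-write" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage sketch (needs network access to the database; placeholders throughout):
# psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -Atc 'SHOW transaction_read_only;' | pg_endpoint_mode
```

A result of read-only would match the "cannot execute ALTER TABLE in a read-only transaction" error from the failed rebuild.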

If anyone is running a similar setup:

  • EC2 instances with docker containers
  • Redis ElastiCache
  • Amazon Aurora RDS (PostgreSQL underneath)

Check that /var/discourse/containers/app.yml contains:

DISCOURSE_DB_HOST: RDS Cluster Endpoint (Go to RDS > Clusters > Access your cluster > Under Cluster endpoint is the endpoint you need to specify)
DISCOURSE_DB_PORT: RDS Cluster Port
DISCOURSE_REDIS_HOST: ElastiCache Primary Endpoint (Go to ElastiCache > Redis > Toggle the “play” shape button beside your Redis Cluster Name > Under Primary Endpoint is the endpoint you need to specify)
DISCOURSE_REDIS_PORT: Redis Cluster Port
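To compare what is actually configured against the endpoints shown in the AWS console, something like the following could be used. It is a minimal sketch: the function name show_discourse_endpoints is hypothetical, and it simply greps the uncommented DB and Redis host/port lines out of a container definition file.

```shell
# Print the active (uncommented) DB and Redis host/port settings
# from a container definition file passed as the first argument.
show_discourse_endpoints() {
  grep -E '^[[:space:]]*DISCOURSE_(DB|REDIS)_(HOST|PORT):' "$1"
}

# Usage sketch, using the path mentioned above:
# show_discourse_endpoints /var/discourse/containers/app.yml
```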

Hope this helps someone!

@codinghorror Is there a way to run ./launcher rebuild app without pulling down the latest codebase from Discourse?

You can pin the build number in your yml

Hi Bhanu,

Do you have a demonstration on how to do this?

Yes,

Enable and use the version directive in your yml! It’s commented out by default, which leaves the build pinned to tests-passed.

Hi Bhanu,

That “tests-passed” refers to a specific branch on Git, correct? I can see other branches, such as stable, where commits are still being pushed quite frequently.

Is there any way we can remain on a specific version of Discourse even when we run a container rebuild? We want to be able to control our main Discourse codebase and update/test it within a release environment before deploying to prod.

tests-passed will be the branch that is the most recent known working instance, not necessarily stable.

I would recommend you pin your dead containers to the version your active containers are on (you can probably see the version in your docker-manager) and then make these active first.

This is exactly why you’d pin your yml to only fetch a specific version, not tests-passed or stable.

For the record, the version value is passed directly to a git checkout, so it can be any commit hash, branch or tag.

https://github.com/discourse/discourse_docker/blob/master/templates/web.template.yml#L115-L116

Edit: No onebox :thinking:


Should be fixed now.


Thanks.

So to build a specific version of Discourse, all I’ll need to do is specify a commit hash as the version value?

Example of web.template.yml:

params:
  # Building from branch "stable" with latest commit
  version: 849b4b56853756a24f0646c04e733e5af7cc2a2b

This will then be picked up by:

        - git fetch origin $version
        - git checkout $version

Is this correct?

That should work, yes.
