Cannot access topics after upgrade from 1.6.0.beta1 to beta2

I upgraded Discourse via the web UI; it runs in a Docker image on Ubuntu 14.04. The entry page and the admin pages work, and even search is working, but topics are no longer accessible: a topic page loads forever and at some point times out.

I then rebuilt the app via the docker commands, but that did not help. Next I upgraded the machine itself via apt-get and rebuilt the app again, but the symptoms are still the same.

What I find a bit strange is that the dashboard says installed: v1.6.0.beta1 +281 and latest: v1.6.0.beta2, yet shows You’re up to date! The update logs do not show anything problematic to me.

Is that a failed upgrade? How would I know, and how can I re-trigger the upgrade, given that the UI no longer offers an upgrade button because it thinks it is up to date?

Edit: Maybe related to this: Unable to upgrade even though running on older version — hmm, but I have 4 GB of memory and one physical CPU core. @envieme, as that topic is already closed: do you see the correct installed version now? What are your RAM settings for the DB, and how much RAM do you have available?

There seem to be some strange SELECT statements taking several dozen seconds at 5–10% CPU:

10566 message+  20   0 1183312  20620  17036 D   4.7  0.5   0:01.27 postgres: discourse discourse [local] SELECT                                              
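
For anyone else chasing this: the long runners can also be listed from psql itself. This is standard PostgreSQL (the `pg_stat_activity` view has had these columns since 9.2); run it from inside the app container:

```sql
-- ./launcher enter app ; su postgres ; psql -d discourse
-- Show non-idle queries that have been running for more than 10 seconds
SELECT pid, now() - query_start AS runtime, state, query
  FROM pg_stat_activity
 WHERE state <> 'idle'
   AND now() - query_start > interval '10 seconds'
 ORDER BY runtime DESC;
```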

Hey, I have the same issue and am wondering the same thing, whether the upgrade failed. In my case the upgrade did go through after a first failure due to RAM. I am on a 2 GB DigitalOcean droplet with 2 GB of swap space; before I created the swap, the upgrade failed. And there was no way to upgrade from the dashboard page, as it showed You’re up to date even though the versions didn’t match. Then someone asked me to go to /upgrade, and I initiated it from there and saw the full upgrade finish in the console.
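
For reference, creating that swap file was the usual Ubuntu procedure, roughly this (2 GB to match the droplet; the size and path are just what I used):

```shell
# Create and enable a 2 GB swap file (run as root / with sudo)
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Make it survive reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```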

So what I think now is that the upgrade finished fine, but the dashboard has a bug: it still shows the app on the previous version while also saying it is ‘up to date’.

Thanks! Do you experience any other issues besides the version mismatch, like mine, where topics no longer load?

No, I do not have any other problems except for the version mismatch. If I find anything amiss I’ll report back!

You could ssh in and trigger a rebuild at the command line to see if that helps!

I already did this multiple times :confused: (when I said ‘rebuild’ above, this is in fact what I meant).

Currently trying to restore from a backup, but it looks like I had not understood the dockerized Discourse setup. I re-fetched the repo and bootstrapped it, but Discourse shows the same version afterwards and is already ‘filled’ with data; as I now understand it, that is because the Postgres data is shared (persisted outside the container) across rebuilds.

How would I install a new discourse from scratch on the same server using a backup?

Edit: I removed Docker, did rm -rf /var/discourse, and started from scratch via the Docker image, yet it still shows the mismatched versions (?). Should I try the backup, or is that wasted time?

Maybe reboot the server?

Yes, but it didn’t help. I restored the backup into the fresh install, but still have the problem where topics do not load :frowning:

And nothing in the browser console or the Discourse logs?

Sorry, I did look at them and saw nothing, but I have since removed everything again.

But now I have another problem: How do I properly remove discourse and start over?

I was just doing sudo rm -rf /var/discourse and checked out the discourse_docker.git repo again, but I get an error while bootstrapping the container:

2016-04-22 22:52:41 UTC [414-1] discourse@discourse ERROR:  relation "users" does not exist at character 323
2016-04-22 22:52:41 UTC [414-2] discourse@discourse STATEMENT:                SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                             pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                        FROM pg_attribute a LEFT JOIN pg_attrdef d
                          ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                       WHERE a.attrelid = '"users"'::regclass
                         AND a.attnum > 0 AND NOT a.attisdropped
                       ORDER BY a.attnum

According to this comment, the error is apparently harmless for fresh installs. Still confusing.

(I have the full logs for “./launcher bootstrap app” if someone is interested)

Now trying to go back to the old version:

./launcher rebuild app > rebuild.log 2>&1

using v1.6.0.beta1, and this version then shows up correctly, but restoring the backup taken before v1.6.0.beta2 (i.e. on v1.6.0.beta1) somehow does not succeed: it reports current version 20160420172330 vs. restored version 20160329101122 (the schema version matching v1.6.0.beta1).

The site does not load; e.g. URL/t/669/3.json?track_visit=true&forceLoad=true&_=1461376445862 returns a bad request after ~30 seconds,

and the logs say:

LOG:  duration: 70737.709 ms  execute <unnamed>: SELECT "incoming_emails".* FROM "incoming_emails" WHERE "incoming_emails"."post_id" IN (2190, 2192, 2193)
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost

When I try:
SELECT count(*) FROM incoming_emails;

I get count=2941106, but this takes over 60 seconds. I have since removed the problematic rows, so the count is down to 23, but it still takes about the same time.

Running e.g. SELECT count(*) FROM email_logs; is much faster, even though that table has more rows.
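
A quick way to see where the time goes in a case like this is the query plan; `EXPLAIN ANALYZE` is plain PostgreSQL and, for a SELECT, only reads:

```sql
-- Compare the plans of the slow and the fast count
EXPLAIN ANALYZE SELECT count(*) FROM incoming_emails;
EXPLAIN ANALYZE SELECT count(*) FROM email_logs;
-- A Seq Scan over a huge number of pages on incoming_emails would point
-- at table bloat or a missing index rather than at the row count itself.
```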


I have no idea why the incoming_emails table is still slow even though I cleaned it up (it contained a few million mails due to a problem I had a few weeks ago).

Anyway … this fixed my slowness problem, and the site is loading as expected again:

create index incoming_emails_index on incoming_emails(post_id);

Officially added the index :lollipop:


Please keep in mind that I had lots of emails in there due to a config bug (see here); I’m not sure whether this index is something you want for every install.

Now, what I do not understand is why the query still took seconds even though, after the manual cleanup, there were only a few (~33) rows left.
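
My best guess (not verified against this exact database): DELETE only marks rows dead, so a sequential scan still has to read all the pages the millions of old rows occupied until the table is vacuumed. If that is the cause, rewriting the table reclaims the space:

```sql
-- Rewrites incoming_emails and drops the dead tuples; takes an exclusive
-- lock on the table while it runs, so do this in a quiet moment
VACUUM FULL VERBOSE incoming_emails;
```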

In this process I learned a lot, so I think I should write it down somewhere — and why not here, where you can suggest improvements :slight_smile:?

# list entries of email_logs table
cd /install/discourse
./launcher enter app
su postgres
psql -d discourse
SELECT count(email_type) as count,email_type FROM email_logs GROUP BY email_type;

# find out column names
select column_name from information_schema.columns where table_name='incoming_emails';
SELECT count(from_address) as count,from_address FROM incoming_emails GROUP BY from_address;
DELETE FROM incoming_emails WHERE from_address='';

# build custom discourse: edit version in app.yml and set e.g. v1.6.0.beta1
./launcher rebuild app

# copy backup from somewhere else into the server of the discourse installation
# if you start from a fresh installation: create one backup so that this folder is created with the correct rights
scp some@whereelse:/some.tar.gz /var/discourse/shared/standalone/backups/default/

# wipe all data to restart from scratch and restore from backup:
# CAUTION: this will not only destroy discourse but also the data in the DB
# but for me this was important to make sure that a wrong migration or faulty data wasn't the cause
./launcher destroy app
rm -rf /var/discourse/
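
To actually load the copied backup after the rebuild, the restore can also be run from the command line. This assumes the `discourse` admin CLI inside the container, which provides `enable_restore` and `restore`; the filename is whatever you copied into the backups folder above:

```shell
cd /var/discourse
./launcher enter app
discourse enable_restore          # same as allowing restores in the admin UI
discourse restore some.tar.gz     # filename as listed in the backups folder
discourse disable_restore
exit
```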

After finding the issue: the problem would not have appeared if I had adjusted this setting earlier — see Settings → email_reject_auto_generated.

Is there a tail -f equivalent of ./launcher logs app?


Is there a tail -f version of ./launcher logs app ?

You can get the running container’s ID with docker ps and then use docker logs -f, so, putting them together (for a server running a single container, as is typical with Discourse):

docker logs -f $(docker ps -q)

You might want to run this in screen.
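
Alternatively, the Rails logs are bind-mounted on the host for the standard standalone container, so a plain tail works without entering Docker at all (path assumes the default /var/discourse layout):

```shell
tail -f /var/discourse/shared/standalone/log/rails/production.log
```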