Avatars lost after restore. How to get them back?

I changed the hostname because I cannot shut down the original server for hours; it has to stay up until I have tested that everything works and I can swap the servers.

You don’t need to. If you have a wildcard cert you can just make local DNS changes via the hosts file to configure everything and restore the backup itself.

Then you just flip DNS over publicly.

I don’t understand what you mean.

I have to keep a.domain.com up and running while I run the tests.

And I need to access the Discourse interface of the copy that I am restoring to see if everything is OK.
So I need another URL to access the copy on the other server.
So I simply change the hostname in Discourse and nginx after restoring.

When everything is OK, I change the name on the new server to a.domain.com, shut down the old server, and point DNS for a.domain.com to the new server.

The above isn’t correct. You can force your local machine to connect to the new server using the same DNS name, either by altering the HOSTS file on your local computer, or hard-coding an entry to your local DNS.
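
For anyone who wants to try this, here is a minimal sketch of the hosts-file approach. The IP address 203.0.113.10 is a placeholder (an assumption, not a value from this thread), and hosts_entry is just a hypothetical helper name. The line it prints goes into the hosts file on the machine where the browser runs; on Windows that file is C:\Windows\System32\drivers\etc\hosts.

```shell
#!/bin/bash
# Assumed example values: replace with your own hostname and the new server's IP.
SITE_HOST="a.domain.com"
NEW_SERVER_IP="203.0.113.10"   # placeholder address, not from the thread

# Print a hosts-file line ("<ip> <hostname>") for the given hostname and IP.
hosts_entry() {
  printf '%s %s\n' "$2" "$1"
}

# Show the line to append to the hosts file of the machine running the browser.
hosts_entry "$SITE_HOST" "$NEW_SERVER_IP"

# Alternative for a single request, without editing the hosts file:
# curl --resolve "${SITE_HOST}:443:${NEW_SERVER_IP}" -sI "https://${SITE_HOST}/"
```

Remove the entry again once public DNS points at the new server, otherwise that machine will keep bypassing DNS for this name.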

I have no local machine.
Both servers are on the internet; they are cloud servers.

I use SSH from a Windows machine.

Maybe I can tweak the local hostname to set the IP of the machine, but that is more complex than changing the name on the servers.

Do you think that a change in the hostname can be the problem?

It should not be a problem…

@ariznaf,

Yes, we started seeing this custom avatar problem again, long after the sidekiq process had time to rebuild any adjunct avatar and profile images, but only on the configuration with the nginx reverse proxy to a unix domain socket.

The small icon avatars are fine, but the images don’t work in the profile card or on the profile pages (unless they were cached before and the cache has not expired).

As soon as we do this:

nginx -s stop; ./launcher start web-only

The problem with the custom avatar images goes away (on images not previously viewed / cached in the browser).

And as soon as we do this:

./launcher stop web-only; nginx

The problem with the custom avatar images returns for images not yet viewed / cached.

There are no errors with HTTPS and this is definitely not because of force_https (totally unrelated):

discourse=# select * from site_settings where name like '%http%';
 id |    name     | data_type | value |         created_at         |         updated_at         
----+-------------+-----------+-------+----------------------------+----------------------------
 79 | force_https |         5 | t     | 2020-04-16 05:51:13.165124 | 2020-04-16 05:51:13.165124
(1 row)

We have confirmed this issue on mobile (iOS, latest version) and on desktop, in Chrome (latest), Safari (latest), etc.

There is something odd that happens when using nginx as a reverse proxy to a unix socket, and it affects the custom avatar images.

So far, sorry to inform @ariznaf, we cannot isolate the problem and don’t have a solution.

It “feels like”, in the nginx reverse proxy to a unix socket config, the discourse app (the container) is not rebuilding these custom avatar images the way it does in the config without nginx as a reverse proxy to a unix domain socket.

Maybe sidekiq does not like the nginx reverse proxy to unix socket config and refuses to schedule or run this rebuilding process, LOL ? @riking ?

Strange.

@riking

Here is a followup:

In the reverse proxy config using unix sockets we are discussing:

However, when we check force_https:

Last login: Fri Apr 17 06:29:58 2020 from 159.192.216.252
# cd /var/discourse
# ./launcher enter data
# su postgres -c 'psql discourse'
psql (10.12 (Debian 10.12-1.pgdg100+1))
Type "help" for help.

discourse=# select * from site_settings where name like '%http%';
 id |    name     | data_type | value |         created_at         |         updated_at         
----+-------------+-----------+-------+----------------------------+----------------------------
 80 | force_https |         5 | t     | 2020-04-18 03:33:10.641567 | 2020-04-18 03:33:10.641567
(1 row)

And of course, as expected, there is no error in the browser cert (Chrome is happy):

So, my uneducated guess is that in this configuration the force_https setting / method is missing these images, because everything else is flawless except these custom avatars (and the profile page image).

That is my above-my-pay-grade-with-Discourse guess.



Update:

After more research, it turns out that all of our “nginx reverse proxy to unix socket” configurations were missing most of the /shared/uploads files. This step was missing from the various tutorials and howto docs we read on this, so the next time I see a wiki on this topic at meta, I’ll update the tutorial / wiki / howto to include this step.

The only remaining small gotcha is an issue with the favicon:

If anyone has a recommended way to fix this, that would be great. Thanks!

Reupload it. That’s the quickest solution.

People forget when they’re using a socket that they’ve disabled the HTTPS template, so Discourse doesn’t know it’s behind SSL unless force_https is enabled manually, hence my suggestion yesterday.

Once force_https is enabled you can re-upload the assets to correct their paths.

I’ve been not replying to any of this because I assumed some kind of botched server data transfer (not using the built-in backup feature) had left out the /uploads/ entirely and I had no clue how to explain that in a way you would understand.

Yeah… we followed this guide to set up the nginx reverse proxy, but it was for a standalone setup and did not mention the uploads because they did not need to be transferred in standalone mode:

And we followed this guide for two containers, which also did not mention doing any DB restore or transferring any upload directory:

I think we can easily understand things. Here is the clue you were missing to help explain, for reference:

The main tutorials on this configuration leave out the fact that you should either do a DB restore, or transfer your uploads manually to the new container because we did not include that.

Of course, it makes sense now after figuring this out 100% on our own (again!) because it is not in the tutorials. LOL

Everything is easy after you know what the problem is.

:slight_smile: :slight_smile: :slight_smile:

PS: In closing. Thanks to everyone who wrote various tutorials. They were a big help! Much appreciated. On our end, this configuration is done and we will no longer use any standalone configuration on any discourse sites in the future. Our normal “default” will be two containers with a reverse proxy to a unix socket. This works the best for making updates and switching containers in real time with almost zero downtime. Good stuff!!

Discourse is GREAT!!

Well done Jeff @codinghorror and Sam @sam ! BRAVO!

:heart: :heart:

@ariznaf

This is fairly easy to get working, but as I mentioned earlier, we don’t use S3 or other cloud storage services; we prefer to keep things “simple”, so our backups are just rsynced to offsite storage. We prefer it this way… it’s one less thing to debug :slight_smile: and we can “live” without S3.

I don’t know if this is helpful or not, but image optimization tends to fail if the job running the optimization is unable to reach your server via its internet domain name.

That could explain why this is not working as expected in a more complicated reverse proxy setup.

Thank you Kane.

I was trying (as you probably know) alternatives to the standard backup method through the UI in Discourse.
I tried them because I had problems each time I attempted a restore with the standard backup method, always following the instructions given in the official tutorials on this site.

Anyway, as I stated at the beginning, I was restoring from a backup made using the UI and the standard backup procedure, with the recommended restore procedure.

The only difference from a standard install of a standalone Discourse is that we are using nginx as a reverse proxy, connected to Discourse through a socket.

The main problem we found was with avatars, which were apparently lost and replaced by that white profile image.

You told me that they had to be rebuilt by the sidekiq job. But that job seems to finish immediately with success (OK status) when you launch it, and the avatars remain the same.

As backups are very important for us (who can ignore them?), I will do more tests, trying to be very careful in following the directions and trying the other ideas given here.

We are happy with Discourse; we like it a lot. We are just trying to be sure we have a robust restore procedure, in case we suffer some kind of attack (we got one recently) or a server failure.

If you want us to run some tests to help fix this problem, or to give some information, I will be happy to do my best and provide that info.

It seems the system cannot access the avatar thumbnails, that is for sure.

But the rest of the forum is served correctly; routes, SSL and everything else are correctly configured as far as I can see.
If there were some kind of misconfiguration, you could not reach the Discourse forum and see the rest of the content; you would get 502 errors or something like that.

@neounix
We use S3 because it is the simplest method from the UI to have the backups offsite.

Maybe S3 is not the best option, I don’t know, but where the backups are stored is not related to the problem, since the problem is not being unable to reach them and run the restore.
The restore is done correctly.

@stephan
In the app.yml I have commented out the ssl template and letsencrypt template, and the expose section with the ports.
The SSL part is handled by nginx, so I don’t need the socket to be encrypted, right?

Is this incorrect? Should I use the ssl template anyway?
I suppose that if this were the problem then I could not see any part of the forum after the restore, not just the avatars, but who knows…

I will do more tests. Thank you all for your help.

@ariznaf … hey!

The way I solved this problem on two different servers was to manually copy the /shared/uploads directory from the original setup to the socket-only setups, and after that this problem went away.

The way I was quickly able to check was to compare the sizes of the uploads directory, simply like this (assuming you are in your shared directory):

du -sh  uploads

When I compared them, that is when I found out what the problem was :slight_smile:
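
A slightly more thorough version of that check, as a rough sketch: it compares both the total size and the number of files of two uploads trees. The helper names (uploads_summary, compare_uploads) and the example paths are assumptions for illustration, not something from our setup.

```shell
#!/bin/bash
# Summarize an uploads tree as "<size in KiB> <number of files>".
uploads_summary() {
  local dir="$1"
  local kb files
  kb=$(du -sk "$dir" | cut -f1)                      # total size in KiB
  files=$(find "$dir" -type f | wc -l | tr -d ' ')   # number of regular files
  echo "$kb $files"
}

# Print both summaries and report whether they are identical.
# Returns 1 when the two trees differ in size or file count.
compare_uploads() {
  local old new
  old=$(uploads_summary "$1")
  new=$(uploads_summary "$2")
  echo "old: $old"
  echo "new: $new"
  if [ "$old" = "$new" ]; then
    echo "uploads look the same"
  else
    echo "uploads differ"
    return 1
  fi
}

# Example call (hypothetical paths, adjust to your own shared directories):
# compare_uploads /var/discourse/shared/standalone/uploads /var/discourse/shared/socket-only/uploads
```

Matching numbers do not prove the trees are identical, but a large difference is a strong hint that uploads were never copied.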

Maybe you can check as well? Hopefully this might help you isolate your issue.

PS: I’m not negative on S3. To each his / her own as the saying goes…

Let me see if I am understanding you correctly.

When I make the backups I have checked the option to save the uploads too (not the thumbnails at first, but I have also tested saving thumbnails; now I am saving the thumbnails so you don’t have to wait for a rebake).

After the restore the uploads are restored too.

Do you mean that the restore of the uploads is not correct and you have to do it manually?

How do you restore the uploads by hand?
Did you download the backup and untar shared/standalone/uploads?

If that is the case (I will try), it seems clear to me that there is some kind of bug in the restore job.

Thank you.

I am looking for alternatives for making backups and storing them, but the people from Discourse insist that the only correct way of doing it is using the standard backup.

Hi @ariznaf,

We don’t restore from the admin UI (we only back up from the web interface, not restore). We sftp the file into the container (with the uploads included) and restore like this:

cat /shared/neo/bin/restoreneo
#!/bin/bash
echo "cd /var/www/discourse"
cd /var/www/discourse
echo "discourse enable_restore"
discourse enable_restore
echo "begin neo restore"
discourse restore unix-com-community7-2020-04-15-095302-v20200403100259.tar.gz
echo "discourse disable_restore"
discourse disable_restore

However, when I created a new nginx reverse proxy to unix socket configuration the other day, I did not restore from the DB because the database was already there in the data container (as the howto topics mentioned).

This is why I had to manually copy the uploads over to the new container.

Your situation sounds different from ours.

Hope this helps.

Thank you.
It seems that you are doing from the command line the same procedure we do using the interface: enabling restores and restoring from the tgz file that contains the database and uploads.

But then you say that in order to get avatars working (using sockets and nginx reverse proxy) you need a second restore of just the uploads, am I right?

Hey @ariznaf … not exactly…

In the beginning, we had a standalone app. I separated that app into two different containers (data and web-only) and then I did a restore from the big backup file with the uploads.

All that went well…

Then, I created a new container, socket-only and configured it to use a reverse proxy.

I did NOT do a restore in the new socket-only container (because the data container already had the DB data intact) but I failed to manually copy over the uploads (that was my error). If I had done a normal restore process that would not have been necessary.

But there is no reason to do a manual DB restore again in the new container, because that is the reason to have the data in its own cute little container. So, in this situation, the uploads must be copied over to the new container. It’s actually nicely done.
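
A minimal sketch of that copy step, assuming each container keeps its files in its own shared directory on the host. The container names (web-only, socket-only) come from this thread, but the full paths in the example call are assumptions, so check the volumes in your own yml files first. copy_uploads is a hypothetical helper name.

```shell
#!/bin/bash
# Copy the contents of one uploads tree into another, preserving
# permissions and timestamps (and ownership when run as root).
copy_uploads() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # "src/." copies the contents of src, including dotfiles, into dest.
  cp -a "$src/." "$dest/"
}

# Example call (hypothetical host paths, adjust to your own setup):
# copy_uploads /var/discourse/shared/web-only/uploads /var/discourse/shared/socket-only/uploads
```

Comparing the sizes of the two directories with du afterwards is a quick way to confirm that the copy is complete.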

Does that help?

Not what I said, I said the backend cannot reach itself through the front nginx. What you’re saying is the other way around.

In order to optimize an upload, the Sidekiq job retrieves it using http(s).
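
One rough way to check this: run curl from inside the web container against the public hostname and see whether an HTTP response comes back at all. This is only a sketch; the hostname is the example used in this thread, and classify_status / check_site are hypothetical helper names.

```shell
#!/bin/bash
# Translate an HTTP status code into a short verdict.
# "000" is what curl's %{http_code} prints when no HTTP response was received.
classify_status() {
  case "$1" in
    2??|3??) echo "reachable" ;;
    000)     echo "unreachable" ;;
    *)       echo "responded with $1" ;;
  esac
}

# Request a URL and report only the verdict for its status code.
# Meant to be run from inside the container that runs Sidekiq.
check_site() {
  local url="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
  code=${code:-000}   # treat "no output at all" like "no response"
  echo "$url: $(classify_status "$code")"
}

# Example call (hostname taken from the example in this thread):
# check_site "https://a.domain.com/"
```

If this reports "unreachable" from inside the container while the site loads fine in a browser, the backend cannot reach itself through the front nginx.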

No, you can disable the ssl template, but you will need to manually enable force_https.
