Migration of system-stored images to S3 after configuration change


#31

When I ran this task, it failed because I ran out of disk space on the local server, so there are already some files in the S3 bucket. If I run the task again after freeing up some space, will those existing S3 files affect the rake task?


(Gunnar Helliesen) #32

Is this still the recommended way of doing this? Does it work? Anyone tried lately?

The reason I ask is because we have over 1.8 million messages, so rebaking everything is going to take a while…

Thanks!
Gunnar


#33

It still worked for me a couple of months ago. For the rebake afterwards, I did have to size up my droplet's CPU and memory to avoid some odd messages popping up during the rebake, which I assumed indicated problems. Once I ran the rebake task with enough CPU and memory, the process went just fine.

The S3 image migration was really slow, though, around 100 KB/sec; not sure if that was just me. I also recall needing temporary disk space roughly the same size as the images being migrated for the process to complete.
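
For reference, a quick way to compare the size of the images to be migrated against the free disk space before starting (a sketch; the paths assume the standard Docker install):

du -sh /var/discourse/shared/standalone/uploads/default/
df -h /var/discourse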


(Gunnar Helliesen) #34

Thanks! As I understood it from various discussions here on meta, there was talk of fixing the migrate script so that a rebake would no longer be necessary. Has that not happened? Did you still have to rebake everything?

Thanks again,
Gunnar


#35

That would be best answered by the team. I also had other reasons for wanting to run a rebake at that time.

But my database is much smaller than yours, so it's not a huge deal for me to run the rebake.


(Sam Saffron) #39

Yeah, the migrate-to-S3 job is the way to go, but keep in mind S3 is not super fast; I would strongly recommend a CDN in front of it as well.


#41

Do you mean the Cache-Control header? Discourse doesn't add such a header by default, and S3 doesn't pass headers it doesn't have on to CloudFront. Someone, I believe on Stack Overflow, suggested writing a Lambda function that automatically adds the header to S3 objects when files get uploaded, but I have yet to get it to work.
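
For objects already in the bucket, one possible workaround is to copy them over themselves with replaced metadata via the AWS CLI (a sketch, assuming the CLI is configured; the bucket name and max-age are placeholders):

aws s3 cp s3://my-bucket/ s3://my-bucket/ \
  --recursive \
  --metadata-directive REPLACE \
  --cache-control "public, max-age=31536000" \
  --acl public-read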

My forum is small, so the CloudFront cache hit rate is only about 50% for the S3 pictures, and the CDN ends up going back to S3 quite a lot.

It might be a good idea if someone could figure out how to keep the optimized pictures on the local server and the original pictures on S3. I use DigitalOcean, and DO seems pretty speedy compared to S3. Putting a CDN over DO assets is cheaper, too, since one is not stuck with CloudFront.


(Richie Rich) #42

Hi everyone,

Just to say “thanks” for this :slight_smile:

And having done this today, I thought I would share some stats from our small forum:

  • Migrating 527 images (712 MB total) to S3 took only 9 minutes
  • Rebaking 5,514 posts took a surprisingly long 24 minutes

I manually placed my site in read-only mode before starting these tasks.

The contents of my /var/discourse/shared/standalone/uploads/default/optimized/ folder were not deleted, so I did this manually (after first checking a whole bunch of posts at random to make sure all the image links were now pointing at S3).
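
For anyone doing the same, the check-then-delete came down to something like this (a sketch; the path assumes the standard Docker install, and the rm is destructive, so verify your posts first):

du -sh /var/discourse/shared/standalone/uploads/default/optimized/
rm -rf /var/discourse/shared/standalone/uploads/default/optimized/*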

No avatar issues, etc.

A very painless task - thank you to everyone who worked on making this migrate_to_s3 task a reality :+1:


(Marco) #43

Hello everybody!
When I try to run the rake task rake uploads:migrate_to_s3, I constantly get asked to pass four environment variables:

[screenshot: the task aborts, asking for DISCOURSE_S3_ACCESS_KEY_ID, DISCOURSE_S3_SECRET_ACCESS_KEY, DISCOURSE_S3_REGION and DISCOURSE_S3_BUCKET]

I already have image uploads and backups set up to send everything to S3 (actually, DigitalOcean Spaces), and that works perfectly. What remains is to move the uploads currently on my server to S3/DO, which is why I need to run this task.

I set and exported the variables in the bash shell, but it still does not work. I made sure all the values are correct, of course.

Any idea what I am doing wrong?


(Régis Hanol) #44

Here's how to pass the variables to the rake task:

DISCOURSE_S3_ACCESS_KEY_ID="....." \
DISCOURSE_S3_SECRET_ACCESS_KEY="....." \
DISCOURSE_S3_REGION="us-east-1" \
DISCOURSE_S3_BUCKET="bucket_name" \
rake uploads:migrate_to_s3
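
On the standard Docker install, this needs to be run from inside the container (a usage note; app is the default container name):

cd /var/discourse
./launcher enter app
# then run the command above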

(Marco) #45

Unfortunately no joy.

I had a look at the source code of the rake task itself (I'm running 2.2 beta 7), then went into the Rails console and ran the same check the task performs:

[screenshot: Rails console output of the check]

BUT I do have S3 enabled and working for both backups and uploads…


(Marco) #46

Sorry to insist again on this, but my problem is that at the moment I’m left with a half-done configuration:

  • upload of new uploads to DigitalOcean Spaces works perfectly
  • upload of backups to DigitalOcean Spaces works perfectly
  • migration of the old uploads stored on my virtual server to DigitalOcean Spaces is not working: when I try to run the rake task rake uploads:migrate_to_s3, I constantly get asked about the four DISCOURSE_S3_* environment variables I listed a few posts above.

My virtual server is low on disk space, which was the main reason for moving the uploads to DigitalOcean Spaces in the first place. At the moment I cannot include the uploads in my full backup, as I run out of disk space in the process.
On top of that, part of the attachments now live on the virtual server and part on Spaces. Really not ideal…

Do you have any further suggestions about how to use the migrate_to_s3 task?


(Michael - DiscourseHosting.com) #47

Long shot: did you try

export RAILS_ENV=production

Before running the above command?


(Marco) #48

Unfortunately no change…


(Justin DiRose) #49

You can add those settings to your app.yml file under the env section, rebuild the container, and then run the migration. It's slower and may mean some downtime, but it does the trick.
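
For the rebuild step, that would be the usual (assuming the standard Docker install):

cd /var/discourse
./launcher rebuild app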


(Marco) #50

Thanks a lot for the tip!
A partial step forward: adding the environment variables to app.yml worked, but now I get a complaint about another one:

[screenshot: the task now asks for DISCOURSE_S3_CDN_URL]

The problem is that, being on DigitalOcean Spaces, I do not have a CDN set.
I'm sorry to say it, but this rewrite of the rake task seems a bit shaky… :angel:


(Justin DiRose) #51

DO Spaces has a CDN you can turn on.

https://blog.digitalocean.com/spaces-now-includes-cdn/

From there, you can set the CDN URL in your site settings to that CDN endpoint.


(Marco) #52

So, an update on the results of my efforts.

  1. I enabled the CDN on my DO Space.
  2. I copy/pasted the DO Space CDN endpoint into the website settings field.
  3. I modified app.yml, as suggested, entering the environment variable definitions as follows:
  DISCOURSE_S3_REGION: "ams3"
  DISCOURSE_S3_ACCESS_KEY_ID: "<omissis>"
  DISCOURSE_S3_SECRET_ACCESS_KEY: "<omissis>"
  DISCOURSE_S3_BUCKET: "discourse-data"
  DISCOURSE_S3_CDN_URL: "https://discourse-data.ams3.cdn.digitaloceanspaces.com"
  4. I rebuilt the app, with success, seemingly…

Seemingly, because the result was… a completely blank screen for the website UI. :confused:
I had a look at the source code of the web page and noticed that all the asset URLs, including the JS files, were now prefixed with the DISCOURSE_S3_CDN_URL rather than starting with a local path, breaking everything. Of course, the only things I have loaded into the DO Space are the backup files and the uploads posted to the forum after enabling the "uploads to s3" setting.

I had to roll back change no. 3 (i.e. the environment variables) and rebuild the app again, and I got my UI back :sweat_smile:

Joking aside, all this really seems a bit too complicated to me, to the point that it smells like a bug.

There is one more thing I can try to make the script work: define in app.yml all the DISCOURSE_S3_* environment variables EXCEPT DISCOURSE_S3_CDN_URL. I had a look at the task's source code, and that variable is needed only when the corresponding site setting is empty. This way I might avoid forcing the CDN onto all assets while still satisfying the script's requirements…
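
That is, something like this in the env section (the same values as above minus the CDN line; a sketch of the idea, not a tested configuration):

  DISCOURSE_S3_REGION: "ams3"
  DISCOURSE_S3_ACCESS_KEY_ID: "<omissis>"
  DISCOURSE_S3_SECRET_ACCESS_KEY: "<omissis>"
  DISCOURSE_S3_BUCKET: "discourse-data"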

I have to think about this, as I do not want to disrupt my forum…

