Azure Blob Storage Plugin

discourse-azure-blob-storage :floppy_disk:

I created the Azure Blob Storage plugin as part of my Outreachy internship with Discourse. It stores uploads in Azure Blob Storage.

Azure account

  1. If you don’t have an Azure account yet, create one for free.
  2. In your Microsoft Azure portal, go to Storage account and add a new account. Make sure to select Blob storage under Account kind.
  3. Go over to your newly created storage account and find Containers in the settings. Add a container into which uploads should be stored and set the public access level to Blob.
  4. Find Access keys in the storage account settings. You will need an access key later when configuring Discourse.

Discourse configuration

  1. Install the plugin by following the instructions in Install Plugins in Discourse.
  2. Go to Admin Settings → Plugins, fill in the storage account name, access key and container name (the CDN setting is optional), and enable the plugin.
  3. That’s it! Uploads should be stored into Azure Blob from now on.

Note: The plugin has yet to be thoroughly tested. I’d be happy if you gave it a go and reported any problems.

Bulk moving existing uploads

After activation of this plugin, all new uploads will start going to Azure. However, existing uploads will stay in the default /uploads/default/ directory on the local hard disk. To move them to Azure Blob Storage, follow these steps:

1. Download

Use your favorite file transfer tool to log into the VPS and download the entire directory tree of /var/discourse/shared/standalone/uploads/default.

Note: We do not upload the files to Azure Blob Storage yet. If you do, and you don’t change the post URLs quickly enough, the background scrubber may run and move many of your uploaded files to the tombstoned folder! If that happens, all you can do is move those files back into the container.

2. Backup database

Go to Admin->Backups and take a snapshot backup of the database.

3. Change upload URLs in the database

SSH into the VPS and do the following:

cd /var/discourse
sudo ./launcher enter app
su -c 'psql discourse' postgres

psql#
update uploads
set url = replace(url,'/uploads/default/', '//xxxxxxxx.blob.core.windows.net/yyyyyyyyy/')
where url like '/uploads/default/%';

psql#
update optimized_images
set url = replace(url,'/uploads/default/', '//xxxxxxxx.blob.core.windows.net/yyyyyyyyy/')
where url like '/uploads/default/%';

psql# \q

where xxxxxxxx is the name of the Azure storage account and yyyyyyyyy is the name of the container.
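Before running the UPDATE statements, it can help to preview what the rewrite does to a few sample values. The following is a minimal sketch that mirrors the SQL `replace(...)` with its `like '/uploads/default/%'` filter in Python. The account name `mystorageaccount`, the container name `mycontainer`, and the sample paths are placeholders chosen for illustration, not values from a real setup.

```python
OLD_PREFIX = "/uploads/default/"


def remap_upload_url(url: str, account: str, container: str) -> str:
    """Mimic the UPDATE above for a single url value."""
    # Rows that do not match LIKE '/uploads/default/%' are left untouched.
    if not url.startswith(OLD_PREFIX):
        return url
    new_prefix = f"//{account}.blob.core.windows.net/{container}/"
    # Like SQL replace(), str.replace substitutes every occurrence.
    return url.replace(OLD_PREFIX, new_prefix)


if __name__ == "__main__":
    samples = [
        "/uploads/default/original/1X/abc.png",  # hypothetical local upload
        "//mystorageaccount.blob.core.windows.net/mycontainer/original/1X/def.png",
    ]
    for sample in samples:
        print(sample, "->", remap_upload_url(sample, "mystorageaccount", "mycontainer"))
```

Rows that already point at the blob host do not start with the old prefix, so running the rewrite a second time leaves them unchanged.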

4. Remap URLs in posts

rake posts:remap["/uploads/default/","//xxxxxxxx.blob.core.windows.net/yyyyyyyyy/"]

where xxxxxxxx is the name of the Azure storage account and yyyyyyyyy is the name of the container.

Beware: Azure Blob Storage is mostly accessed over HTTPS. If an original URL uses HTTP (i.e. it starts with http://), you have to fix it manually to use https://.

5. Rebake all the posts

rake posts:rebake

6. Upload

Use your favorite tool to connect to Azure Blob Storage (e.g. Microsoft Azure Storage Explorer or CloudBerry Explorer for Azure Blob Storage, both free). Upload the entire tree into the target container (there should be two folders, original and optimized).

7. Bob’s your uncle!

@SimonWu / @schungx did you have a chance to try this out?

Yep, it works. See Plugin to store uploads into Azure blob storage

:blush: I haven’t tested it yet, but will soon!

Promise!

I wonder… What happens to the existing uploads already on the hard disk? Will they migrate to azure blob as well?

Or is there a way to bulk migrate?

Nothing really, they stay where they are.

It’s not possible yet.

Seems to be working fine so far…

Since Azure supports directory trees, can I just copy everything that’s in standalone/uploads and then upload it into Azure in one go?

I suppose I’ll then have to run a task to change all the old URLs in the posts to point to the new URLs. Does anybody know how to do that?

Can I just do a global search on the old URLs and replace them with the new URLs, then globally recook all the posts?

I suspect that the URLs are extracted somewhere and stored, right?

Yes, you could copy the /uploads directory into your container, and prepend //my-storage-account.blob.core.windows.net/my-container to the urls in the Uploads table. I tried that and after rake posts:rebake it seems to work fine.
I should really make a task to make this easier.

Meanwhile I noticed some weirdness when it comes to deleting files so I have to fix this first, but on the surface it shouldn’t really affect how it works.

Great! Let me try that. Then maybe we really should make a wiki on this.

I made the OP a wiki so you can help improve the instructions if you wish.

There seem to be a few problems uploading non-image items in posts, especially videos.

Repro:

  1. Upload an MP4 video file.
    2a) If the MP4 video’s filename contains non-ASCII characters, the upload fails.
    2b) If the MP4 video’s filename contains only ASCII characters, it uploads successfully, but the resulting URL is wrong.

For 2b, instead of the correct:

https://azurestorage.blob.core.windows.net/support/original/2X/3/XXXXXXX.mp4

It has:

https://original.forum.com//azurestorage.blob.core.windows.net/support/original/2X/3/XXXXXXX.mp4

In the preview window the video is shown as “not found”. Manually editing the URL to delete the erroneous prefix makes it work again.

Here original.forum.com is the domain of the forum itself, and azurestorage.blob.core.windows.net is the domain of the Azure Blob storage.

Thanks for reporting the bug, I’ll take a look.

@Maja, I traced it down to this:

https://github.com/discourse/discourse/blob/9d97e1244ebebe3d83ae2467cd0411300fab87da/app/assets/javascripts/discourse/lib/utilities.js.es6#L365

As you can see, all video files are treated specially, calling uploadLocation instead.

https://github.com/discourse/discourse/blob/9d97e1244ebebe3d83ae2467cd0411300fab87da/app/assets/javascripts/discourse/lib/utilities.js.es6#L347-L359

As you can see, if Discourse.SiteSettings.enable_s3_uploads is true, which it should be, the URL will be the correct one (i.e. with https: prepended).

However, it seems like it is falling into the else clause, prepending the host domain URL.
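To make the two branches concrete, here is a minimal Python sketch of the behaviour described above. It is not the Discourse source (which is JavaScript); the function name `upload_location` and the `external_store_enabled` flag are hypothetical stand-ins for the linked helper and the site-setting check.

```python
def upload_location(url: str, forum_base_url: str, external_store_enabled: bool) -> str:
    """Build an absolute URL from a stored, protocol-relative upload URL."""
    if external_store_enabled:
        # Expected branch: the stored URL already names the blob host,
        # so only the scheme needs to be prepended.
        return "https:" + url
    # Fallback branch: the forum's own base URL is prepended, which is
    # wrong when the stored URL already contains an external host.
    return forum_base_url + url


if __name__ == "__main__":
    stored = "//azurestorage.blob.core.windows.net/support/original/2X/3/XXXXXXX.mp4"
    print(upload_location(stored, "https://original.forum.com", True))
    print(upload_location(stored, "https://original.forum.com", False))
```

The second call reproduces the malformed URL from the bug report, which is consistent with the fallback branch being taken for the Azure store.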

Yes, sorry for leaving no updates here. I opened a PR a few days ago (https://github.com/discourse/discourse/pull/5829) and it has yet to be merged. Thank you for looking into it anyway.

Manually inserting a record into uploads and recooking the post does NOT regenerate the missing post_uploads link record. Reason:

https://github.com/discourse/discourse/blob/290ee312e673c81f0b7ff3c1433a0ecb9801044b/app/models/upload.rb#L78-L94

It seems that this method is not finding the correct record in uploads based on the video source. The recooking will run a method that searches all a/@href and img/@src, meaning all links and all images.

However, all videos are turned into video tags with source tags, so it won’t find them.

There is at least one a tag that should contain the necessary href for this to be picked up. However, that href will contain the protocol of the URL (i.e. it will contain the https:), and I think this prevents it from being matched to any url in uploads (which do NOT have protocol prefixes).
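If that suspicion is right, the mismatch can be illustrated with a small Python sketch. This is not the code in upload.rb; the helper names `to_protocol_relative` and `find_upload` are hypothetical, and the set of stored URLs stands in for the url column of the uploads table.

```python
import re
from typing import Optional, Set

# Matches a leading "http:" or "https:" only when it is followed by "//".
_SCHEME = re.compile(r"^https?:(?=//)")


def to_protocol_relative(url: str) -> str:
    """Strip the scheme so the URL has the same shape as stored upload URLs."""
    return _SCHEME.sub("", url, count=1)


def find_upload(href: str, stored_urls: Set[str]) -> Optional[str]:
    """Return the stored URL matching href, ignoring the scheme, or None."""
    candidate = to_protocol_relative(href)
    return candidate if candidate in stored_urls else None


if __name__ == "__main__":
    stored = {"//azurestorage.blob.core.windows.net/support/original/2X/3/XXXXXXX.mp4"}
    href = "https://azurestorage.blob.core.windows.net/support/original/2X/3/XXXXXXX.mp4"
    print(href in stored)             # exact comparison misses the record
    print(find_upload(href, stored))  # scheme-insensitive lookup finds it
```

An exact string comparison fails because of the https: prefix, while the lookup that strips the scheme first finds the record.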

I believe this could be the reason why all my video files are tombstoned.

Thanks for reporting, I’m going to check this now. Do I have to manually create an upload record and add an upload URL to the post (and recook afterwards) to reproduce this? It does sound like this causes videos to get tombstoned…

https://meta.discourse.org/t/rebuild-upload-links/88233

I just read this topic and am concerned about files being moved to tombstone… Does it happen for video files only - all of them or only some? How were the tombstoned files uploaded - via the composer?

If there are no post_uploads for uploads or even no records for uploads themselves, they’ll keep getting removed. So far I didn’t have any luck reproducing the issue.

Just video files. All other types of files are not affected.

All the files are uploaded via the composer. Not sure if this happens when the video files are created via email. I can check.

Uploading a video file via the composer will create a record in uploads, but no corresponding record in post_uploads.