Cannot download non-image media files, original filenames lost when uploaded to S3

Non-image media files residing on local storage or those migrated to S3 using rake uploads:migrate_to_s3 are downloaded with original filenames when clicking the upload link, while non-image media files uploaded directly to S3 are opened in new tab and the original filename is lost.

rake uploads:migrate_to_s3 sets content_disposition for all non-image uploads

https://github.com/discourse/discourse/blob/666823d4b7cf7bee4c1c81c1df1d1db944620f7f/lib/file_store/to_s3_migration.rb#L223

while upload to s3 sets content_disposition for all non-media uploads only

https://github.com/discourse/discourse/blob/d9a02d1336dd86e7fff4f14735fb0f1970555abd/lib/file_store/s3_store.rb#L57

The fix is to replace is_supported_media with is_supported_image (tested and working as expected).

3 Likes

Do we need this fix @sam?

3 Likes

possibly @martin what do you think is that change correct?

4 Likes

I think this fix sounds right – let me test out your proposed changed locally and if it looks okay I will make a PR.

3 Likes

Looking at this again, I think the solution is the other way around – the uploads:migrate_to_s3 task should be if !FileHelper.is_supported_media?(name). It doesn’t make sense to put the content-disposition: attachment; filename=X header on videos and audio. You are streaming those files from within a Discourse post, not downloading them?

So what we would want is:

No content-disposition attachment header

  • Image
  • Video
  • Audio

Yes content-disposition attachment header with original filename

  • All other attachments/uploads (PDF, TXT, CSV, etc.)

If I’m not seeing something here please feel free to add more information / examples.

3 Likes

Believe me, I’ve spent days figuring this out, and it seemed strange to me too, but setting content-disposition: attachment; filename=X to all media files except images perfectly mimics how media files are currently served from local storage.

In brief, NO! I can stream them using oneboxing if needed, but direct download link [media_file](path_to_media_file)—where path is either local or S3—should download the file using its original filename the same way it does on local storage (and for files migrated to S3).

But this is suddenly not possible for direct uploads to S3: [media_file](S3_path_to_media_file) link streams the media file in new tab (not needed, as this is what oneboxing does), and when trying to download it from the media player controls it also loses its original filename.

I would expect that files hosted locally and on S3 should behave the same, agreed? With your proposal the functionality would be completely reversed for S3 hosted uploads, not only for new ones but for migrated ones as well.

Here is a detailed case file, and why I believe my proposal is correct (i.e. it retains the same functionality for uploads hosted both locally and on S3):

Files hosted locally

I have a large (5k+) repository of audio (speech) files ranging from 1–10MB, with sum total of ~40GB, which are currently hosted on local storage and are now being transferred to S3 (this is by design, I don’t want them hosted on 3rd party service). This already works quite well from the performance perspective, but it now makes sense to migrate to S3 because of the rising storage costs and the option to use CDN with S3.

These files are optimised for size and are not meant to be streamed, but rather downloaded and listened to locally (for clients with limited bandwidth/connectivity). Files are uploaded in bulk and then referenced in raw post using links generated from their respective SHA1 values using: [dl_link](https://discourse.forum.tld/uploads/default/original/3X/0/1/0123..sha1.mp3) and can be downloaded using their original file name by clicking the direct link to the file (see example here).

If, on the other hand, this links were oneboxed they could still be streamed directly from the Discourse server (and also downloaded from the player controls, again with the original filename).

Files which were migrated to S3

When uploads are migrated from local storage to S3 using the uploads:migrate_to_s3 two things happen:

  • content-disposition: attachment; filename=X is set for all uploads except for images
  • direct local links [dl_link](https://discourse.forum.tld/uploads/default/original/3X/0/1/0123..sha1.mp3) in raw posts are replaced with S3 links [dl_link](https://cdn_url/uploads/original/3X/0/1/0123..sha1.mp3)

This mimics the locally hosted uploads in every way (download links with original filenames as well as oneboxing).

Files newly uploaded to S3

  • content-disposition: attachment; filename=X is not set for all media files
  • newly uploaded files are referenced using short-url in raw, but cooked links still point directly to https://cdn_url/uploads/original/3X/0/1/0123..sha1.mp3
  • clicking the short-url link or the manually inserted S3 link [dl_link](https://cdn_url/uploads/original/3X/0/1/0123..sha1.mp3) opens the file in new tab instead of downloading it.

Because content-disposition is not set this now differs from how media files are served from local storage (oneboxing still works, but the download links with original filenames are not available any more).

If you need any additional info please let me know.

2 Likes

That’s great, thank you for providing all that additional context. I will play around with this and aim to get a PR up next week.

5 Likes

Any chance this will make it into the final 2.5 version which is due soon?

1 Like

To be honest, I have not looked into this further because of other pressing work. I have put aside some time in my schedule now to take a proper look at this this week. The change seems minor, so it should not take long. Will report back once the PR is done.

4 Likes

@md-misko I confirmed your use case and after fixing the content-disposition I am still able to stream audio and video correctly. Fix is building now and should be merged by the end of today:

https://github.com/discourse/discourse/pull/10058

Thanks for bearing with me here!

4 Likes

This topic was automatically closed after 3 days. New replies are no longer allowed.