Backblaze S3 issue: duplicated uploads after delete

I am using Backblaze as S3 storage and have the clean up orphan uploads setting turned on. The problem is that instead of the orphan upload being deleted, a duplicate 0-byte file is created.
See below for an example. The (2) indicates the number of files with the same name. If expanded, you will see that the original file still exists, alongside a 0-byte file. Has anyone had a similar issue? Is it a problem with Backblaze or with the setting? Thanks.

Screen Shot 2021-03-15 at 8.00.38 AM

Discourse uses the S3 API, and since this works fine with AWS S3, it looks like a Backblaze problem. Maybe contact their support? I will add a note about this in Using Object Storage for Uploads (S3 & Clones).

3 Likes

There’s an ellipsis, which suggests the name of the 0-byte file is being truncated. What is the full name there? I’d wager the bucket has a file lifecycle configured and that file is a “hide marker”, as they call it.

When the lifecycle expires both versions should disappear.

3 Likes

The truncated part is “hidden”. I think you are right. It looks like the duplicated files are all recently uploaded ones. I will wait some time and see if they are gone. Thank you.

1 Like

Hi,
Can you please update us if the issue has been resolved?
More than a year has passed since your last post, so hoping this is OK now… :slight_smile:

1 Like

The issue is not resolved yet. I just manually deleted all the orphan files from Backblaze. I think I’ll move to S3 at this point because regularly cleaning up the storage is a task in itself.

3 Likes

Just an update with my experience…
I have a free B2 account (i.e. the first 10 GB) and I don’t have a payment method added.

The backup files are rotated and the ‘hidden’ files are removed permanently after a few days. I never have more than 7 files at once (Discourse has max 5 daily backups rotation on). Since my backups are never more than ~500 MB each, I’ve never had to pay for anything or delete any ‘orphan’ files manually.

2 Likes

Has there been any change with this issue? Has anybody tried contacting Backblaze support about this?

1 Like

But the issue seems to be with ‘Uploads’!!
The ‘Backups’ aren’t too many files to manage. They are manageable, even manually.

It seems this issue continues.

@Falco could you elaborate on what S3 operations Discourse performs to clean up orphans?

Thanks. I see copy_to_tombstone is involved here. I couldn’t find any docs about tombstone and orphaned files, so I can only assume how it works based on what I read in the forum. Please correct me if I’m wrong:

  1. When an orphaned file is identified (clean orphan uploads grace period hours), it is copied to the tombstone folder (copy_object).
  2. The original is then deleted (or supposed to be deleted) with delete_object.
  3. When the time comes (purge deleted uploads grace period days), it is deleted from the tombstone folder.

Is this correct?

Based on what I see in Backblaze, it does appear to be copied to the tombstone folder. It’s just not deleted; instead, an empty hidden version is created.
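
To make my assumption concrete, here is a minimal Python sketch of the three steps as I understand them. It is not Discourse's actual code (which is Ruby); `clean_up_orphan`, `purge_tombstone` and the `tombstone/` prefix are names I made up for illustration, and `client` is assumed to be a boto3-style S3 client pointed at the Backblaze S3 endpoint. Only `copy_object` and `delete_object` are real S3 calls.

```python
def clean_up_orphan(client, bucket, key, tombstone_prefix="tombstone/"):
    """Steps 1 and 2: copy an orphaned upload to the tombstone folder,
    then delete the original. The prefix is a hypothetical default."""
    tombstone_key = tombstone_prefix + key

    # Step 1: copy the orphan into the tombstone folder.
    client.copy_object(
        Bucket=bucket,
        CopySource={"Bucket": bucket, "Key": key},
        Key=tombstone_key,
    )

    # Step 2: delete the original. No VersionId is passed here, which
    # on a versioned bucket only adds a hide/delete marker on top of
    # the original instead of removing it.
    client.delete_object(Bucket=bucket, Key=key)
    return tombstone_key


def purge_tombstone(client, bucket, tombstone_key):
    """Step 3: after the purge grace period, remove the tombstone copy."""
    client.delete_object(Bucket=bucket, Key=tombstone_key)
```

If step 2 really is called without a version ID, that would match what I see in the bucket: the original is still there, with a 0-byte hidden version on top of it.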

1 Like

So I did contact support, and it seems there is a solution to the orphan problem. Here’s what Backblaze support said:

A few things are going on here. Looking at your account here, you have your bucket lifecycle rules set to Keep all files. If you were to change your lifecycle rules to Keep only the last version of the file, the hidden file will be deleted after 24 hours of being hidden, clearing up storage space.

Now, another layer to add in here is when using an S3-compatible service, any time an object delete is called, it will be hidden. And from here, depending on the lifecycle rules of the bucket, it will be deleted or remain hidden. In your case, the files remain hidden and are not being deleted due to Keep ALL files.

For an object to actually be deleted when an object delete is called, the delete call must include the file version ID, which I don’t think the integration is doing if files are just being hidden.

In order for these files to be deleted from the bucket, you’ll need to sign into your Backblaze account, go to your bucket, and update its lifecycle setting to Keep only the last version of the file. This will delete the hidden file from the bucket after 24 hours.
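
To illustrate the version ID point from support, here is a rough Python sketch of what a permanent delete could look like through the S3 API. This is my own illustration, not something Discourse or Backblaze provided: `delete_all_versions` is a hypothetical helper, `client` is assumed to be a boto3-style S3 client, and I am assuming the Backblaze S3-compatible endpoint handles `list_object_versions` and `VersionId` the same way AWS does.

```python
def delete_all_versions(client, bucket, key):
    """Permanently delete every stored version of `key`, including
    hide/delete markers, by passing each VersionId explicitly.

    Only the first page of results is handled, so this sketch is not
    suitable for keys with a very large number of versions.
    """
    response = client.list_object_versions(Bucket=bucket, Prefix=key)
    deleted = []

    # Real versions and markers are returned in separate lists.
    for group in ("Versions", "DeleteMarkers"):
        for entry in response.get(group, []):
            # Prefix matching can return other keys; skip those.
            if entry["Key"] != key:
                continue
            client.delete_object(
                Bucket=bucket,
                Key=key,
                VersionId=entry["VersionId"],
            )
            deleted.append(entry["VersionId"])

    return deleted
```

Changing the lifecycle rule is still the simpler fix, since it needs no code at all.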

By default, Backblaze sets the “Keep all files” lifecycle rule for newly created buckets. That’s probably why everyone has issues with orphans. Changing the lifecycle rule to “Keep only the last version of the file” deletes orphans 24 hours after they are hidden.
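
For those who manage buckets through the B2 native API rather than the web UI, my understanding is that “Keep only the last version of the file” corresponds to a lifecycle rule like the one below. Treat this as a sketch: the field names are the ones I know from the B2 lifecycle rules documentation, `keep_only_last_version_rule` is a hypothetical helper, and support did not confirm this mapping to me.

```python
import json


def keep_only_last_version_rule(file_name_prefix=""):
    """Build a B2-style lifecycle rule that deletes a file version one
    day after it is hidden. An empty prefix applies it to all files."""
    return {
        "daysFromHidingToDeleting": 1,
        "daysFromUploadingToHiding": None,  # never auto-hide live files
        "fileNamePrefix": file_name_prefix,
    }


if __name__ == "__main__":
    # Print the rule list in the JSON shape a bucket update would expect.
    print(json.dumps([keep_only_last_version_rule()], indent=2))
```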

This might be worth mentioning in this thread:

1 Like

Nice find! Can you please edit it to the wiki?

Didn’t realize it was a wiki. Updated.

1 Like