I am using Backblaze as S3 storage and have the clean up orphan uploads setting turned on. The problem is that, instead of the orphan upload being deleted, a duplicate file of 0 bytes is created.
See below for an example. The (2) indicates the number of files with the same name. If expanded, you will see the original file still exists, along with a 0-byte file. Has anyone had a similar issue? Is it a problem with Backblaze or with the setting? Thanks.
Discourse uses the S3 API, and since it works fine with AWS S3, that points to this being a Backblaze problem. Maybe contact their support? I will add a note about this in Using Object Storage for Uploads (S3 & Clones).
There's an ellipsis which suggests the name of the 0-byte file is being truncated. What is the full name there? I'd wager the bucket has a file lifecycle configured and that file is a "hide marker", as they call it.
When the lifecycle expires both versions should disappear.
The truncated part is "hidden". I think you are right. It looks like the duplicated files are all recently uploaded ones. I will wait some time and see if they are gone. Thank you.
Hi,
Can you please update us if the issue has been resolved?
More than a year has passed since your last post, so hoping this is OK now...
The issue is not resolved yet. I just now manually deleted all the orphan files from Backblaze. I think I'll move to S3 at this point because regularly cleaning up the storage is a task in itself.
Just an update with my experience...
I have a free B2 account (i.e. the first 10 GB) and I don't have a payment method added.
The backup files are rotated and the "hidden" files are removed permanently after a few days. I never have more than 7 files at once (Discourse has max 5 daily backups rotation on). Since my backups are never more than ~500 MB each, I've never had to pay for anything or delete any "orphan" files manually.
Has there been any change with this issue? Has anybody tried contacting Backblaze support about this?
But the issue seems to be with "Uploads"!
The "Backups" aren't too many files to manage. They are manageable, even manually.
It seems this issue continues.
@Falco could you elaborate on what S3 operations Discourse performs to clean up orphans?
Thanks. I see `copy_to_tombstone` is involved here. I couldn't find any docs about tombstone and orphaned files, so I can only assume how it works based on what I read in the forum. Please correct me if I'm wrong:
- When an orphaned file is identified (clean orphan uploads grace period hours), it is copied to the tombstone folder (`copy_object`).
- It is then deleted (or supposed to be deleted) with `delete_object`.
- When the time comes (purge deleted uploads grace period days), it is deleted from the tombstone folder.
Is this correct?
Based on what I see in Backblaze, it does appear to be copied to the tombstone folder. It's just not deleted; instead, an empty hidden version is created.
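To make the flow above concrete, here is a toy in-memory model in Python of how a versioned bucket could behave. It is only a sketch under assumptions: `ToyBucket`, its methods, and the example paths are made up for illustration and are not Discourse or Backblaze code. It shows why a delete without a version ID can leave the original file plus a 0-byte hidden entry under the same name:

```python
from itertools import count


class ToyBucket:
    """Toy model of a versioned bucket (hypothetical, for illustration only)."""

    def __init__(self):
        self.versions = {}    # key -> list of (version_id, data, is_hide_marker)
        self._ids = count(1)  # simple version ID generator

    def put(self, key, data):
        self.versions.setdefault(key, []).append((f"v{next(self._ids)}", data, False))

    def copy(self, src, dst):
        # Copy the newest stored version of src to dst (stands in for copy_object).
        _, data, _ = self.versions[src][-1]
        self.put(dst, data)

    def delete(self, key, version_id=None):
        if version_id is None:
            # No version ID given: the file is only hidden by a 0-byte marker.
            self.versions[key].append((f"v{next(self._ids)}", b"", True))
        else:
            # Version ID given: that exact version is removed permanently.
            self.versions[key] = [v for v in self.versions[key] if v[0] != version_id]


bucket = ToyBucket()
bucket.put("original/abc.png", b"image-bytes")        # the upload (hypothetical path)
bucket.copy("original/abc.png", "tombstone/abc.png")  # step 1: copy to tombstone
bucket.delete("original/abc.png")                     # step 2: delete without a version ID

# Two entries remain under the same name: the original and a 0-byte hide marker.
remaining = [(vid, len(data), hidden)
             for vid, data, hidden in bucket.versions["original/abc.png"]]
print(remaining)
```

In this model, calling `delete` with a version ID removes that version outright, whereas the plain delete only stacks a hide marker on top of the original.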
So I did contact support, and it seems there is a solution to the orphan problem. Here's what Backblaze support said:
A few things are going on here. Looking at your account here, you have your bucket lifecycle rules set to Keep all files. If you were to change your lifecycle rules to Keep only the last version of the file, the hidden file will be deleted after 24 hours of being hidden, clearing up storage space.
Now, another layer to add in here is when using an S3-compatible service, any time an object delete is called, it will be hidden. And from here, depending on the lifecycle rules of the bucket, it will be deleted or remain hidden. In your case, the files remain hidden and are not being deleted due to Keep ALL files.
For an object to actually be deleted when an object delete is called, the delete call has to include the file version ID, which I don't think the integration is doing if files are just being hidden.
In order for these files to be deleted from the bucket, you'll need to sign into your Backblaze account, go to your bucket, and update its lifecycle setting to Keep only the last version of the file. This will delete the hidden file from the bucket after 24 hours.
By default, Backblaze sets the "Keep all files" lifecycle rule for newly created buckets. That's probably why everyone has issues with orphans. Changing the lifecycle rule to "Keep only the last version of the file" deletes orphans 24 hours after they are hidden.
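For anyone who already has a pile of hidden files and doesn't want to wait for the lifecycle rule, support's point about version IDs can be sketched in Python. This is a rough sketch, not something Discourse does: `purge_hidden_objects` is a hypothetical helper, `client` is assumed to be a boto3-style S3 client pointed at your B2 S3 endpoint, and it assumes B2 reports hide markers as delete markers in `list_object_versions`. It deletes data permanently, so try it on a test bucket first.

```python
def purge_hidden_objects(client, bucket, prefix=""):
    """Permanently delete every version of each key whose latest entry is a hide marker.

    Hypothetical helper; `client` is assumed to expose boto3-style
    list_object_versions and delete_object calls.
    """
    hidden_keys = set()
    all_versions = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}

    while True:
        page = client.list_object_versions(**kwargs)
        for marker in page.get("DeleteMarkers", []):
            if marker.get("IsLatest"):
                hidden_keys.add(marker["Key"])  # the key is currently hidden
            all_versions.append((marker["Key"], marker["VersionId"]))
        for version in page.get("Versions", []):
            all_versions.append((version["Key"], version["VersionId"]))
        if not page.get("IsTruncated"):
            break
        # Continue listing from where the previous page stopped.
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["VersionIdMarker"] = page["NextVersionIdMarker"]

    deleted = []
    for key, version_id in all_versions:
        if key in hidden_keys:
            # Passing VersionId removes that exact version instead of hiding it.
            client.delete_object(Bucket=bucket, Key=key, VersionId=version_id)
            deleted.append((key, version_id))
    return deleted
```

Keys whose latest version is still visible are left alone, so live uploads and the tombstone copies are not touched. Changing the lifecycle rule is still the simpler fix; this only illustrates what a delete call with a version ID looks like.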
This might be worth mentioning in this thread:
Nice find! Can you please edit it to the wiki?
Didn't realize it was a wiki. Updated.