All of the posts with missing images are at least two or three days old. Do you have any form of object expiration configured on your bucket?
If you followed the S3 guide, the bucket isn't being used as a cache; it's where Discourse offloads original uploads. Any expiration setting means a file gets offloaded and then, entirely separately from Discourse, deleted by a process at Amazon.
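To make that concrete, here is a toy Python model (not AWS code) of how an expiration rule acts on a bucket. The rule matches keys by prefix and removes any object older than its day count, with no knowledge of whether Discourse still references the file. `LifecycleRule`, `apply_expiration`, the example keys and the dates are all hypothetical, and real S3 evaluates rules asynchronously rather than at an exact cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LifecycleRule:
    # Hypothetical, simplified stand-in for an S3 lifecycle expiration rule.
    prefix: str           # "" matches every key in the bucket
    expiration_days: int

def apply_expiration(objects, rule, now):
    """Return the objects that survive the rule.

    `objects` maps key -> creation time. Simplification: real S3 applies
    rules asynchronously and rounds expiry to the next midnight UTC.
    """
    cutoff = now - timedelta(days=rule.expiration_days)
    survivors = {}
    for key, created in objects.items():
        expired = key.startswith(rule.prefix) and created < cutoff
        if not expired:
            survivors[key] = created
    return survivors

now = datetime(2024, 1, 10)
bucket = {
    "original/1X/a.png": datetime(2024, 1, 1),  # offloaded 9 days earlier
    "original/1X/b.png": datetime(2024, 1, 9),  # offloaded the day before
}
# A bucket-wide rule (empty prefix) also deletes originals Discourse still uses;
# a rule scoped to "tombstone/" would leave both of these alone.
print(sorted(apply_expiration(bucket, LifecycleRule("", 3), now)))
```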
Why are you using a CDN and Cloudflare in addition to S3? Is MaxCDN providing the S3 bucket?
I've figured out that this is related to tombstone: I found the images in a tombstone folder on S3. There was a lifecycle rule in my bucket called purge_tombstone that was moving the images out of the main folder. I have disabled the rule, but I'm wondering whether a bug is causing this. @zogstrip @codinghorror, any thoughts?
I was able to move all the images back from the tombstone folder, and everything is running smoothly now. The question now is why the images were tombstoned in the first place, and how do I avoid this in the future?
I found this similar post, but I still can't put two and two together on what caused this.
I have run into this problem twice before. The latest occurrence was two days ago. This kind of problem doesn't happen often on my website.
But I uploaded my images to the hosting server rather than to S3.
I found out that my "bad" writing habits might cause problems.
My "bad" writing habit:
I upload the images first, then cut the Markdown links out and paste them into a local editor, without closing the Discourse editor. Afterwards, I paste the links back.
Why do I think this habit might cause problems?
Because the value I set for the clean orphan uploads grace period hours setting exactly matches how long it took for the images to disappear.
But I'm not sure this is the cause.
When I do all my editing in the Discourse editor, everything is fine.
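A simplified Python model of the hypothesis above: an orphan-upload cleanup treats an upload as removable only if no post references it and it is older than the grace period. If the Markdown link is missing from the post when it is saved, nothing references the upload, so once the grace period passes it becomes eligible for cleanup. This is a hypothetical sketch of the idea behind the setting, not Discourse's implementation; `Upload`, `is_cleanable`, the field names and the timestamps are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Upload:
    # Hypothetical upload record; post_ids are the posts known to reference it.
    url: str
    created_at: datetime
    post_ids: set = field(default_factory=set)

def is_cleanable(upload, now, grace_period_hours):
    """Treat an upload as an orphan only if nothing references it AND it is
    older than the grace period (the idea behind the 'clean orphan uploads
    grace period hours' setting, not its actual code)."""
    too_old = now - upload.created_at > timedelta(hours=grace_period_hours)
    return not upload.post_ids and too_old

now = datetime(2024, 1, 2, 12, 0)  # 36 hours after the upload below
img = Upload("/uploads/default/original/1X/abc.png", datetime(2024, 1, 1, 0, 0))

print(is_cleanable(img, now, grace_period_hours=48))  # still inside the grace period
print(is_cleanable(img, now, grace_period_hours=24))  # unreferenced and past the grace period
img.post_ids.add(42)                                  # a saved post now references it
print(is_cleanable(img, now, grace_period_hours=24))
```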
What I have done to fix this:
I deleted the topics, then re-uploaded the content.
I rebaked the posts.
I rebuilt the Discourse Docker instance.
cd /var/discourse
./launcher enter app
rake posts:rebake
I don't think it is. I went to /sidekiq and it seems to be running, but I don't know much about this. There are no dead jobs and 73 failed jobs, but I don't know over what time period.
It seems to me that something I did this morning caused the tombstone process to run. Is there any way I can identify what it was?
That's what we're all trying to get to: something is happening because something changed.
But depending on the expiration policy, the change may not have happened today. It could have been n days ago, where n is whatever expiration period was set, so it could be 3 hours, 3 days, or 3 weeks ago.
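The arithmetic behind that point, as a tiny Python helper: objects expire a fixed period after they are written, and the deletion happened at or before the moment the files were noticed missing, so the event that started the clock happened no later than "noticed minus the expiration period". The function name and the timestamp are illustrative only.

```python
from datetime import datetime, timedelta

def when_clock_started(noticed_missing: datetime, expiration: timedelta) -> datetime:
    """Latest time the expiration clock could have started for a file noticed
    missing at `noticed_missing`. Look back by the expiration period when
    searching for the triggering change, not just at today's changes."""
    return noticed_missing - expiration

noticed = datetime(2024, 3, 21, 9, 0)
for period in (timedelta(hours=3), timedelta(days=3), timedelta(weeks=3)):
    print(period, "->", when_clock_started(noticed, period))
```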
Even personal sites need some kind of record of changes; at a minimum, I'd recommend a thread in /staff. It's the quickest way to pin down the source of problems like this.
@tgxworld and I isolated a condition where uploads were missing associations with posts.
For the last five or so years, a deferred Sidekiq job has created the association saying that upload X belongs to post Y. This has worked OK, but it has some pretty bad edge cases.
If for some reason the job was badly delayed or lost, the associations were never created. When associations are missing, a regular job moves the uploads to tombstone, and a second policy eventually deletes them from tombstone.
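A rough Python model of that two-stage flow, assuming a dict-based store: a periodic sweep moves every live file without a post association under a `tombstone/` prefix, and a separate purge later deletes tombstoned files once they are old enough (on S3, that second stage appears to correspond to a lifecycle rule like the purge_tombstone one mentioned earlier in the thread). `move_orphans_to_tombstone`, `purge_tombstone`, the keys and the day counts are hypothetical; this is not Discourse's code. It shows why a file is still recoverable after the first stage but gone after the second.

```python
from datetime import datetime, timedelta

TOMBSTONE = "tombstone/"  # hypothetical prefix, matching the folder seen on S3

def move_orphans_to_tombstone(store, associations, now):
    """Stage 1: move every live file that no post is associated with.
    `store` maps key -> time the object was written; `associations` is the
    set of keys some post claims. Moving re-writes the object, resetting its age."""
    for key in [k for k in store if not k.startswith(TOMBSTONE)]:
        if key not in associations:
            del store[key]
            store[TOMBSTONE + key] = now

def purge_tombstone(store, now, max_age):
    """Stage 2: permanently delete tombstoned files older than max_age."""
    for key in [k for k in store if k.startswith(TOMBSTONE)]:
        if now - store[key] > max_age:
            del store[key]

t0 = datetime(2024, 1, 1)
store = {"original/a.png": t0, "original/b.png": t0}
associations = {"original/a.png"}  # the deferred job for b.png was lost

move_orphans_to_tombstone(store, associations, t0 + timedelta(days=1))
print(sorted(store))  # b.png now lives under tombstone/ and is still recoverable
purge_tombstone(store, t0 + timedelta(days=40), max_age=timedelta(days=30))
print(sorted(store))  # b.png is gone for good
```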
Our new process creates the association at the moment you save the post, so this edge case can no longer happen. I recommended a rebake so that, if the job missed a few posts, the associations would still be created.
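A sketch of that change, again with hypothetical names: instead of enqueuing a job that might be delayed or lost, the new flow records associations in the same step that saves the post. A rebake is modeled here as re-scanning each post's content for upload URLs and recreating any missing associations. The regex, the URL shape, and `save_post`/`rebake` are illustrative assumptions, not Discourse's actual implementation.

```python
import re

# Hypothetical upload URL shape, used only for this illustration.
UPLOAD_URL = re.compile(r"/uploads/default/original/\S+?\.(?:png|jpe?g|gif)")

def extract_upload_urls(raw):
    """Find upload URLs referenced in a post's raw Markdown."""
    return set(UPLOAD_URL.findall(raw))

def save_post(posts, associations, post_id, raw):
    """New flow: store the post and its upload associations together,
    so there is no separate job that can be delayed or lost."""
    posts[post_id] = raw
    for url in extract_upload_urls(raw):
        associations.add((url, post_id))

def rebake(posts, associations):
    """Backfill: re-derive associations for every post, repairing any that
    an old deferred job never created. Returns how many were added."""
    before = len(associations)
    for post_id, raw in posts.items():
        for url in extract_upload_urls(raw):
            associations.add((url, post_id))
    return len(associations) - before

posts, associations = {}, set()
save_post(posts, associations, 1, "![a](/uploads/default/original/1X/a.png)")
# Simulate a post saved under the old flow whose association job was lost.
posts[2] = "![b](/uploads/default/original/1X/b.png)"
print(rebake(posts, associations))
```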
We fixed this issue late last week, so there is a short transition period in place.
FYI, I manually moved all the files in tombstone back to their proper locations rather than using rsync. That may have been the fatal flaw that broke the images. What are my options now?