Download Remote Images with Referer (plugin possible?)

i’m working on a migration to discourse, and i know about SiteSetting.download_remote_images_to_local. it’s awesome.

however, when i (slow) rebake to download them all, i’m noticing that Photobucket serves a watermarked image depending on your referer.

i looked briefly, and it looks like downloading is handled in /lib/final_destination.rb. i could probably patch that temporarily just for the post-migration rebake, but that doesn’t handle the hypothetical situation of people hotlinking images from crappy image hosts going forward. i think any sane person knows better than to use Photobucket now, and idk of others that watermark, so maybe not a big deal.

my question is this… has anyone already solved this? and is this something a plugin could accomplish? i have not learned about how plugins work yet.

alternatively, would it be a bad feature suggestion to just always set the referrer to scheme://domain/ when downloading a remote image? when would that ever be a bad thing to have discourse itself do it?

if you want to see what i’m talking about:

https://i1111.photobucket.com/albums/h475/scoobystuff/Stereo/S1080033.jpg

# watermark:
curl -LO \
  'https://i1111.photobucket.com/albums/h475/scoobystuff/Stereo/S1080033.jpg'
# no watermark:
curl -LO --referer 'https://i1111.photobucket.com/' \
  'https://i1111.photobucket.com/albums/h475/scoobystuff/Stereo/S1080033.jpg'
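
for reference, here is a minimal ruby sketch of the scheme://domain/ idea, mirroring the second curl command. it is not how discourse or final_destination.rb actually does it. `referer_for` and `image_request` are hypothetical helper names made up for illustration, only the standard library is used, and nothing is sent over the network:

```ruby
require "uri"
require "net/http"

# Hypothetical helper (not part of Discourse): derive "scheme://host/"
# from an image URL. Returns nil for anything that is not http(s).
# Assumption: a non-default port is ignored.
def referer_for(url)
  uri = URI.parse(url)
  return nil unless uri.is_a?(URI::HTTP) && uri.host
  "#{uri.scheme}://#{uri.host}/"
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: build (but do not send) a GET request that
# carries the derived Referer header.
def image_request(url)
  request = Net::HTTP::Get.new(URI.parse(url))
  referer = referer_for(url)
  request["Referer"] = referer if referer
  request
end
```

the request object could then be handed to `Net::HTTP#request` on a connection to the image host. redirects, like the ones `curl -L` follows, are not handled here.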

I’m not a fan of the watermarking they do, but sending a false referrer to hide the watermark would actually violate Photobucket’s terms of service. One of their requirements for free accounts is that the watermark is included when the account is used for hosting. Paid accounts don’t get those watermarks added.


what?

how do you reach that conclusion? no ToS have been agreed to by a party merely downloading an image without an account (or even authentication). or even presented to said party.

the ToS applies to the uploader. specifically:

THE FREE ACCOUNT DOES NOT ALLOW IMAGE HOSTING. TO THE EXTENT THAT IN OUR SOLE AND ABSOLUTE DISCRETION, WE ALLOW A FREE ACCOUNT TO HOST AN IMAGE, THE IMAGE WILL INCLUDE A PHOTOBUCKET WATERMARK THAT REFLECTS THAT THE IMAGE IS HOSTED BY US. IF WE PERMIT ANY FREE IMAGE HOSTING, WE RESERVE THE RIGHT TO BLOCK THE IMAGE OR BLUR AND WATRMARK THE IMAGE IN OUR SOLE AND ABSOLUTE DISCRETION. FREE ACCOUNT HOLDERS ARE STRONGLY ENCOURAGED TO UPGRADE TO A PAID ACCOUNT THAT PERMITS THIRD PARTY HOSTING.

further, this is a statement merely advising account capability (“THE FREE ACCOUNT DOES NOT ALLOW IMAGE HOSTING…IF WE PERMIT ANY FREE IMAGE HOSTING…”), and by agreeing, the account holder has merely been made aware of it. if i had a free photobucket account and had uploaded that image, i would not even be in violation by hotlinking it in this topic as the language does not imply that i must agree not to take such an action. the purpose of this statement is so that a free account holder cannot, for example, sue Photobucket for denial of service.

contrast that to the following statement that i made up and does not appear in their actual ToS:

THE FREE ACCOUNT HOLDER AGREES TO NOT USE THE ACCOUNT FOR IMAGE HOSTING AND WILL NOT HOTLINK IMAGES ELSEWHERE.

A friend did exactly what you’re suggesting on his instance and Photobucket blocked his server. Let me see if I can find my copy of the email they sent him.


not legally a ToS violation imo, but photobucket is of course free to deny service to whomever at their discretion (unless contractually obligated). would be interested in seeing the email. did photobucket get his email address from the contact section of his server or something?

thanks a lot for the warning! when i pull all the photobucket images (on the final migration), i’ll be sure to do it through a proxy or something. not that it really matters though, lol. no one uses photobucket anymore. i also won’t leave the referer spoof in the production server, but hopefully that won’t be needed anyways for new stuff going forward.

Although it wouldn’t be a ToS violation, it could be a DMCA violation (or a violation of a similar copyright law in another country) to circumvent a mechanism that is used to protect the copyright of an image or to control access to the original image.

I don’t think that referer header forgery counts as being “a good web citizen”, and if the DMCA indeed applies, it would even be illegal to distribute such software.

Many import scripts have their own code for attachment downloads. Have you considered creating a regexp and applying that logic to inline Photobucket images as well, including the referer?
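
As a rough illustration of that idea, here is a minimal Ruby sketch that scans a post's raw text for inline Photobucket image URLs and pairs each one with the referer to send. It is not taken from any existing import script: the `PHOTOBUCKET_IMAGE` pattern, the list of file extensions, and the `photobucket_downloads` name are all assumptions made for the example, and the actual download step is left out.

```ruby
require "uri"

# Assumed pattern: an http(s) URL on photobucket.com or one of its
# subdomains, ending in a common image extension.
PHOTOBUCKET_IMAGE =
  %r{https?://(?:[a-z0-9-]+\.)*photobucket\.com/[^\s"'<>\[\]()]+\.(?:jpe?g|png|gif)}i

# Hypothetical helper: list each distinct Photobucket image URL found in
# the raw post text together with the "scheme://host/" referer to use.
def photobucket_downloads(raw)
  raw.scan(PHOTOBUCKET_IMAGE).uniq.map do |url|
    uri = URI.parse(url)
    { url: url, referer: "#{uri.scheme}://#{uri.host}/" }
  end
end
```

An import script could loop over the returned list, fetch each URL with the matching `Referer` header, and rewrite the post to point at the local copy.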

i don’t think copyright violation or circumvention applies in this case because the ToS states that the uploader retains all rights (though perhaps some law is somehow triggered in some territory). however, point taken about respecting standard browsing behavior and being a “good web citizen” – when you put it that way, i think it makes no sense to ship referer spoofing as an official feature.

yes, i should be able to manage to get the images pulled in (without watermark) for my migration. in my case with the phpbb importer, i think it’s easiest to just temporarily modify discourse itself to spoof referer when downloading remote images and do a rebake since the importer doesn’t handle saving remote images (aside from avatars).

i was mostly just wanting to hear from people who have tackled this problem before. i was also curious if it was technically possible to have a plugin do it (still wondering) and if it would even make sense to have discourse officially ship such a feature (it doesn’t).


The fact that the uploader retains their rights - including the right to control how it is used and distributed - does not give others the right to modify or remove watermarks from their content without permission.

The DMCA also covers access control, which would apply here (i.e. they have the right to serve that image without a watermark only if it is surrounded by their ads, and bypassing that mechanism means accessing the image in a way other than the one they intend).


haha, maybe. but consider that something like youtube-dl is still up on github. that makes referer spoofing look quite innocent in comparison, and believe me, RIAA did really try to get it taken down. it was in “the news” for a while if you follow that type of news.

Let me rephrase that: youtube-dl is not still up, it’s back up after it was taken down.

And that’s only because legitimate use cases exist.

That doesn’t mean that all uses are legitimate.

You don’t own rights to the images, recognize that.

i may not own the rights to all of them, but my users do. not sure where you’re going with that, though. they have also expressed a desire that they be preserved against link rot.


haha. please understand that i’m not trying to get into a heated argument with you guys. we’re all good here :slight_smile: :heart:. my pedantic take on it, though:

yeah, but only taken down because of scare tactics. there is a legitimate legal argument against their claims. at any rate, no one pressed on github further, and that says something. entities don’t come much thirstier and meaner than RIAA.

if you consider the EFF’s response as the prevailing legal argument/theory that convinced/emboldened github to reject the RIAA’s (spurious) claims, the real reason is more complicated than that and in fact even rejects the circumvention claim.

surely if legal experts do not believe youtube-dl violates the DMCA 1201(a) circumvention definition, spoofing a referer cannot be considered as such a circumvention.

Not necessarily true of course, but you could at least assume that your users have given you a license at the moment they posted it on your forum, so it would be their problem and not yours. So your use case for this is legitimate, I’m 100% convinced of that.

But no sane open source project would risk getting ex parte taken down because of some legal argument, even if they are / turn out to be right and the complainer is not. So I think the import script is the right place for this.
