Images from are not auto-crawled


(Anton) #1

Here is an example:

Source code:


This image does not get auto-crawled. Is this a bug?

(Régis Hanol) #2

There’s no file extension. How are we supposed to know that the file is an image?

(Anton) #3

It’s inside an [img] tag, so we should assume it is an image.

(Jeff Atwood) #4

Not possible; all URLs would need to be retrieved and analyzed. Add a query string to the end, like ?x=.jpg
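
To illustrate why that workaround helps, here is a minimal Python sketch. It assumes the detection simply checks whether the raw URL string ends in a known image extension; the thread does not show the actual check, so the function names, the extension list, and the example URL are all hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative list only; the real set of recognised extensions is an assumption.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def ends_with_image_extension(url: str) -> bool:
    """Naive check on the raw URL string, query string included."""
    return url.lower().endswith(IMAGE_EXTENSIONS)


def path_has_image_extension(url: str) -> bool:
    """Stricter check that looks only at the path component of the URL."""
    return urlsplit(url).path.lower().endswith(IMAGE_EXTENSIONS)


url = "https://example.com/image/12345"  # hypothetical extensionless image URL
print(ends_with_image_extension(url))              # False
print(ends_with_image_extension(url + "?x=.jpg"))  # True
print(path_has_image_extension(url + "?x=.jpg"))   # False
```

Under this assumption the appended query string satisfies a suffix check on the whole URL, while a check restricted to the path would not be affected by it.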

(Matt Palmer) #5

Why is it not possible to presume, a priori, that a URL in an [img] tag is, in fact, image data? Even if you just did a HEAD request on it and looked at the Content-Type of the response, looking for a media type of image/* (although given that you’ll end up doing a GET for the entire image 99.99% of the time straight after, that’s probably a premature optimisation). HTTP provides Content-Type for a very, very good reason: file extensions are an ugly hack. Worse, if we’re not parsing URLs and separating query string from path, I wouldn’t be surprised if we’re open to some sort of unsafe content injection vulnerabilities…
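
A rough Python sketch of the HEAD-request idea described above, using only the standard library. The function names and the timeout value are hypothetical, and this is not how the forum software is known to work; it only shows the Content-Type check for a media type of image/*.

```python
import urllib.request
from typing import Optional


def is_image_media_type(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value names an image/* media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def url_serves_image(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and inspect the Content-Type of the response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_image_media_type(response.headers.get("Content-Type"))
    except (OSError, ValueError):
        # Network and HTTP errors are OSError subclasses; a malformed URL
        # raises ValueError. Either way, treat the URL as not an image.
        return False
```

A server may answer HEAD differently from GET or send a misleading Content-Type, so a real implementation would probably still validate the downloaded bytes.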

(Jeff Atwood) #6

BBCode img tags are not used very often, and shouldn’t be. That is a red herring. So the question becomes:

Why can’t I paste in a random URL not ending in .png or .jpg and have it be detected as an image?

(Matt Palmer) #7

OK, we’ll go with that question. Making a single HEAD request to sniff the content type still doesn’t seem like a massive overhead.

(Sam Saffron) #8

Probably, but this is going to have to wait on some oneboxing improvements we have slotted.

(Anton) #9

One way or another, I’d like all images to be uploaded so they do not
disappear one day. Somehow users have ended up inserting tens of such images, and counting.

So, if not processed automatically, that will be hours for our mods to fix
manually. A fix would be really appreciated.

(Matt Palmer) #10

I think the “PR welcome” tag fits here quite nicely…