Probably not, but we could shorten to image://sha1.png quite easily, which helps
I’d say it’s worth using the term “random bit length” instead of “algorithm”, because any good hash function will do. Usually 128 bits should be enough, but very big hosting services (like “Yandex Photos”) switched to 512 bits. Also, a timestamp can be used to reduce the random part (see the MongoDB algorithm for ObjectID).
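To make the timestamp idea concrete, here is a minimal Python sketch of an ID built from a 4-byte timestamp followed by random bytes. It is only loosely modelled on the ObjectID layout and is not MongoDB's exact algorithm; the function name `timestamped_id` and the default of 8 random bytes are assumptions for illustration.

```python
import os
import time


def timestamped_id(random_bytes: int = 8) -> bytes:
    """Return a 4-byte big-endian Unix timestamp followed by random bytes."""
    # The timestamp already separates IDs created in different seconds,
    # so the random part only has to avoid collisions within one second.
    seconds = int(time.time()) & 0xFFFFFFFF
    return seconds.to_bytes(4, "big") + os.urandom(random_bytes)


print(timestamped_id().hex())  # 24 hex characters for 4 + 8 bytes
```

Note that the timestamp part is predictable, so only the random bytes count toward how hard the ID is to guess.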
Another way to save ~25% is to use base58 encoding or similar (base64 is not URL-friendly).
Quick estimate: 128 bits + base64 => 128/6 => 22 characters.
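The same estimate can be generalised to other alphabets. A rough Python sketch, where `encoded_length` is a hypothetical helper that returns the smallest number of digits able to hold a given number of bits:

```python
import math


def encoded_length(bits: int, base: int) -> int:
    """Smallest number of base-`base` digits that can represent `bits` bits."""
    return math.ceil(bits / math.log2(base))


# Compare hex, base58, base62, base64 and Ascii85 for a 128-bit value.
for base in (16, 58, 62, 64, 85):
    print(base, encoded_length(128, base))
```

For 128 bits this gives 32 characters in hex, 22 in base58, base62 and base64, and 20 in base85.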
That is correct, base64 encoding of 128 bits GUID should produce 22 chars:
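A quick way to check this, sketched in Python with a freshly generated random GUID (not any real upload ID) and the trailing `=` padding stripped:

```python
import base64
import uuid

guid = uuid.uuid4()                      # random 128-bit identifier
hex_form = guid.hex                      # plain hex encoding
b64_form = base64.b64encode(guid.bytes).decode("ascii").rstrip("=")

print(len(hex_form), len(b64_form))      # 32 22
```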
You can use ASCII85 encoding to get down to 20 characters, but the characters used are no longer filesystem friendly, and that does not seem worth the additional pain:
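A small Python sketch of that trade-off, using a fixed 16-byte sample value in place of a real GUID. The `reserved` set is an assumption for illustration: it lists the characters Windows does not allow in file names.

```python
import base64

raw = bytes(range(16))                   # fixed 128-bit sample value
encoded = base64.a85encode(raw).decode("ascii")
print(len(encoded))                      # 20

# The Ascii85 alphabet runs from '!' to 'u', which overlaps with
# characters that are reserved in file names on Windows.
alphabet = {chr(code) for code in range(33, 118)}
reserved = set('\\/:*?"<>|')
print(sorted(alphabet & reserved))
```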
Compare with what we currently have @sam …
… quite the difference, we could lop off 16 chars there.
8 posts were merged into an existing topic: Is using a GUID for an uploaded filename secure?
So far, only @sam’s image://sha1.png sounds good to me. And I don’t feel any strong love for that. Why do you care how clean or ugly a filename is when the user doesn’t have to type it in themselves?
All the other ideas seem like more complexity for little gain. The whole idea of using a hash is that identical uploads can easily be optimized for disk space. Changing to some other ID doesn’t achieve that. You could store and save the SHA1 in the background and give out the same GUID for it, but there’s that complexity.
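The "store the SHA1 in the background and give out the same GUID" variant could look roughly like this in Python. It is only a sketch: `guid_by_sha1` is a hypothetical in-memory index standing in for a real database table, and `public_id_for` is an invented name.

```python
import hashlib
import uuid

# Hypothetical in-memory index; a real system would persist this mapping.
guid_by_sha1 = {}


def public_id_for(content: bytes) -> str:
    """Return the same public GUID for byte-identical uploads."""
    digest = hashlib.sha1(content).hexdigest()
    if digest not in guid_by_sha1:
        guid_by_sha1[digest] = uuid.uuid4().hex
    return guid_by_sha1[digest]


first = public_id_for(b"same bytes")
second = public_id_for(b"same bytes")
print(first == second)  # True: the duplicate upload reuses the stored GUID
```

The extra lookup table is exactly the added complexity mentioned above.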
(And using MD5 is bad since there’s a non-zero chance someone will want to post deliberate collisions. But as easily forced SHA1 collisions will probably be upon us soon, too, maybe the complexity would be good?)
I think the encoding difference is pretty obvious: for the same 128-bit GUID, hex encoding takes 32 characters versus 22 for base64, almost 1.5x as large.
Google Photos also switched to base64 (?) encoding some time ago for shared photos… Shorter URLs, but still as secure as the long ones.
I know the technical reasons so I’m not expecting an easy solution, but I do find it sad that in such forward-looking software as Discourse, computerese like this is visible to the user at all.
is neither here nor there, really.
It is 80% file name though. So that is the reality. Count the chars!
The hostname is 41 chars, and the image ID 40 chars in the first example. The full URL from
.png is 104 chars. I bet you could shorten that URL a lot with a suitable CNAME. Register a WX.YZ host and assign two or three character labels there:
44 versus 11.
(Of course, Czechoslovakia doesn’t register hostnames anymore.)
I already mentioned I can make image:// work
is 134 characters, of which
is a whopping 105 characters. That means 78.3% of the string is the filename. Without the filename altogether, the markup is
with the smallest possible URL path it would be 78 characters
which is a reduction of 57% in size!
That assumes, however, that:
- we can map to a tiny 2-char domain
- we can use original in the path
- we can switch to base64 encoding instead of hex encoding for the GUID in the filename
works without any changes to storage, and needs only a fairly straightforward change to the md pipeline.
No magic domains needed.
Nice, let’s make that so when you are back, then!
@sam is working on this now so we should be able to get
which I, at least, view as a substantial quality of life improvement for day to day image handling in posts.
(note that we’ll need to use base62, to avoid the + chars in URLs though)
Just use URL-safe base64, which uses - for 62 and _ for 63. See RFC 4648 (The Base16, Base32, and Base64 Data Encodings) and the base64 GoDoc.
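For reference, a minimal Python sketch of the URL-safe variant, using the standard library's `base64.urlsafe_b64encode`. The helper names `encode_urlsafe` and `decode_urlsafe` are made up for this example, and stripping the `=` padding is an assumption about how one would want it to look in a URL.

```python
import base64
import uuid


def encode_urlsafe(raw: bytes) -> str:
    """URL-safe base64 ('-' and '_' instead of '+' and '/'), padding removed."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_urlsafe(text: str) -> bytes:
    """Restore the stripped '=' padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


guid = uuid.uuid4()
token = encode_urlsafe(guid.bytes)
print(token, len(token))                 # 22 characters for a 128-bit GUID
```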
Cool, up to @sam how he wants to handle it, then. Not a huge fan of - and _ both (potentially) being in URLs though.
This is now implemented. I opted for base62 because it is super friendly on the eyes. I could have allowed * or some other stuff in the URL to shorten it some more and bring it up to base64, but honestly I feel base62 is good enough ™
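For the curious, base62 encoding of a SHA1 can be sketched in a few lines of Python. This is not the actual Discourse implementation: the alphabet order (digits, then upper case, then lower case) and the names `base62_encode`, `base62_decode` and `short_name` are assumptions for illustration.

```python
import hashlib

# Assumed alphabet order; a real implementation may order it differently.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using only letters and digits."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Inverse of base62_encode."""
    value = 0
    for char in text:
        value = value * 62 + ALPHABET.index(char)
    return value


def short_name(content: bytes) -> str:
    """Base62 form of the SHA1 of the content (at most 27 characters)."""
    digest = hashlib.sha1(content).hexdigest()
    return base62_encode(int(digest, 16))


print(short_name(b"example upload"))
```

A 160-bit SHA1 needs at most 27 base62 characters, versus 40 in hex, and the result contains nothing but letters and digits.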