Converting short upload URLs to full URLs

I may be barking up the wrong tree here, so apologies if I am, but any pointers are very much welcome!

There are some threads in our Discourse site which are displayed in part on our main website. Because the cooked version of a post contains all the HTML for the lightbox, which we don’t want on the main site, I’m working with the raw version of a post.

One thing that’s tripping me up is the file upload URLs. How can I convert an upload:// URL to a full URL? I’ve tried searching and come across SHA1 and Base62, but apart from that, no matter what I try, I can’t get the full URL.

As I said, I may be looking at the wrong thing, or there’s likely an easier way to do these things, so any advice is welcome!

Thanks in advance


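First Base62-decode the part of the short URL after upload:// (without the file extension) using the inverted character set, then hex-encode the result. That gives you the upload’s SHA1.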

In Python code it looks like this:

import base62  # pybase62 package; `base` is the Base62 part of the short URL
rebase = hex(base62.decode(base, base62.CHARSET_INVERTED))[2:].zfill(40)
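
For anyone who would rather not install a package, the same conversion can be sketched in plain Python, because Python integers have arbitrary precision. This is only a minimal sketch, not Discourse's own code: the function name `short_url_to_sha1` is made up, and the `upload://<base62>.<ext>` layout it parses is an assumption based on the description above.

```python
import string

# "Inverted" Base62 alphabet: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def short_url_to_sha1(short_url):
    """Return (sha1_hex, extension) for an upload:// short URL.

    Hypothetical helper: assumes the form upload://<base62>[.<ext>].
    """
    prefix = "upload://"
    if not short_url.startswith(prefix):
        raise ValueError("not an upload:// URL: %r" % short_url)
    name = short_url[len(prefix):]
    encoded, _dot, extension = name.partition(".")

    # Interpret the string as one big base-62 number.
    value = 0
    for char in encoded:
        index = ALPHABET.find(char)
        if index < 0:
            raise ValueError("invalid Base62 character: %r" % char)
        value = value * 62 + index

    # A SHA1 is 160 bits, i.e. 40 hex digits; pad with leading zeros.
    return format(value, "040x"), extension
```

Calling `short_url_to_sha1("upload://r3AYqESanERjladb4vBB7VsMBm6.png")` (the `.png` is an invented extension) returns the SHA1 as a 40-character hex string together with `"png"`.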


Thanks for the swift reply @michaeld. Will give it a go later today :slight_smile:


For anyone interested and doing this in PHP, I used a composer library called base62 by tuupola.

This is the code I used:

<?php

require __DIR__ . "/vendor/autoload.php"; // Composer autoloader

$base62 = new Tuupola\Base62(["characters" => Tuupola\Base62::INVERTED]);

/** The file name from the upload:// URL, excluding any file extension */
$s = "r3AYqESanERjladb4vBB7VsMBm6";

/** Decode, convert to hex */
$decoded = $base62->decode($s);

/** Expected result: bda2c513e1da04f7b4e99230851ea2aafeb8cc4e */
echo bin2hex($decoded);

Interesting. I am not able to reproduce this with JavaScript: your last step is a bin2hex conversion, but from what I understand, the result of the Base62 decode is not a binary string.

From a JS perspective I found the following helpful:

function fromBase62(s) {
  var digits = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  var result = 0;
  for (var i = 0; i < s.length; i++) {
    var p = digits.indexOf(s[i]);
    if (p < 0) {
      return NaN;
    }
    result += p * Math.pow(digits.length, s.length - i - 1);
  }
  return result;
}

Unfortunately, neither the bin2hex nor the dec2hex method helps when I use your input strings.

I created a CodeSandbox for the problem if anyone is interested in playing around with it. The final value is 1 for my input string, which is wrong ;(

Thanks

You should be able to do binVal.toString(16) to get the hexadecimal value.
However, I think(!) the intermediate result is too large to fit in a regular integer: a SHA1 is 160 bits, while a JavaScript number only holds integers exactly up to 2^53.
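
To illustrate why the fromBase62 function above cannot work for these strings: a plain JavaScript number is an IEEE 754 double, and Python's `float` is the same kind of double, so the sketch below mimics the problem in Python and contrasts it with exact integer arithmetic (which is what `BigInt` would provide in JavaScript). The function names are made up for illustration.

```python
import string

# "Inverted" Base62 alphabet: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def decode_with_doubles(encoded):
    """Accumulate in floating point, as a plain JavaScript number would."""
    result = 0.0
    for char in encoded:
        result = result * 62 + ALPHABET.index(char)
    return result


def decode_with_integers(encoded):
    """Accumulate in arbitrary-precision integers, so no digits are lost."""
    result = 0
    for char in encoded:
        result = result * 62 + ALPHABET.index(char)
    return result


encoded = "r3AYqESanERjladb4vBB7VsMBm6"
exact = decode_with_integers(encoded)
approximate = int(decode_with_doubles(encoded))

print(exact > 2**53)                # True: far beyond what a double holds exactly
print(format(exact, "040x"))        # all 40 hex digits are preserved
print(format(approximate, "040x"))  # only the leading digits survive, the rest are zeros
```

The double keeps roughly the first 13 hex digits and rounds away everything after them, so the result can never match the real SHA1.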


Thanks for the hint, Richard, but I’m not getting any further even with that idea.

I don’t know, but wouldn’t it be a good feature to have a setting along the lines of “Enable absolute upload links in raw views (routes)”? I would totally use that in our case because, from my understanding, the upload URL shortening is really just a Discourse optimization to cut down on code, right? It’s not related to Markdown.

Can anyone from the Discourse team provide a JS method that can restore the absolute URL?
That would be helpful for the community, I think.

I am unable to pull this off ;(

@RGJ do you know anyone on the Discourse team who could provide that helper function?

No…
I’d suggest you post this in Marketplace.


You might say more about your use case. Perhaps this isn’t the best solution.

You can convert from base62 to hex, but you will never be able to fully reconstruct the upload URL without some kind of interface to Discourse’s database. The full path to an upload depends on the upload’s id, which cannot be determined from the short url.
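
A rough sketch of what that means in practice: the short URL only yields the SHA1, so turning it into a full URL needs a lookup against data that comes from Discourse itself. Everything named here is hypothetical, including `resolve_short_url`, the `known_uploads` mapping, and the placeholder URL in it; how you obtain that mapping is up to your setup.

```python
import string

# "Inverted" Base62 alphabet: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def sha1_from_short_url(short_url):
    """Decode the Base62 part of an upload:// URL into a 40-digit hex SHA1."""
    encoded = short_url[len("upload://"):].partition(".")[0]
    value = 0
    for char in encoded:
        value = value * 62 + ALPHABET.index(char)
    return format(value, "040x")


def resolve_short_url(short_url, url_by_sha1):
    """Look up the full URL for a short URL, or return None if unknown.

    url_by_sha1 is a hypothetical mapping that has to come from Discourse
    itself; it cannot be derived from the short URL alone.
    """
    return url_by_sha1.get(sha1_from_short_url(short_url))


# Placeholder record, not real data: "10" decodes to 62, i.e. hex 3e.
known_uploads = {
    "0" * 38 + "3e": "https://forum.example.com/full/path/from/discourse.png",
}

print(resolve_short_url("upload://10.png", known_uploads))
print(resolve_short_url("upload://11.png", known_uploads))  # None: not in the mapping
```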

As @pfaffman said, we might be able to help more if you describe your use case.
