Unable to onebox Amazon.co.uk (but amazon.com works)

Hi everyone,

Running Discourse v2.4.0.beta6+119

We are unable to onebox links to products on amazon.co.uk

eg (this works here btw): https://www.amazon.co.uk/BG-Electrical-NBS22G-Brushed-Switched/dp/B004TRJYE8

We can onebox links on amazon.com (and other Amazon TLDs) with no issues.

The browser console shows a 404 Not Found error:

I tried the vimeo IP blocklist test in case Amazon have also blacklisted a bunch of Digital Ocean IP addresses, but strangely I can wget the link just fine from the server directly:

xx@xx:~# wget https://www.amazon.co.uk/BG-Electrical-NBS22G-Brushed-Switched/dp/B004TRJYE8
--2019-10-23 14:49:47--  https://www.amazon.co.uk/BG-Electrical-NBS22G-Brushed-Switched/dp/B004TRJYE8
Resolving www.amazon.co.uk (www.amazon.co.uk)... 99.86.105.85
Connecting to www.amazon.co.uk (www.amazon.co.uk)|99.86.105.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘B004TRJYE8’

B004TRJYE8                                            [   <=>          ] 711.64K   682KB/s    in 1.0s

2019-10-23 14:49:48 (682 KB/s) - ‘B004TRJYE8’ saved [728716]

Any suggestions for what I could try next? :thinking:


In my experience, Amazon will ban you real quickly if you are hitting them a lot. It’s automated, too.

Our members have been finding this a lot lately, especially as we are affiliated with Amazon UK and Amazon France.

Being able to onebox and show suggested products relevant to the discussion obviously helps us as our forum funds increase when an item is purchased using the affiliation link.

But the onebox now not showing is crazy and not helping anyone. I have since found this thread from @merefield

Be careful with Oneboxing Amazon.

I was ejected from their programme for that ‘transgression’.

I use the official image links they provide and generate in their toolbar instead. You can set them within table markdown to make them look a bit better. It’s a little more work, of course, but it really depends on how often you need to do it.

As I said in my original Topic, one big advantage of that is they are serving these.


Thanks Robert, warning taken on board and passed on.


Legally it’s against their terms, but I do agree with Jeff: the time I was “punished” was probably down to a finickity reviewer having a bad day (and probably not familiar with Discourse).

Btw, Amazon UK has rejected me twice for ‘being a forum’ (not mentioned anywhere in their rules!). The US has no issue. It’s incredibly frustrating how they seem to observe different rules depending on the locale.



As with Amazon’s Seller support, the response you get depends entirely on who receives your query (or, in this case, your application)!

The Amazon Associates Program Operating Agreement has recently been updated: 6 September 2019 (UK) and 1 October on .com.

As always, they have the “we reserve the right to modify or change any part of the agreement terms and conditions” clause (section 13), and if you don’t like it… tough, leave (section 6). They are such a joy to work with!

https://affiliate-program.amazon.co.uk/help/operating/agreement


We are not Amazon affiliates, nor are any of our members who post links to Amazon products on our Discourse.

Our members sometimes post links to products on Amazon which may be of interest to our community, but not many: I would say it averages as low as one Amazon link per day. So we’re not really abusing the system.

I’m still not quite sure why I can wget the URL directly from the server without an issue though :thinking:

Is there anything else I could try or test? Any caches I could flush or processes I could restart which may ‘refresh’ something?

The plot thickens / confusion continues…

Any idea what’s going on here? :thinking:

The first URL oneboxes, albeit with a Robot Check message. Does that reveal anything to anyone?

The second URL does not onebox at all.

URL in question: https://www.amazon.co.uk/dp/B0791RGQW3/

:man_shrugging:

Incidentally, the Robot Check message does not appear here on meta, just a blank onebox:

Robot Check means you are being blocked as a bot.
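If you want to confirm that from the server rather than from the onebox preview, a minimal sketch like the one below could flag the interstitial in a saved response. It assumes the block page carries "Robot Check" in its `<title>`, which is only inferred from the onebox shown above, and the helper name `looks_like_robot_check` is made up for illustration.

```python
import re


def looks_like_robot_check(html: str) -> bool:
    """Return True if the page <title> contains 'Robot Check'.

    Assumption: the bot-block page is recognisable by its title,
    as the onebox preview in this topic suggests.
    """
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return bool(match) and "robot check" in match.group(1).lower()
```

You could run it against the file wget saved, e.g. `looks_like_robot_check(open("B004TRJYE8", encoding="utf-8", errors="replace").read())`. A `False` there only tells you that this particular wget request was served the real product page.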


Not good :confused:

As I can wget the page from the server itself OK, it’s not an outright IP block like the one Vimeo use. So do we know how they’re performing this check?

Any tips for a workaround? :thinking:

You are somehow being identified as an undesirable. It is very common for VPS servers to be blocked from scraping. You need a proxy crawling service.

I support one in my plugin: onebox assistant


Thanks for the suggestion. A quick glance suggests that I need to subscribe to a paid-for service in order for that to work. Any other workarounds I could use?

I’m curious as to how Amazon know that Discourse is making the request and block it on the fly, as the wget route on the same server still works fine.

Probably the User Agent string.
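A rough way to test that guess from the same server is to request the page with two different User-Agent headers and compare the status codes. This is only a sketch using Python's standard library: both header strings below are placeholders (assumptions), not what Discourse or wget actually send, so substitute the real values once you know them.

```python
import urllib.error
import urllib.request

URL = "https://www.amazon.co.uk/dp/B0791RGQW3/"

# Placeholder User-Agent strings (assumptions, for illustration only).
USER_AGENTS = {
    "wget-like": "Wget/1.19",
    "browser-like": ("Mozilla/5.0 (X11; Linux x86_64; rv:70.0) "
                     "Gecko/20100101 Firefox/70.0"),
}


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code, including 4xx/5xx error codes."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def compare_user_agents(url):
    """Fetch the URL once per placeholder User-Agent and collect the statuses."""
    return {label: fetch_status(url, ua) for label, ua in USER_AGENTS.items()}
```

Calling `compare_user_agents(URL)` from a Python shell on the server returns one status code per label. If the codes differ, the header is at least part of how the request is being classified; if they match, something else (request rate, other headers, IP reputation) is more likely. Network-level failures such as DNS errors are not caught here and will raise.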


What does Discourse present as its User-Agent?

Is this something I could spoof to make out it’s a regular Firefox browser or something? :thinking:
