Most links are crawled correctly, except the Amazon ones. Here is an example:

From reading the source code, this is the responsibility of the CrawlTopicLink job. Could somebody have a look?

@eviltrout I remember you saying you had issues retrieving the title from Amazon pages and that you made a special case for it. Does that special case only cover the .com version?

Let’s try it:

EDIT: so yes, the problem appears only with .fr links, not .com ones.

How about

Same problem with .de links…
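To illustrate what a check that isn't tied to .com might look like, here is a rough sketch. `AMAZON_HOST` and `amazon_url?` are names I made up for the example, not what the job actually uses, and the pattern is only my guess at a reasonable host match:

```ruby
require "uri"

# Hypothetical pattern (not the one in the job): matches amazon.com,
# amazon.fr, amazon.de, amazon.co.uk, ... with optional subdomains.
AMAZON_HOST = /\A(?:[a-z0-9-]+\.)*amazon\.[a-z]{2,3}(?:\.[a-z]{2})?\z/i

# Returns true when the URL's host looks like an Amazon storefront,
# whatever the country TLD. Unparseable URLs count as "not Amazon".
def amazon_url?(url)
  host = URI.parse(url).host
  !host.nil? && AMAZON_HOST.match?(host)
rescue URI::InvalidURIError
  false
end
```

Matching on the parsed host rather than the whole URL avoids false positives when "amazon.com" only appears in a path or query string.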

Found it!

I guess you now have enough for a pull request :wink:

Done! How can we refresh all the links that have already been posted?

That’s a good question. @eviltrout, will a rebake trigger the CrawlTopicLink job?

I suspect not, as the job is enqueued after the links are saved, and I believe there is an intelligent diff to not save links that have not changed on save.

We could have the rebake enqueue a Jobs::CrawlTopicLink for each link after it’s done maybe?
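Something along these lines, as a rough sketch only. The `Jobs` module and `TopicLink` struct below are stand-ins so the snippet runs on its own, `enqueue_crawls_after_rebake` is a hypothetical helper name, and I'm assuming the job takes a `topic_link_id` argument:

```ruby
# Stand-in for the real job queue, so the sketch is runnable by itself.
module Jobs
  def self.queue
    @queue ||= []
  end

  def self.enqueue(job_name, args = {})
    queue << [job_name, args]
  end
end

# Hypothetical minimal link record; the real model has more fields.
TopicLink = Struct.new(:id, :url)

# After a rebake, enqueue one crawl job per link, regardless of
# whether the link itself changed during the save.
def enqueue_crawls_after_rebake(links)
  links.each do |link|
    Jobs.enqueue(:crawl_topic_link, topic_link_id: link.id)
  end
end
```

That way the re-crawl doesn't depend on the diff deciding a link has changed.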

For the diff, you can use the same regexp as the job.
Good idea for the rebake!

