Issues with embedding from RSS

Hi Simon

I don’t think this has been merged yet.
I’ve been trying again for hours … it just won’t work.
I have the forum set to not allow uncategorised topics, yet the RSS feed topics all go into Uncategorised no matter what I do.

Could you add the category to the RSS Polling page?
Feed URL - Author - Class - Category, with the embedding part done behind the scenes?

I don’t understand

The domain of the feed’s link attributes?
I thought I was selecting the category from the dropdown?

But I put the full URL of the feed in the RSS Polling page?
It seems I’m entering the same information into two different places in different formats and they are not matching.
I’m not seeing the purpose of the allowed hosts / whitelist path.

I just had a look and see that the PR hasn’t been merged. I’ll get someone to take a quick look at my changes and get them merged into the core code. Based on your questions, though, I’m not sure that the explanation I added to the plugin of how to set the category for feed topics will be clear. I’ll try explaining it again here.

The category that the RSS feed topics get published to is based on the domain of the feed’s link attributes, not on the domain of the feed itself. For example, if your feed is at https://example.com/feed, but the link attributes in your feed are for posts at https://www.example.com/, the domain that you need to add to your Admin / Customize / Embedding hosts section is www.example.com, not example.com. The example below would cause all posts from the feed to be published to the “fun” category:

To find the value of your feed’s link attributes, you will need to look at the markup of your feed. You can do that by loading the feed URL in your browser.
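As a rough illustration of what to look for in the feed markup, here is a minimal Python sketch that lists the hosts used by the link element of each feed item. The feed content and the `item_link_hosts` helper are made up for this example; they are not part of Discourse or the RSS Polling plugin, and a real feed may be Atom or use namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical RSS 2.0 markup; a real feed would be loaded from the feed URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://www.example.com/</link>
    <item>
      <title>First post</title>
      <link>https://www.example.com/fun/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://www.example.com/support/second-post</link>
    </item>
  </channel>
</rss>
"""


def item_link_hosts(feed_xml: str) -> set:
    """Return the set of hostnames found in each item's <link> element."""
    root = ET.fromstring(feed_xml)
    hosts = set()
    for link in root.iterfind("./channel/item/link"):
        if link.text:
            hosts.add(urlparse(link.text.strip()).hostname)
    return hosts


print(item_link_hosts(SAMPLE_FEED))
```

In this made-up feed, every item link uses the host www.example.com, so that is the value that would go into the allowed hosts setting, regardless of where the feed itself is served from.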

It allows you to publish posts from a specific path on your blog to a Discourse category. For example:

This would publish all posts in the www.example.com/fun path to my “fun” category and all posts from the www.example.com/support path to the “Customer support” category.
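The host and path rules described above can be sketched in Python as follows. This is only a model of the behaviour described in this topic, not Discourse’s actual implementation (which is written in Ruby); the `RULES` list and the `category_for` function are hypothetical, and matching the path pattern against the whole path is an assumption.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical rules mirroring the example: (allowed host, path whitelist, category).
RULES = [
    ("www.example.com", r"/fun/.*", "fun"),
    ("www.example.com", r"/support/.*", "Customer support"),
]


def category_for(link: str, rules=RULES) -> Optional[str]:
    """Return the category of the first rule whose host and path both match."""
    parsed = urlparse(link)
    for host, path_pattern, category in rules:
        # Assumption: the pattern must match the entire path of the item link.
        if parsed.hostname == host and re.fullmatch(path_pattern, parsed.path):
            return category
    return None  # no rule matched


print(category_for("https://www.example.com/fun/first-post"))
print(category_for("https://www.example.com/support/login-help"))
print(category_for("https://example.com/fun/first-post"))
```

The last call returns `None` because the host is example.com rather than www.example.com, which is the kind of mismatch that leaves feed topics in Uncategorized.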


I don’t understand why I’m on the embed page at all.
The complete feed URL is already input.
Can’t the plugin slice and dice the URL into Allowed Hosts and Path Whitelist without my inputting it all twice?

The example input on the RSS Polling plugin and the Embed page don’t correlate.
‘feeds’ is a subdomain on one … then later a path?

I’ve been trying with a few RSS feeds.
This one … BBC Health - BBC Health - Admin user

allowed hosts - feeds.bbci.co.uk
path whitelist - /health/.*
into a Health category

I think the above should work, but it doesn’t.
I’ve tried every possible combination for hours now.

I agree that configuring an RSS feed is fairly difficult. Some of the issues are related to our having moved the RSS feed code out of the core Discourse code into a plugin. There are not many sites that I know of that are using Discourse’s RSS feed functionality.

To test things out, I configured the feed at https://feeds.bbci.co.uk/health/rss.xml on my site. Here is how that setup looks:

When I first set that up, all feed topics were automatically published to my Uncategorized category. To fix that, I visited a couple of the topics that had been created to look at this section of the post:

What that is telling me is that the URL of the post is at http://www.bbc.co.uk/news/uk-politics-21668349#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa. The domain of the post is www.bbc.co.uk. I then added this domain as an Allowed Host on my Embedding page. I set the path /news/.* to publish to my “fun” category:

I then deleted the first batch of topics that were created by the feed. Discourse is pulling them in again. They are now being published to the correct category.
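To make the BBC case concrete, this small Python sketch splits the post URL quoted above into its host and path, then checks them against both the settings that were tried first and the ones that worked. The variable names are illustrative only, and full-path regex matching is an assumption about how the path whitelist is applied.

```python
import re
from urllib.parse import urlparse

post_url = (
    "http://www.bbc.co.uk/news/uk-politics-21668349"
    "#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa"
)
parsed = urlparse(post_url)

print(parsed.hostname)  # www.bbc.co.uk
print(parsed.path)      # /news/uk-politics-21668349

# First attempt: the feed's own host and a /health/.* path.
matches_feed_host = parsed.hostname == "feeds.bbci.co.uk"
matches_health_path = re.fullmatch(r"/health/.*", parsed.path) is not None

# Working settings: the host of the linked post and a /news/.* path.
matches_post_host = parsed.hostname == "www.bbc.co.uk"
matches_news_path = re.fullmatch(r"/news/.*", parsed.path) is not None

print(matches_feed_host, matches_health_path)  # False False
print(matches_post_host, matches_news_path)    # True True
```

The feed is served from feeds.bbci.co.uk, but the items link to www.bbc.co.uk, so only the second pair of settings lines up with the links inside the feed.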

Hi I’ve finally worked it out … sorry.
I think the BBC feed was a bad place to start.

I understand your terminology now.
I didn’t realise you were referring to links from within the RSS feed.
I thought the feed URL was enough.


Yes, that’s probably the most difficult one that I have seen to work out. The links are redirected by the BBC servers and the feed cannot be viewed directly in the browser. The only way I could find to get the correct embed domain was to first publish the posts to my site. Generally it is a lot easier than this to configure the plugin.


Unfortunately the BBC health feed pulls from lots of different paths.
One just now on tropical medicine came from the history section, so I need /history/.* for my Health feed too.
But this will work if I only add the one BBC feed.
So thank you very much for your time and patience. :clap: :clap:


Reviving a fairly old topic to say thanks for the great explanation. Unfortunately, it made me realize that the plugin probably won’t work for my needs. I’m trying to embed a feed generated by a Shaarli instance, but the link attribute of each entry points to a completely different domain (since Shaarli is a bookmark archiving tool). I assume there is no way to use a wildcard in the path (so that any incoming feed item is routed to a specific category), correct?

Oops, never mind. I found a way to generate a different feed from the source in Shaarli. Solved for my needs.


As I wrote here a few weeks ago, using a different feed solved the permalink problem. But now the plugin is not fetching the content of each item.

The topic is created with the correct title, by the right user, and filed in the appropriate category, as configured. However, the topic body says something (sorry for the imprecision, I’m using the Portuguese locale) like “this is a discussion related to this original post” and points to the correct URL.

Then there is a button labeled “show full post”. When I click it, it stays “loading” indefinitely. My understanding was that if I came back to the topic a second time, the cache would already have been created, but that does not seem to be the case.

Example:
https://rede.tropixel.org/t/greentech-alliance/418

The feed source is this:

Any tips are welcome.

I think the problem is that Discourse isn’t finding the content on the page at https://links.efeefe.me/?xZVQww. There is very little text on the page. When the “Show Full Post” button is clicked, Discourse tries to scrape the page to get its main content. If pages have very little content, you can help identify the main content of the page by configuring Discourse’s embed whitelist selector site setting. There are details about how to do that here: Configuring allowed embed selectors.


Thanks. I tried that (in my case, whitelisting “linklist-item-description”), but it had no effect, even after waiting for the cache to refresh and adding new entries to the RSS source.

Try .linklist-item (Note the . at the start of the class name. It needs to be included.)

You could also try .linklist-item .linklist-item-title, .linklist-item .linklist-item-description

You will need to wait up to 10 minutes to see the changes. If you have access to your Discourse site’s Rails console, you can clear the cache by running Rails.cache.clear. That way you will be able to see the changes immediately.
