I have CSS tags on my content and have identified the CSS Selectors in the embedding admin settings but they are not being picked up by the crawler. I believe it is because my site is built with React and the css selectors are not accessible. How can I use this feature with a React site?
That may be the case although I’d expect the request that Discourse makes to the site to return HTML. Something else to look into is that Discourse caches the content that it’s pulled from the external site for 10 minutes. That means that if you are adjusting the allowed embed selectors site setting, you’ll have to wait up to 10 minutes to see the results of the new setting value.
If your Discourse site isn’t yet in production and you have access to the Discourse site’s Rails console, you can clear the cache from the console with Rails.cache.clear
Some additional details about the setting are here: Configure the Allowed Embed Selectors Setting.
Edit: I think that embedding is working as expected. It’s just tricky to configure for some sites. I’m going to move this topic to support.
I created a feature topic a few months ago suggesting that Discourse should find a better way of parsing embedded posts: Topic embedding needs some love. I’ll put some time into that soon if no one gets to it before me.
Discourse presents a different view to crawlers. To see it, you’ll need to visit with JavaScript turned off.
I would not expect embedded posts to be crawled since they do not exist on the site where they are embedded.
Hello, I’m referring to how Discourse crawls my React site with the embedded posts feature. I’m having a hard time telling it what content to include on the Discourse post because of how it sees my React page.
If it helps, the code that’s used for pulling in the external post is here: discourse/app/models/topic_embed.rb at main · discourse/discourse · GitHub
If you have access to your site’s Rails console, you can test it to get some idea of what’s going on. For example:
TopicEmbed.find_remote("https://blog.discourse.org/2023/04/introducing-discourse-ai/")
Where I’ve run into problems is with the parse_html method that’s called at the end of the find_remote method. Ruby Readability struggles with some HTML/CSS.
It’s also possible that HTML isn’t being returned from your React site. You could confirm that by running the code that’s in the find_remote method in steps from the console.
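As a rough first check before stepping through find_remote, something like the following could show what a client that doesn’t run JavaScript gets back from the React page. This is only a sketch using Ruby’s standard library, not the code Discourse itself runs: visible_text and fetch_raw_html are hypothetical helper names, and the URL in the usage comment is a placeholder.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Discourse): a rough estimate of the text
# that is present in the raw HTML, i.e. without any JavaScript having run.
def visible_text(html)
  html
    .gsub(%r{<(script|style)\b.*?</\1>}mi, " ") # drop script and style blocks
    .gsub(/<[^>]+>/, " ")                       # drop the remaining tags
    .gsub(/\s+/, " ")                           # collapse whitespace
    .strip
end

# Hypothetical helper: fetch the page body as a plain HTTP client would.
# Note that Net::HTTP.get does not follow redirects.
def fetch_raw_html(url)
  Net::HTTP.get(URI(url))
end

# Usage from a console (placeholder URL, replace with one of your own pages):
# html = fetch_raw_html("https://example.com/my-post")
# puts visible_text(html)[0, 500]
```

If the output is little more than the page title, the article content is probably being rendered client-side, and the embed selectors would have nothing to match in the HTML that the server returns.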
This is excellent, thank you!