Our users have been complaining that Onebox has recently been broken for some sites, including the New York Times and the Washington Post. Did Onebox change recently? See links below. The first one is a gift link.
I have recently noticed some odd behaviour on Stable when I post links from my other Discourse instance (tests-passed). Sometimes, seemingly at random, the link doesn’t onebox.
I haven’t tried posting links from my stable instance on the tests-passed forum.
I have tried rebuilding the HTML, with no success in getting the link to onebox.
IIRC there is another topic here (on Meta) where I posted a screenshot.
The New York Times and the Washington Post have always been paid publications, though I don’t know if they have done anything recently to change their paywall structure.
If I may make a suggestion: if the paywall is the issue, and the article title and caption are still visible on the paywalled page, shouldn’t Onebox be able to capture that info?
The New York Times started its paywall in 2011, but it allowed a few free reads without registration or a credit card, five if I recall correctly. At the same time it allowed Google to browse. The much newer system blocks access totally, and after fighting with Google they shut down free reading entirely.
95% sure the onebox already does that. If there’s enough information to display a onebox, it sure will, even if the content is ultimately paywalled.
What I think happens is that the onebox is getting denylisted by these paywalled websites because of the recent wave of LLM crawlers/agents, so it doesn’t see the same HTML we might see when using a browser.
Though, happy to be proven wrong. If someone wants to have a quick look to see if they can improve it somehow, pr-welcome.
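For anyone who wants to test the denylist theory, here is a rough Python sketch. It is not Onebox's actual code, just an illustration of the idea: pull the Open Graph `<meta>` tags out of a page and check whether there is enough for a preview. The names `extract_og`, `has_enough_for_preview`, and `fetch_html` are made up for this example, and treating `og:title` as the minimum is my assumption, not a documented Onebox rule.

```python
from html.parser import HTMLParser
import urllib.request


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, content)


def extract_og(html):
    """Return a dict of Open Graph properties found in the HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def has_enough_for_preview(tags):
    # Assumption for this sketch: a title is the bare minimum for a preview.
    return "og:title" in tags


def fetch_html(url, user_agent):
    """Fetch a page with a given User-Agent header.

    Raises urllib.error.HTTPError on 4xx/5xx responses, for example
    a 403 when the site blocks the client.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

To check a specific article, call `fetch_html` twice on the same URL, once with your browser's User-Agent string and once with whatever User-Agent your Discourse instance sends, then compare `extract_og` on both results. If the browser-like request returns an `og:title` and the other one returns nothing (or raises a 403), that would point to the site blocking the crawler rather than a Onebox regression.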