Poor inline onebox title for Spotify album shares

The inline onebox title is poor (I'm not saying it's our fault):

Inline: Spotify

Source:

https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ

Inline: https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ

Image (preview):

Image (cooked):


Do we need to update the browser version in the user agent our crawler uses? It looks like we're using Safari 14?

Visiting the URL manually in that Safari version:

Our user agents are inconsistent.

For the full onebox, we use:

Discourse Forum Onebox v3.5.0.beta9-dev

Spotify is fine with that and serves the full page:

○ → curl -s --user-agent 'Discourse Forum Onebox v3.5.0.beta9-dev' 'https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ' | htmlq 'meta[property^="og:description"], meta[property^="og:site_name"], title'
<title>The First Symphony - Album by Indecent | Spotify</title>
<meta content="Spotify" property="og:site_name">
<meta content="Indecent · Album · 2024 · 12 songs" property="og:description">
<meta content="Spotify" property="og:site_name">
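To reproduce the two requests outside of curl, they can be built with Ruby's standard library. This is a minimal sketch, not how Discourse's onebox code actually issues requests: `build_request` and the constant names are hypothetical, and only the URL and the two user agent strings come from this thread.

```ruby
require "net/http"
require "uri"

SPOTIFY_URL = "https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ"

# The two user agent strings quoted in this thread (constant names are hypothetical).
FULL_ONEBOX_UA = "Discourse Forum Onebox v3.5.0.beta9-dev"
INLINE_ONEBOX_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15"

# Builds (without sending) a GET request with an explicit User-Agent header,
# the stdlib counterpart of `curl --user-agent '...' URL`.
def build_request(url, user_agent)
  request = Net::HTTP::Get.new(URI(url))
  request["User-Agent"] = user_agent
  request
end

full_request = build_request(SPOTIFY_URL, FULL_ONEBOX_UA)
inline_request = build_request(SPOTIFY_URL, INLINE_ONEBOX_UA)

# Sending requires network access, e.g.:
# uri = URI(SPOTIFY_URL)
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(full_request).body }
```

Same URL, different headers: that one header is the only thing separating the full-page response above from the "Unsupported browser" one.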

But for the inline onebox, we use:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15

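So that's ultimately why we behave differently, although I'd still ask why we don't use the Open Graph properties here. Even though this is an unsupported browser, Spotify still serves them: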

○ → curl -s --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15' 'https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ' | htmlq 'meta[property^="og:description"], meta[property^="og:site_name"], title'
<title>Unsupported browser</title>
<meta content="Spotify" property="og:site_name">
<meta content="Indecent · Album · 2024 · 12 songs" property="og:description">
<meta content="Spotify" property="og:site_name">

All properties:
○ → curl -s --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15' 'https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS?si=FurcwGpfSrWJBPGwKeCvDQ' | htmlq 'meta[property^="og:"]'
<meta content="Spotify" property="og:site_name">
<meta content="The First Symphony" property="og:title">
<meta content="Indecent · Album · 2024 · 12 songs" property="og:description">
<meta content="https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS" property="og:url">
<meta content="music.album" property="og:type">
<meta content="Spotify" property="og:site_name">
<meta content="AR" property="og:restrictions:country:allowed">
…
<meta content="XK" property="og:restrictions:country:allowed">
<meta content="https://i.scdn.co/image/ab67616d0000b273ff9434b9650f38d183e91fb1" property="og:image">
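The useful part is that `og:title` survives even when `<title>` is the generic error string. A minimal sketch of pulling both out of that response with Ruby's standard library, assuming the attribute order shown by htmlq (`content` before `property`); `html_title` and `og_properties` are hypothetical helpers, and a real HTML parser would be the right tool for arbitrary pages.

```ruby
# Markup trimmed from the "Unsupported browser" response above.
SAMPLE_HTML = <<~HTML
  <title>Unsupported browser</title>
  <meta content="Spotify" property="og:site_name">
  <meta content="The First Symphony" property="og:title">
  <meta content="Indecent · Album · 2024 · 12 songs" property="og:description">
  <meta content="Spotify" property="og:site_name">
HTML

# Text of the first <title> element, or nil if there is none.
def html_title(html)
  html[%r{<title>(.*?)</title>}m, 1]
end

# Collects og:* meta tags into a hash, keeping the first value when a
# property repeats (og:site_name appears twice above). The regex assumes
# content="..." comes before property="...", as in the htmlq output.
def og_properties(html)
  html.scan(/<meta\s+content="([^"]*)"\s+property="(og:[^"]*)"/)
      .each_with_object({}) do |(content, property), props|
    props[property] ||= content
  end
end

html_title(SAMPLE_HTML)                # => "Unsupported browser"
og_properties(SAMPLE_HTML)["og:title"] # => "The First Symphony"
```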

Hmm:

Because in our implementation <title> takes precedence over Open Graph, and we don't want to hardcode the string "Unsupported browser"…
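The precedence question can be shown in isolation. In this hypothetical `pick_title` (not Discourse's actual code), the `prefer_og` flag is the whole difference between surfacing "Unsupported browser" and the album name, and neither order needs that error string hardcoded.

```ruby
# Hypothetical title selection. With prefer_og: false, <title> wins, which
# matches the precedence described above; blank candidates are skipped.
def pick_title(html_title, og_title, prefer_og: false)
  candidates = prefer_og ? [og_title, html_title] : [html_title, og_title]
  candidates.find { |title| title && !title.strip.empty? }
end

pick_title("Unsupported browser", "The First Symphony")
# => "Unsupported browser"
pick_title("Unsupported browser", "The First Symphony", prefer_og: true)
# => "The First Symphony"
```

The catch with flipping the order globally is that it changes titles for every site whose `og:title` differs from its `<title>`, not just Spotify.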

FIX: update final destination to use more recent user agent by SamSaffron · Pull Request #34207 · discourse/discourse · GitHub should fix this…

That said, this does feel pretty old:

DEFAULT_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15"

Hmm… now Spotify gives us "Spotify" as the title :slight_smile: If we want the album name, we'll need custom Spotify-specific code :frowning:
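A Spotify-specific fallback could be as small as a host-to-property table. This sketch assumes Spotify still serves `og:title` to the newer user agent, as it did in the responses above; `TITLE_FROM_OG` and `inline_title` are hypothetical names, not existing Discourse code.

```ruby
require "uri"

# Hypothetical per-host rule: for these hosts, take the title from the named
# Open Graph property instead of <title>.
TITLE_FROM_OG = {
  "open.spotify.com" => "og:title"
}.freeze

# Returns the per-host Open Graph title when one is configured and present,
# otherwise falls back to the page's <title>.
def inline_title(url, html_title, og_props)
  property = TITLE_FROM_OG[URI(url).host]
  override = property && og_props[property]
  return override if override && !override.strip.empty?

  html_title
end

spotify = "https://open.spotify.com/album/1nrWysWdrgPR0kBJ45z2aS"
inline_title(spotify, "Spotify", { "og:title" => "The First Symphony" })
# => "The First Symphony"
inline_title("https://example.com/", "Example Domain", {})
# => "Example Domain"
```

A table like this leaves the generic <title>-first precedence untouched for every other site, at the cost of maintaining per-site entries.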
