Un problème survient dans notre flux Onebox, je pense qu’il est lié à la redirection/FinalDestination. Si j’utilise curl comme ceci :
curl -L https://youtube.com/shorts/Cs3sTnLO6EE
Je suis capable de trouver les balises title et autres meta dans la réponse :
curl -L https://youtube.com/shorts/Cs3sTnLO6EE | htmlq 'head > meta'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 947221 0 947221 0 0 406174 0 --:--:-- 0:00:02 --:--:-- 504109
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="ApvK67ociHgr2egd6c2ZjrfPuRs8BHcvSggogIOPQNH7GJ3cVlyJ1NOq/COCdj0+zxskqHt9HgLLETc8qqD+vwsAAABteyJvcmlnaW4iOiJodHRwczovL3lvdXR1YmUuY29tOjQ0MyIsImZlYXR1cmUiOiJQcml2YWN5U2FuZGJveEFkc0FQSXMiLCJleHBpcnkiOjE2OTUxNjc5OTksImlzU3ViZG9tYWluIjp0cnVlfQ==" http-equiv="origin-trial">
<meta content="rgba(255, 255, 255, 0.98)" name="theme-color">
<meta content="A Screencast Of LinkedIn Persona Verification Failure" name="title">
<meta content="For https://www.linkedin.com/help/linkedin/cases/73171318#:~:text=Thanks%20for%20contacting%20us%20about,to%20troubleshoot%20any%20additional%20causes." name="description">
...
curl -L https://youtube.com/shorts/Cs3sTnLO6EE | htmlq 'head > title'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 992110 0 992110 0 0 445530 0 --:--:-- 0:00:02 --:--:-- 739941
<title>A Screencast Of LinkedIn Persona Verification Failure - YouTube</title>
Cependant, lorsque j’obtiens la réponse via notre code oneboxer, ce sont les seules balises (à l’exception des balises script + style) que j’obtiens dans le <head> :
uri = FinalDestination.new("https://youtube.com/shorts/Cs3sTnLO6EE", Oneboxer.get_final_destination_options("https://youtube.com/shorts/Cs3sTnLO6EE")).resolve
doc2 = Onebox::Helpers.fetch_response(uri)
Nokogiri.HTML(doc2).css("head").children.each do |headel|
next if headel.name == "script" || headel.name == "style"
puts headel.to_s
end; nil;
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="origin-trial" content="ApvK67ociHgr2egd6c2ZjrfPuRs8BHcvSggogIOPQNH7GJ3cVlyJ1NOq/COCdj0+zxskqHt9HgLLETc8qqD+vwsAAABteyJvcmlnaW4iOiJodHRwczovL3lvdXR1YmUuY29tOjQ0MyIsImZlYXR1cmUiOiJQcml2YWN5U2FuZGJveEFkc0FQSXMiLCJleHBpcnkiOjE2OTUxNjc5OTksImlzU3ViZG9tYWluIjp0cnVlfQ==">
<link rel="shortcut icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon.ico" type="image/x-icon">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_32x32.png" sizes="32x32">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_48x48.png" sizes="48x48">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_96x96.png" sizes="96x96">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_144x144.png" sizes="144x144">
<link rel="stylesheet" href="//fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&family=YouTube+Sans:wght@300..900&display=swap" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="/s/player/65578ad1/www-player.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/desktop/ace6261e/cssbin/www-main-desktop-player-skeleton.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/desktop/ace6261e/cssbin/www-onepick.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/_/ytmainappweb/_/ss/k=ytmainappweb.kevlar_base.dsnGl9m3_bM.L.X.O/am=AAAgAAgk/d=0/rs=AGKMywEVyAGSU99VwQpoLFio5FrCvZ1WpA" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<meta name="theme-color" content="rgba(255, 255, 255, 0.98)">
<link rel="search" type="application/opensearchdescription+xml" href="https://www.youtube.com/opensearch?locale=en_US" title="YouTube">
<link rel="manifest" href="/manifest.webmanifest" crossorigin="use-credentials">
<link rel="canonical" href="undefined">
<link rel="alternate" media="handheld" href="https://m.youtube.com/shorts/Cs3sTnLO6EE">
<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.youtube.com/shorts/Cs3sTnLO6EE">
<title> - YouTube</title>
<meta name="title" content="">
<meta name="description" content="Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.">
<meta name="keywords" content="video, sharing, camera phone, video phone, free, upload">
<link rel="alternate" href="android-app://com.google.android.youtube/http/www.youtube.com/shorts/Cs3sTnLO6EE">
<link rel="alternate" href="ios-app://544007664/vnd.youtube/www.youtube.com/shorts/Cs3sTnLO6EE">
Je pensais que ignore_redirects pouvait en être la cause, mais modifier les options FD n’a fait aucune différence :
Je suspecte que YouTube réduit le scraping car ils ont le même problème de “AI scraper” que tout le monde, mais je ne suis pas sûr de ce qui cause spécifiquement l’obtention d’une réponse vide (qui semble avoir les balises meta/title remplies via JS, comme lorsque vous visitez dans le navigateur).
Ce code pour le oneboxing YouTube est celui qui s’attend à ce que les balises title + image soient remplies :
Je continue d’enquêter ![]()