Something is wrong with our Onebox flow, I believe because of the redirect/FinalDestination handling. If I use curl like this:
curl -L https://youtube.com/shorts/Cs3sTnLO6EE
I can find the title and other meta tags in the response:
curl -L https://youtube.com/shorts/Cs3sTnLO6EE | htmlq 'head > meta'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 947221 0 947221 0 0 406174 0 --:--:-- 0:00:02 --:--:-- 504109
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="ApvK67ociHgr2egd6c2ZjrfPuRs8BHcvSggogIOPQNH7GJ3cVlyJ1NOq/COCdj0+zxskqHt9HgLLETc8qqD+vwsAAABteyJvcmlnaW4iOiJodHRwczovL3lvdXR1YmUuY29tOjQ0MyIsImZlYXR1cmUiOiJQcml2YWN5U2FuZGJveEFkc0FQSXMiLCJleHBpcnkiOjE2OTUxNjc5OTksImlzU3ViZG9tYWluIjp0cnVlfQ==" http-equiv="origin-trial">
<meta content="rgba(255, 255, 255, 0.98)" name="theme-color">
<meta content="A Screencast Of LinkedIn Persona Verification Failure" name="title">
<meta content="For https://www.linkedin.com/help/linkedin/cases/73171318#:~:text=Thanks%20for%20contacting%20us%20about,to%20troubleshoot%20any%20additional%20causes." name="description">
...
curl -L https://youtube.com/shorts/Cs3sTnLO6EE | htmlq 'head > title'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 992110 0 992110 0 0 445530 0 --:--:-- 0:00:02 --:--:-- 739941
<title>A Screencast Of LinkedIn Persona Verification Failure - YouTube</title>
However, when I fetch the response through our oneboxer code, these are the only tags (excluding script and style) that I get in the <head>:
uri = FinalDestination.new("https://youtube.com/shorts/Cs3sTnLO6EE", Oneboxer.get_final_destination_options("https://youtube.com/shorts/Cs3sTnLO6EE")).resolve
doc2 = Onebox::Helpers.fetch_response(uri)
Nokogiri.HTML(doc2).css("head").children.each do |headel|
  next if headel.name == "script" || headel.name == "style"
  puts headel.to_s
end; nil;
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="origin-trial" content="ApvK67ociHgr2egd6c2ZjrfPuRs8BHcvSggogIOPQNH7GJ3cVlyJ1NOq/COCdj0+zxskqHt9HgLLETc8qqD+vwsAAABteyJvcmlnaW4iOiJodHRwczovL3lvdXR1YmUuY29tOjQ0MyIsImZlYXR1cmUiOiJQcml2YWN5U2FuZGJveEFkc0FQSXMiLCJleHBpcnkiOjE2OTUxNjc5OTksImlzU3ViZG9tYWluIjp0cnVlfQ==">
<link rel="shortcut icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon.ico" type="image/x-icon">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_32x32.png" sizes="32x32">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_48x48.png" sizes="48x48">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_96x96.png" sizes="96x96">
<link rel="icon" href="https://www.youtube.com/s/desktop/ace6261e/img/favicon_144x144.png" sizes="144x144">
<link rel="stylesheet" href="//fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&family=YouTube+Sans:wght@300..900&display=swap" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="/s/player/65578ad1/www-player.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/desktop/ace6261e/cssbin/www-main-desktop-player-skeleton.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/desktop/ace6261e/cssbin/www-onepick.css" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<link rel="stylesheet" href="https://www.youtube.com/s/_/ytmainappweb/_/ss/k=ytmainappweb.kevlar_base.dsnGl9m3_bM.L.X.O/am=AAAgAAgk/d=0/rs=AGKMywEVyAGSU99VwQpoLFio5FrCvZ1WpA" nonce="kFtYVVw9wWKkoPdOJkO9xQ">
<meta name="theme-color" content="rgba(255, 255, 255, 0.98)">
<link rel="search" type="application/opensearchdescription+xml" href="https://www.youtube.com/opensearch?locale=en_US" title="YouTube">
<link rel="manifest" href="/manifest.webmanifest" crossorigin="use-credentials">
<link rel="canonical" href="undefined">
<link rel="alternate" media="handheld" href="https://m.youtube.com/shorts/Cs3sTnLO6EE">
<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.youtube.com/shorts/Cs3sTnLO6EE">
<title> - YouTube</title>
<meta name="title" content="">
<meta name="description" content="Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.">
<meta name="keywords" content="video, sharing, camera phone, video phone, free, upload">
<link rel="alternate" href="android-app://com.google.android.youtube/http/www.youtube.com/shorts/Cs3sTnLO6EE">
<link rel="alternate" href="ios-app://544007664/vnd.youtube/www.youtube.com/shorts/Cs3sTnLO6EE">
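To make the difference between the two responses easier to spot than by reading the full tag dump, a small check along these lines could help. This is only a sketch: `head_summary` is a hypothetical helper, not part of Onebox, and it uses plain standard-library regexes instead of Nokogiri, assuming double-quoted attributes and the " - YouTube" title suffix seen in the output above.

```ruby
# Hypothetical helper (not part of Onebox): report whether the title-related
# tags in a fetched page are populated. Regex-based, so it is a rough check
# rather than a real HTML parser.
def head_summary(html)
  # Restrict the search to <head> when it can be found.
  head = html[/<head\b[^>]*>(.*?)<\/head>/mi, 1] || html

  title = head[/<title[^>]*>(.*?)<\/title>/mi, 1].to_s.strip

  # Find <meta name="title" ...> regardless of attribute order.
  meta_title = ""
  head.scan(/<meta\b[^>]*>/i) do |tag|
    next unless tag =~ /\bname="title"/i
    meta_title = tag[/\bcontent="([^"]*)"/i, 1].to_s
  end

  # Assumption: a title that is only the " - YouTube" suffix counts as empty.
  bare_title = title.sub(/\s*-\s*YouTube\z/, "")

  {
    title: title,
    meta_title: meta_title,
    populated: !meta_title.empty? && !bare_title.empty?
  }
end
```

Calling `head_summary(doc2)` on the oneboxer response and on the body saved from curl should then show at a glance which of the two carries real metadata.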
I thought ignore_redirects might be the cause, but changing the FD options made no difference:
I suspect YouTube is probably cutting down on scraping because it is having the same AI scraping problem as everyone else, but I am not sure what specifically causes us to receive the empty response (which appears to get its meta/title tags populated via JS, the same way as when you visit the page in a browser).
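One way to narrow this down might be to request the same page with different header sets and compare the resulting titles. The sketch below uses only Ruby's standard library. `HEADER_VARIANTS`, `build_request`, `fetch_body` and `page_title` are hypothetical names, the header values are illustrative guesses, and nothing here confirms that headers are what actually triggers the empty response.

```ruby
require "net/http"
require "uri"

# Hypothetical header sets to compare; the values are illustrative assumptions.
HEADER_VARIANTS = {
  "curl-like" => { "User-Agent" => "curl/8.0.0", "Accept" => "*/*" },
  "browser-like" => {
    "User-Agent" => "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " \
                    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language" => "en-US,en;q=0.9"
  }
}.freeze

# Build a GET request carrying the given headers. Nothing is sent yet.
def build_request(url, headers)
  request = Net::HTTP::Get.new(URI(url))
  headers.each { |name, value| request[name] = value }
  request
end

# Send the request and return the response body as a string.
# Redirects are NOT followed here, so pass an already-resolved URL.
def fetch_body(url, headers)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(url, headers)).body.to_s
  end
end

# Extract the text of the first <title> element (rough regex-based check).
def page_title(html)
  html[/<title[^>]*>(.*?)<\/title>/mi, 1].to_s.strip
end

# Usage (performs real network requests, so it is left commented out):
# url = "https://www.youtube.com/shorts/Cs3sTnLO6EE"
# HEADER_VARIANTS.each do |label, headers|
#   puts "#{label}: #{page_title(fetch_body(url, headers)).inspect}"
# end
```

If every variant returns the same title, headers are probably not the deciding factor, and other differences between curl and the oneboxer request would be the next thing to look at.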
This YouTube oneboxing code is what expects the title and image tags to be populated:
I will keep investigating :eyes: