
I have a URL like this:

http://r16---sn-4g57kn6e.googlevideo.com/videoplayback?&quality=medium&signature=797C0FEB1961E6226294D5FC19BC0CD28657975C.1E745D852200D14B706F0EBF9EA8762680374564&itag=43&mv=m&ip=84.19.165.220&ipbits=0&ms=au&ratebypass=yes&source=youtube&mt=1390347607&id=8b92b07ff9cd9862&key=yt5&fexp=942502,916626,929305,936112,924616,936910,936913,907231,921090&upn=cMPazwtmyZU&sver=3&sparams=id,ip,ipbits,itag,ratebypass,source,upn,expire&expire=1390371882&type=video%2Fwebm%3B+codecs%3D%22vp8.0%2C+vorbis%22&fallback_host=tc.v12.cache5.googlevideo.com&title=Requiem+For+A+Dream+Original+Song&title=Requiem For A Dream Original Song

The problem is that readfile() fails with a "bad request" error because of the special characters in the URL.

If I use urlencode(), it breaks the URL even more.

How can I handle this?

TheNiceGuy
  • Awesome track BTW. Now I have to drink myself to sleep again though – PeeHaa Jan 21 '14 at 23:57
  • This may help [PHP readfile() of external URL](http://stackoverflow.com/questions/4752389/php-readfile-of-external-url) :) – PsychoMantis Jan 21 '14 at 23:58
  • Oh yeah it is. I've been listening to it for 4 hours (repeat function). It's very good while coding :D – TheNiceGuy Jan 21 '14 at 23:58
  • Nope, that does not help. My question has nothing to do with that person's one. – TheNiceGuy Jan 21 '14 at 23:59
  • There is an awful lot of repetition in your URL. Have you tried to make it more compact? – Floris Jan 21 '14 at 23:59
  • How should I do that? What do you mean by "more compact"? These values are required, otherwise the link becomes invalid. That's YouTube lol – TheNiceGuy Jan 22 '14 at 00:00
  • Don't urlencode the whole string, just the song title? Have you tried using file_get_contents instead? – Dan H Jan 22 '14 at 00:03
  • I encoded everything except the http://. Yes, file_get_contents produces exactly the same problem. I also removed the title for testing purposes and got the same result. – TheNiceGuy Jan 22 '14 at 00:04
  • @Michael You're encoding it wrong. As we have all said, do not encode everything. The point of URL-encoding is that you can use an arbitrary string in the context of a URL. – Brad Jan 22 '14 at 00:14
  • How should I do that? I can't create the URL manually. I need to work with the URL that I get (like in the above example). – TheNiceGuy Jan 22 '14 at 00:17
  • The issue here is that you can't scrape Youtube using file_get_contents or cURL for that matter. At the very least, you would need to create a `bash script` to scrape the video from Youtube. This type of protection is in place for a particular reason and you'll never achieve the proper result attempting this. – Ohgodwhy Jan 22 '14 at 00:24
  • That is not true. Once again: The problem is the format of the URL, nothing else! – TheNiceGuy Jan 22 '14 at 00:24

2 Answers


You simply need to urlencode() the data you use in the query string. In your post, you have not escaped the last variable. Do not urlencode() the whole URL: that also escapes the characters that give the URL its structure (such as :, /, ? and &).

http://r16---sn-4g57kn6e.googlevideo.com/videoplayback?&quality=medium&signature=797C0FEB1961E6226294D5FC19BC0CD28657975C.1E745D852200D14B706F0EBF9EA8762680374564&itag=43&mv=m&ip=84.19.165.220&ipbits=0&ms=au&ratebypass=yes&source=youtube&mt=1390347607&id=8b92b07ff9cd9862&key=yt5&fexp=942502,916626,929305,936112,924616,936910,936913,907231,921090&upn=cMPazwtmyZU&sver=3&sparams=id,ip,ipbits,itag,ratebypass,source,upn,expire&expire=1390371882&type=video%2Fwebm%3B+codecs%3D%22vp8.0%2C+vorbis%22&fallback_host=tc.v12.cache5.googlevideo.com&title=Requiem+For+A+Dream+Original+Song&title=Requiem For A Dream Original Song
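
To make the difference concrete, here is a minimal sketch. The host `example.com` and the variable names are placeholders chosen for illustration, not part of the original URL. It compares encoding the whole URL with encoding only the parameter value:

```php
<?php
// Hypothetical base URL and value, for illustration only.
$base  = 'http://example.com/videoplayback';
$title = 'Requiem For A Dream Original Song';

// Encoding the whole URL also escapes ":", "/", "?" and "=",
// so the result is no longer usable as a URL.
$wrong = urlencode($base . '?title=' . $title);

// Encoding only the value keeps the URL structure intact.
$right = $base . '?title=' . urlencode($title);

echo $wrong, "\n";
echo $right, "\n";
```

The second form is the one that readfile() or file_get_contents() should be able to request.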

I would just use http_build_query() instead.

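// http_build_query() does not add the "?" separator itself.
echo 'http://r16---sn-etc' . '?' . http_build_query(array(
    'ip' => '84.19.165.220', // must be a quoted string, not a bare literal
    'ipbits' => 0,
    // etc.
    'title' => 'Requiem for a Dream'
));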
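
As a rough illustration of what http_build_query() produces, the sketch below feeds it a few raw (unencoded) values taken from the question's URL. The variable names are my own, and the separator is passed explicitly so the result does not depend on the `arg_separator.output` ini setting:

```php
<?php
// Raw, unencoded values; http_build_query() encodes each one itself.
$params = array(
    'ipbits' => 0,
    'type'   => 'video/webm; codecs="vp8.0, vorbis"',
    'title'  => 'Requiem For A Dream Original Song',
);

// Second argument is the numeric-key prefix (unused here),
// third argument forces "&" as the pair separator.
$query = http_build_query($params, '', '&');

echo $query, "\n";
```

The encoded `type` value matches the form that already appears in the question's URL, which suggests only the trailing `title` parameter was left unescaped.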
Brad
  • Sorry, I can't follow you. Can you please post an example? As I said above, I encoded everything except the http://. Even without the title it is not working. – TheNiceGuy Jan 22 '14 at 00:11
  • @Michael Right, you're encoding more than you should be. – Brad Jan 22 '14 at 00:12
  • @Michael see my updated answer for an "automatically built" url / query string. And Brad : +1 for pointing to `http_build_query()`. – Floris Jan 22 '14 at 04:55
  • @Michael Of course you can build the URL yourself... And, you needn't build the entire thing, just the parts you need. Leave the rest as-is. – Brad Jan 22 '14 at 13:45

Combining things I had in my original answer, plus some of the ideas in Brad's answer, I offer the following solution:

<?php
$url='http://r16---sn-4g57kn6e.googlevideo.com/videoplayback?&quality=medium&signature=797C0FEB1961E6226294D5FC19BC0CD28657975C.1E745D852200D14B706F0EBF9EA8762680374564&itag=43&mv=m&ip=84.19.165.220&ipbits=0&ms=au&ratebypass=yes&source=youtube&mt=1390347607&id=8b92b07ff9cd9862&key=yt5&fexp=942502,916626,929305,936112,924616,936910,936913,907231,921090&upn=cMPazwtmyZU&sver=3&sparams=id,ip,ipbits,itag,ratebypass,source,upn,expire&expire=1390371882&type=video%2Fwebm%3B+codecs%3D%22vp8.0%2C+vorbis%22&fallback_host=tc.v12.cache5.googlevideo.com&title=Requiem+For+A+Dream+Original+Song&title=Requiem For A Dream Original Song';
$cleanUrl = parseQuery($url);
$data = getData($cleanUrl);
echo "file read in OK\n";

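function parseQuery($url) {
  // Split into "everything up to and including the ?" and the raw query string.
  if (!preg_match('/^(https?:\/\/[^?]+\?)(.*)$/', $url, $rawQuery)) {
    return $url; // no query string, nothing to fix
  }
  // Capture key=value pairs; no leading or trailing & is required,
  // so the last pair is no longer dropped and keys do not pick up a stray "&".
  preg_match_all('/([^=&]+)=([^&]*)/', $rawQuery[2], $queries);
  // Decode values first so already-encoded ones are not encoded twice.
  $qArray = array_combine($queries[1], array_map('urldecode', $queries[2]));
  return $rawQuery[1] . http_build_query($qArray);
}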

function getData($url) {
  $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

  $curl = curl_init();
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($curl, CURLOPT_USERAGENT, $useragent);
  $data = curl_exec($curl);
  curl_close($curl);
  return $data;
}
?>

This basically involves the following steps:

  1. Take the initial URL and split it into "stuff before the ?" and "stuff after".
  2. The "stuff before" is left untouched; the "stuff after" is split into two arrays, the keys and the values of the query ("everything up to =" and "everything up to &").
  3. These two arrays are then combined into a valid query string using the http_build_query() function (from Brad's answer).
  4. Finally, I use curl to fetch the file (just because I know it better than readfile()).

It appears to work for me. Let me know if it doesn't work for you...
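
If you would rather avoid the regular expressions, the same split-and-rebuild idea can be sketched with PHP's built-in parse_str(). The function name `rebuildQuery` and the `example.com` URL are made up for this illustration. This is only a sketch under some assumptions: parse_str() keeps just the last value of a repeated key and rewrites dots and spaces in key names to underscores, and any #fragment is not handled separately.

```php
<?php
// Hypothetical helper: re-encode the query string of a URL.
function rebuildQuery($url) {
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to do
    }
    $base  = substr($url, 0, $pos);
    $query = substr($url, $pos + 1);

    // parse_str() decodes each value and skips empty pairs (e.g. a leading "&").
    parse_str($query, $params);

    // http_build_query() encodes every value exactly once.
    return $base . '?' . http_build_query($params, '', '&');
}

// Placeholder URL with one unencoded value, similar in shape to the question's.
echo rebuildQuery('http://example.com/videoplayback?&itag=43&title=Requiem For A Dream'), "\n";
```

Because values are decoded before they are re-encoded, parameters that were already escaped (such as `type=video%2Fwebm`) should come out unchanged rather than double-encoded.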

Floris
  • No it has nothing to do with the useragent. The problem is the format of the url. `curl_exec(): 23 is not a valid cURL handle resource` – TheNiceGuy Jan 22 '14 at 00:14