I have some video feeds available on a website, which I would like to open in XBMC, but can't.
So I was thinking of scraping the links and channel names and writing them to files that my media center can open (one file per channel). It must be done on a small Linux box, and since I know neither Bash nor Python but a little PHP, I figured I'd use PHP for the task. But I've run into some problems with the regex and the output from PHP.
The website containing the feeds looks something like this:
... Lots of HTML before this part
<a href="javascript:changeChannel('http://live.provider.com/something/something_else/1.abcdefg.m3u8', 1);">First Channel</a><br>
<a href="javascript:changeChannel('http://live.provider.com/something/something_else/2.abcdefg.m3u8', 2);">Second Channel</a><br>
<a href="javascript:changeChannel('http://live.provider.com/something/something_else/3.abcdefg.m3u8', 3);">Third Channel</a><br>
.... // More channels and other html below here..
What I want to extract is the stream URL and the link text:
Ex: http://live.provider.com/something/something_else/1.abcdefg.m3u8
Ex: First Channel
etc.
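To make the goal concrete, the end result I'm after would be shaped roughly like the array below. This is only a sketch of the target structure: the variable name `$channels` and the `name`/`url` keys are placeholders I made up for illustration, and the values are just the sample channels from the HTML above.

```php
<?php
// Hypothetical target structure: one entry per channel.
// $channels and the 'name'/'url' keys are illustrative names, not existing code.
$channels = [
    ['name' => 'First Channel',  'url' => 'http://live.provider.com/something/something_else/1.abcdefg.m3u8'],
    ['name' => 'Second Channel', 'url' => 'http://live.provider.com/something/something_else/2.abcdefg.m3u8'],
    ['name' => 'Third Channel',  'url' => 'http://live.provider.com/something/something_else/3.abcdefg.m3u8'],
];

// Looping over it would then give me both values for each channel.
foreach ($channels as $channel) {
    echo $channel['name'], ' => ', $channel['url'], "\n";
}
```

With something like this in hand, each entry could be passed on to a file-writing function, one call per channel.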
Currently my code looks like this:
<?php
$streamSite = "http://link.to/feed-website.html";

// Write one .strm file per channel, containing only the stream URL.
function writeFile($url, $channel) {
    $File = $channel.".strm";
    $Handle = fopen($File, 'w');
    fwrite($Handle, $url);
    fclose($Handle);
}

// Fetch the page; @ suppresses the PHP warning so die() prints the message instead.
$input = @file_get_contents($streamSite) or die("Could not access file: $streamSite");
$regexp = "(((f|ht){1}tp:\/\/)[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)";
if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
    // For now, just dump every match to the console.
    foreach($matches as $match) {
        echo serialize($match);
        echo "\r\n";
    }
    unset($match);
}
?>
The current regex is only meant to scrape the URL. I've tested it on http://regexr.com/ and it works there.
At the moment I'm just printing the result to the console.
The current output looks like this:
a:3:{i:0;s:97:"http://live.provider.com/something/something_else/1.abcdefg.m3u8";i:1;s:7:"http://";i:2;s:2:"ht";}
a:3:{i:0;s:97:"http://live.provider.com/something/something_else/2.abcdefg.m3u8";i:1;s:7:"http://";i:2;s:2:"ht";}
a:3:{i:0;s:97:"http://live.provider.com/something/something_else/3.abcdefg.m3u8";i:1;s:7:"http://";i:2;s:2:"ht";}
I can't figure out where the text before and after the links comes from. Is it my serializing that fails or is it the regex?
Could you help me with the regex, so I can scrape the URL and the link text and put them into an array that I can loop through afterwards, writing each entry to a .strm file using the function I've written?
Thanks in advance!