You can use an HTML parser, such as Nokogiri, to take apart the HTML document and quickly find the <script> tag you're after. The content inside a <script> tag is text, so Nokogiri's text method will return it. Then it's a matter of selectively retrieving the lines you want, which can be done with a simple regular expression:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<head>
<script>
PlayerWatchdog.init({
'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
'playerdata': 'http://www.example.com/player',
'pageurl': 'http://www.example.com?test=2',
});
</script>
</head>
</html>
EOT
script_text = doc.at('script').text
playerurl, playerdata, pageurl = %w[
playerurl
playerdata
pageurl
].map { |i| script_text[/'#{ i }': '([^']+)'/, 1] }
playerurl # => "http://cdn.static.de/now/player.swf?ts=2011354353"
playerdata # => "http://www.example.com/player"
pageurl # => "http://www.example.com?test=2"
The at method returns the first matching <script> Node instance. Depending on the HTML, you might not want the first matching <script>. You can use search instead, which returns a NodeSet, similar to an array of Nodes, and then grab a particular element from the NodeSet. Alternatively, instead of a CSS selector you can use XPath, which lets you easily specify a particular occurrence of the desired tag.
Once the tag is found, text returns its contents, and the task moves from Nokogiri to using a pattern to find what is desired. /'#{ i }': '([^']+)'/ is a simple pattern that looks for a word, passed in as i, followed by : ' and then captures everything up to the next '. The closing ' sits outside the capture group, so it is not included in the result. That pattern is passed to String's [] method, along with 1 to select the first capture group.
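To see the pattern step on its own, here is a minimal sketch that uses only core Ruby, with the script body pasted in as a heredoc rather than pulled out through Nokogiri. The names missing and settings are made up for illustration, and the scan variant is an alternative to the lookup shown in the answer. It assumes every key and value is single-quoted exactly as in the sample.

```ruby
# Script body copied from the sample HTML; in real use this would come
# from doc.at('script').text.
script_text = <<~EOT
  PlayerWatchdog.init({
  'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
  'playerdata': 'http://www.example.com/player',
  'pageurl': 'http://www.example.com?test=2',
  });
EOT

# String#[] with a regex and a group index returns that capture group,
# or nil when the pattern does not match anywhere in the string.
pageurl = script_text[/'pageurl': '([^']+)'/, 1]
missing = script_text[/'missing': '([^']+)'/, 1]

pageurl # => "http://www.example.com?test=2"
missing # => nil

# String#scan returns one [key, value] pair per match, so all quoted
# pairs can be collected without listing the key names up front.
settings = script_text.scan(/'(\w+)': '([^']+)'/).to_h

settings.keys         # => ["playerurl", "playerdata", "pageurl"]
settings['playerurl'] # => "http://cdn.static.de/now/player.swf?ts=2011354353"
```

The nil result for an absent key is worth checking before using a value, because a page that changes its script will otherwise fail later in a less obvious place.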