Let's go straight to an example, since this is not easy to explain:

<li id="l_f6a1ok3n4d4p" class="online"> <div class="link"> <a href="javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20" onclick="visited('f6a1ok3n4d4p');" style="float:left;">random strings - 4</a> <a style="float:left; display:block; padding-top:3px;" href="http://www.webtrackerplus.com/?page=flowplayerregister&amp;a_aid=&amp;a_bid=&amp;chan=flow"><img border="0" src="/resources/img/fdf.gif"></a> <!-- a class="none" href="#">random strings - 4  site2.com - # - </a --> </div> <div class="params"> <span>Submited: </span>7 June 2015  | <span>Host: </span>site2.com </div> <div class="report"> <a title="" href="javascript:report(3191274,%203,%202164691,%201)" class="alert"></a> <a title="" href="javascript:report(3191274,%203,%202164691,%200)" class="work"></a> <b>100% said work</b> </div> <div class="clear"></div> </li> <li id="l_zsgn82c4b96d" class="online"> <div class="link"> <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20" onclick="visited('zsgn82c4b96d');" style

From the content above, I want to extract from javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')

the strings "f6a1ok3n4d4p" and "site2.com", and then combine them into

http://site2.com/f6a1ok3n4d4p

and the same for javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com'), which should become

http://site1.com/zsgn82c4b96d

I need this to be done with a Ruby regex.

  • This is a Ruby language question, not Unix/Linux – X Tian Jun 30 '15 at 02:15
  • possible duplicate of [How can I extract URLs from HTML content with a Ruby regexp?](http://stackoverflow.com/questions/31128923/how-can-i-extract-urls-from-html-content-with-a-ruby-regexp) – Simon Jun 30 '15 at 13:55

1 Answer

This should give you some insight into how to do it: https://regex101.com/r/wD4oT8/2

javascript:show\(\'(.*?)'.*?\'([^\']*)\'\) captures the first argument as $1 and the last single-quoted argument as $2, so you get what you want by substituting them as $2/$1 (with http:// in front).
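As a minimal sketch of that capture-and-substitute step in Ruby, assuming the href value is already available as a string: the constant `SHOW_CALL` and the method `show_href_to_url` are hypothetical names chosen for illustration, and the pattern is the one above with the unnecessary backslashes before the quotes dropped.

```ruby
# Same pattern as above; quotes do not need escaping inside a Ruby regex literal.
SHOW_CALL = /javascript:show\('(.*?)'.*?'([^']*)'\)/

# Hypothetical helper: returns "http://<host>/<id>", or nil if the
# string does not contain a show(...) call.
def show_href_to_url(href)
  return nil unless href =~ SHOW_CALL
  # After a successful =~, $1 is the first argument and $2 the last one.
  "http://#{$2}/#{$1}"
end

href = "javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20"
puts show_href_to_url(href)   # prints http://site2.com/f6a1ok3n4d4p
```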

That's the regex part of it. Of course, you can adjust the regex as you see fit, for example to also accept double quotes, javascript:show\((?:\'|\")(.*?)(?:\'|\").*?(?:\'|\")([^\'\"]*)(?:\'|\")\), or to match only calls with exactly 3 arguments.
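One possible way to combine both adjustments is sketched below. It assumes that arguments never contain quote characters and that they are separated by a comma optionally followed by whitespace or `%20`, as in the sample markup; `STRICT_SHOW` and `strict_show_to_url` are hypothetical names, not part of the original answer.

```ruby
# Stricter variant: either quote style, exactly three arguments.
STRICT_SHOW = /
  javascript:show\(
  ['"]([^'"]*)['"]    # first argument: the id (captured)
  ,(?:\s|%20)*
  ['"][^'"]*['"]      # second argument: the title (not captured)
  ,(?:\s|%20)*
  ['"]([^'"]*)['"]    # third argument: the host (captured)
  \)
/x

# Hypothetical helper built on the stricter pattern.
def strict_show_to_url(href)
  m = STRICT_SHOW.match(href)
  m && "http://#{m[2]}/#{m[1]}"
end

puts strict_show_to_url("javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com')")
```

With this pattern, a call that has only two arguments is rejected instead of being matched loosely.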

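/yourregex/.match(yourstring) returns a MatchData object whose captures hold the information you need. Note that match only returns the first match; to collect every link in the page, use yourstring.scan(/yourregex/) instead.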
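For a page with several links, a sketch along these lines might work. The `html` string is a shortened stand-in for the markup in the question, and the variable names are illustrative only.

```ruby
# Shortened stand-in for the markup in the question.
html = <<~HTML
  <a href="javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com');%20">random strings - 4</a>
  <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20">random strings - 4</a>
HTML

pattern = /javascript:show\('(.*?)'.*?'([^']*)'\)/

# With capture groups, scan returns one [id, host] pair per match.
urls = html.scan(pattern).map { |id, host| "http://#{host}/#{id}" }

puts urls
# prints:
# http://site2.com/f6a1ok3n4d4p
# http://site1.com/zsgn82c4b96d
```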

Andris Leduskrasts