0

I have a string variable with links inside (among other text), and I want to extract all links containing a certain pattern (for example, containing the word 'case'). Is this possible?

The string is something like:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';

As a workaround, I used the approach described here: extract links from document. I create a document with the string as its content and then extract the links, but I would like to do it directly.

Regards,

EDIT (To Ruben):

If I use:

var string = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

I get only the first link, twice (see screenshot here).

And if I use:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html ';

The same happens again (see screenshot here).

kurokirasama
  • What do you mean by "an string variable with links inside"? Are they URLs? Including a sample string could clarify what you mean. What have you tried? – Rubén Nov 21 '16 at 17:47
  • ok. variable string is something like: var string = 'here is some text line among the ones there will be links like http://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more'; – kurokirasama Nov 21 '16 at 18:36

2 Answers

1

Google Apps Script

function test2(){
  var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
  var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
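  // exec() on a non-global regex always returns the first match (plus its capture groups),
  // so rebuild the pattern with the 'g' flag and collect every full match instead.
  var matches = string.match(new RegExp(re.source, 'gi')) || [];
  for(var i = 0; i < matches.length; i++){
    Logger.log(matches[i]);
  }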
}

JavaScript

var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/i;
var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
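// exec() on a non-global regex always returns the first match (plus its capture groups),
// so rebuild the pattern with the 'g' flag and collect every full match instead.
var matches = string.match(new RegExp(re.source, 'gi')) || [];
for(var i = 0; i < matches.length; i++){
  console.log(matches[i]);
}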
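
A note on the "first link twice" symptom described in the question: without the 'g' flag, exec() returns a single array holding the full match followed by its capture groups, so looping over that array prints the same URL more than once. The following is a minimal sketch of that behaviour; the short pattern and the example.com URLs are assumptions chosen for illustration, standing in for the longer regex.

```javascript
// Simplified pattern with one capture group (an assumption, not the regex from the answer).
var nonGlobal = /(https?:\/\/\S+)/i;
var text = 'first http://example.com/a then http://example.com/b';

var result = nonGlobal.exec(text);
// result[0] is the full match and result[1] is capture group 1: the same URL twice.
console.log(result[0]);
console.log(result[1]);

// match() with the 'g' flag returns every full match and no capture groups.
var all = text.match(/(https?:\/\/\S+)/gi) || [];
console.log(all);
```

In Apps Script, Logger.log would take the place of console.log.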

Reference

RegularExpression to Extract Url For Javascript

Rubén
  • Ok, I used your updated version on this string: 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html http://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script/40728530?noredirect=1#comment68918084_40728530 here is some text line among the ones there will be links like http://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html'; – kurokirasama Nov 30 '16 at 17:56
  • I have the same problem with only getting the first link. Any progress during the last 4 years, @Rubén? :-) – Björn Larsson Jun 18 '21 at 15:13
  • @BjörnLarsson The code in this answer is working correctly. Please post a new question including a [mcve]. – Rubén Jun 18 '21 at 15:49
1

If you're only getting the first match, I think you need the 'g' flag on the regular expression so that it captures all matches; each call to exec() then returns the next match. I'm using:

const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

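let reResults; // holds the current match; `s` is the input string to search
while ((reResults = re.exec(s)) !== null) { // each call finds the next match
  Logger.log(reResults[0]); // full text of the current match
}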
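
To tie this back to the original question (keeping only links that contain a word such as 'case'), here is a minimal sketch. The function name extractLinks, the keyword parameter, the sample string, and the simplified URL pattern are all assumptions made for illustration rather than part of either answer; in Apps Script, Logger.log would replace console.log.

```javascript
// Hypothetical helper: collect every URL in `text`, optionally keeping only
// those that contain `keyword`. The pattern is deliberately simple (http/https
// followed by non-space characters) and will not cover every URL form.
function extractLinks(text, keyword) {
  var re = /https?:\/\/[^\s<>"']+/gi; // 'g' makes exec() advance through the string
  var links = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    // Drop trailing punctuation that usually belongs to the sentence, not the URL.
    var url = match[0].replace(/[.,;:!?)]+$/, '');
    if (!keyword || url.indexOf(keyword) !== -1) {
      links.push(url);
    }
  }
  return links;
}

var sample = 'see http://example.com/case/1.html and https://example.org/other, plus https://example.net/a?case=2.';
console.log(extractLinks(sample, 'case'));
```

Calling extractLinks(sample) without a keyword returns every link found, which may be handy for checking the pattern before filtering.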
GeekYouUp