
I have a string in Java that may or may not contain a link (a URL like www.google.com, stackoverflow.com, stanford.edu, etc.). I want to check whether the string contains any link. I have two problems:

  1. What should I search for? A link may or may not contain www, https, .com, etc., so how do I tell it apart from ordinary text? What do the RFC specifications say about links?

  2. Which Java function should I use to search with that regex? I am fairly new to Java.
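For question 2, the standard tools are `java.util.regex.Pattern` and `Matcher`: compile the pattern once, then call `find()` repeatedly to locate each match inside the larger text. The sketch below is only illustrative. It assumes that a link counts only when it has an explicit `http://` or `https://` scheme, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLinksDemo {
    // Assumption: only links that start with http:// or https:// are wanted;
    // \S+ then takes everything up to the next whitespace character.
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    // Returns every substring of text that matches LINK, in order.
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {            // find() scans forward to the next match
            links.add(m.group());     // group() is the text of the current match
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See https://stackoverflow.com/questions/5461702 and http://stanford.edu today";
        System.out.println(findLinks(text));
        // To test for presence only: find() returns false when nothing matches
        System.out.println(LINK.matcher("no links here").find());
    }
}
```

Use `Matcher.matches()` only when the whole string must be a URL. `find()` is the call that locates a URL embedded in surrounding text.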

Asked by cooljohny, edited by John Powell
  • Post some examples of what you want to match and what you don't. – Avinash Raj Jul 05 '14 at 07:11
  • [RFC 1738](http://www.ietf.org/rfc/rfc1738.txt) defines the syntax of a generic URL. [RFC 2616](http://www.ietf.org/rfc/rfc2616.txt), section 3.2, defines the HTTP URL scheme. – Frxstrem Jul 05 '14 at 07:14
  • Examples can be any URL... If there is a link in the text, I need to find it. – cooljohny Jul 05 '14 at 07:15
  • Is it homework? Because the first Google hit gave a nice example including Java code. – rve Jul 05 '14 at 08:38
  • Or here are some nice links to regexes you could use: https://stackoverflow.com/questions/5461702 – rve Jul 05 '14 at 08:41
  • Thanks, rve! No, it's not homework. I wanted more informative answers about the RFC and about using an expression to detect links, instead of simply splitting the string and checking each part. – cooljohny Jul 05 '14 at 08:43
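On the RFC side, Java's `java.net.URI` parses the generic URI syntax into its components. Its Javadoc cites RFC 2396, which RFC 3986 has since obsoleted. The sketch below uses a hypothetical class name to show which parts `URI` recovers, and why a bare `www.google.com` is ambiguous: without a scheme it parses as a relative path, not as a host.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPartsDemo {
    // Returns the generic-syntax components that java.net.URI extracts from s,
    // or a short error description if s is not a syntactically valid URI.
    static String describe(String s) {
        try {
            URI u = new URI(s);
            return s + " -> scheme=" + u.getScheme()
                    + ", host=" + u.getHost()
                    + ", path=" + u.getPath()
                    + ", query=" + u.getQuery()
                    + ", fragment=" + u.getFragment();
        } catch (URISyntaxException e) {
            return s + " -> not a URI: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("https://www.google.com/search?q=java#top"));
        // Without a scheme, "www.google.com" is just a relative path:
        // scheme and host both come back null.
        System.out.println(describe("www.google.com"));
        // A space is not allowed anywhere in a URI, so this is rejected.
        System.out.println(describe("stanford edu"));
    }
}
```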

2 Answers


One approach that avoids writing a URL regex:

  • Split the whole string on whitespace.
  • Try to build a URL from each item.

    import java.net.MalformedURLException;
    import java.net.URL;
    
    // Replaces URLs in the input with HTML anchor tags
    public class URLInString {
        public static void main(String[] args) {
            String s = args[0];
            // Separate the input on whitespace (URLs don't contain spaces)
            String[] parts = s.split("\\s+");
            // Attempt to convert each item into a URL
            for (String item : parts) {
                try {
                    // Note: new URL() requires a protocol, so a bare
                    // "www.google.com" throws MalformedURLException here
                    URL url = new URL(item);
                    // If it parses, replace it with an anchor
                    System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
                } catch (MalformedURLException e) {
                    // Not a URL: print the word unchanged
                    System.out.print(item + " ");
                }
            }
            System.out.println();
        }
    }
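As the comment in the code notes, `new URL(...)` rejects tokens with no protocol, so the question's `www.google.com` example is not detected. Below is a sketch of the same split-and-parse idea with one added assumption: a token starting with `www.` is treated as an `http://` link. The class and helper names are hypothetical. Bare domains such as `stackoverflow.com` are still missed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class SplitAndParse {
    // Hypothetical helper: tries to read one whitespace-separated token as a URL.
    // Assumption: a token starting with "www." is treated as an http:// link,
    // because new URL() rejects input that has no protocol.
    static URL toUrl(String token) {
        String candidate = token.startsWith("www.") ? "http://" + token : token;
        try {
            return new URL(candidate);
        } catch (MalformedURLException e) {
            return null; // no known protocol, so not a URL
        }
    }

    // Splits text on whitespace and keeps every token that parses as a URL.
    static List<URL> findUrls(String text) {
        List<URL> found = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            URL url = toUrl(token);
            if (url != null) {
                found.add(url);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("go to www.google.com or https://stanford.edu now"));
    }
}
```

Avoid comparing the results with `URL.equals`, which may perform DNS lookups. Compare `toString()` values instead.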
    
Answered by Rahul Sharma, edited by John Powell
  • Any other solution? This is a long procedure with a for loop; if I have to do it for thousands of strings, it won't be appropriate. What I was looking for is a regex for URLs that can be checked against the string... although I liked your solution, as it's simple and tricky. – cooljohny Jul 05 '14 at 08:26
  • You were missing a brace after the for, btw. – John Powell Jul 05 '14 at 08:33
  • Regex has loops too; they are just not written in your app code. You should try more than one approach and time them if you need performance. – tgkprog Jul 05 '14 at 09:33
  • Still, it is bad practice to rely on exceptions (which may be quite slow, since they need to fill in stack-trace information) for control flow. Loops are unavoidable, but slow loops can be avoided. – tucuxi Jul 05 '14 at 09:47
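On the "thousands of strings" concern, the main cost to avoid is recompiling the regex for every string. A `Pattern` is immutable and safe to share between threads, so it can be compiled once. A single `Matcher` can then be reused with `reset(CharSequence)`. Matchers are not thread-safe, so each thread needs its own. The sketch below assumes a simple `http(s)://` or `www.` pattern and uses hypothetical names and sample data. It makes no timing claims; as the comments suggest, measure on real input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BatchUrlScan {
    // Compiled once: Pattern objects are immutable and safe to share across threads.
    // Assumption: this simple http/https/www pattern is good enough for the input.
    private static final Pattern URL_LIKE =
            Pattern.compile("\\b(?:https?://|www\\.)\\S+");

    // Counts how many of the given strings contain at least one URL-like token.
    static int countWithLinks(List<String> texts) {
        Matcher m = URL_LIKE.matcher("");   // one Matcher, reused via reset()
        int count = 0;
        for (String t : texts) {
            if (m.reset(t).find()) {        // no exceptions involved, just a scan
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: every other string contains a link
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            texts.add(i % 2 == 0 ? "see www.example.com/" + i : "plain text " + i);
        }
        System.out.println(countWithLinks(texts));
    }
}
```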

This does not rely on exceptions to find URL validity, just on locating URLs by regex:

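    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    /**
     * Fills the list urls with the URL-like substrings of 's'.
     * Because the scheme part is optional, plain words match too.
     */
    void findUrlsInString(String s, List<String> urls) {
        // '-' is escaped so it is a literal hyphen, not a range;
        // '+' (instead of '*') prevents empty matches at every position
        Pattern p = Pattern.compile(
            "(([a-z]+):((//)|(\\\\))+)?[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]+");
        Matcher m = p.matcher(s);
        while (m.find()) {
            urls.add(m.group());
        }
    }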

The regex is not perfect; I have adapted it from here, but I could not find a canonical Java regex for URLs. You can craft invalid URLs that will pass this regex, but it will require a slight effort.
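Because the scheme in the regex above is optional and its character class includes `\w`, it also matches ordinary words. If the goal is the question's own examples (www.google.com, stackoverflow.com, stanford.edu), one alternative is to require a dot-separated host that ends in an all-letter top-level domain. This is a rough sketch. The pattern and class name are assumptions, not taken from any RFC, and the pattern will still produce false positives such as `file.txt`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BareDomainFinder {
    // Hypothetical pattern, not from any RFC:
    //   optional scheme, then dot-separated labels,
    //   a final label of 2+ letters (the TLD), then an optional /path.
    private static final Pattern DOMAIN_LIKE = Pattern.compile(
            "\\b(?:[a-z][a-z0-9+.-]*://)?"         // optional scheme, e.g. https://
            + "(?:[a-z0-9-]+\\.)+[a-z]{2,}\\b"     // host: labels + letter-only TLD
            + "(?:/[^\\s]*)?",                      // optional path/query up to whitespace
            Pattern.CASE_INSENSITIVE);

    // Returns every domain-like substring of text, in order.
    static List<String> find(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = DOMAIN_LIKE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find(
                "Try www.google.com, stackoverflow.com or https://stanford.edu/about today."));
    }
}
```

Requiring a letters-only TLD rules out most plain words and abbreviations such as "e.g.". However, a trailing punctuation mark directly after a path (for example, a sentence-ending period) will be captured along with the link.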

Answered by tucuxi