
Is it possible to detect and remove any kind of URL from a sentence?

For example:

Today,wheather is cold.But I want to out. http://weathers.com..... And I will take a cup of tea...

should become

Today,wheather is cold.But I want to out. And I will take a cup of tea...
  • Use a regex. Answer here: http://stackoverflow.com/questions/833469/regular-expression-for-url#answer-8234912 – BackSlash Jul 01 '13 at 15:20
  • define **any kind of urls** pls. `https://? file:///? ftp://? scp://? smb://.. ... ?` – Kent Jul 01 '13 at 15:21
  • https://? file:///? ftp://? scp://? smb://, ... and also the shortened URLs that are commonly used on Twitter – reigeki Jul 01 '13 at 15:29

2 Answers


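It depends on how comprehensive you want the matching process to be. You can try something as simple as

    str = str.replaceAll("http://[^\\s]+", "");

which removes every `http://` link up to the next whitespace character (it does not match `https://` or other schemes), e.g.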

    // Parenthesize the concatenation so that replaceAll runs on the whole sentence
    System.out.println(("Today,wheather is cold.But I want to out. "
            + "http://weathers.com..... And I will take a cup of tea...")
            .replaceAll("http://[^\\s]+", ""));

prints

    Today,wheather is cold.But I want to out.  And I will take a cup of tea...
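The comments ask for more than `http://`: also `https://`, `file:///`, `ftp://`, `scp://` and `smb://`. A minimal sketch, assuming a URL is one of those schemes followed by `://` and then everything up to the next whitespace, lists the schemes in an alternation and then collapses the double space the removal leaves behind. The class and method names (`UrlStripper`, `stripUrls`) are hypothetical. Shortened links are removed only when they are written with a scheme; a bare `bit.ly/...` or `www.` token is not matched.

```java
import java.util.regex.Pattern;

final class UrlStripper {
    // One of the schemes from the comments, then "://", then everything up to the next whitespace.
    // \b requires the scheme to start at a word boundary; matching ignores case ("HTTP://" too).
    private static final Pattern URL = Pattern.compile(
            "\\b(?:https?|ftp|file|scp|smb)://\\S+", Pattern.CASE_INSENSITIVE);

    // Two or more consecutive spaces or tabs.
    private static final Pattern EXTRA_SPACE = Pattern.compile("[ \\t]{2,}");

    static String stripUrls(String text) {
        String removed = URL.matcher(text).replaceAll("");
        // Removing the URL from "foo URL bar" leaves "foo  bar"; collapse the gap to one space
        // and trim in case the URL was at the start or end of the text.
        return EXTRA_SPACE.matcher(removed).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String in = "Today,wheather is cold.But I want to out. "
                + "http://weathers.com..... And I will take a cup of tea...";
        System.out.println(stripUrls(in));
    }
}
```

For the question's sentence this prints exactly the expected result. Note that `\S+` also swallows punctuation attached to the URL, such as the `.....` here or a sentence-final period; that is what the question wants, but it may remove more than intended in other text.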

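If you want something more robust that matches only valid URLs, use a fuller URL regular expression:

    /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This is a JavaScript regex literal anchored with `^` and `$`, so it checks whether an entire string is a URL rather than finding one inside a sentence. To use it in Java, drop the surrounding slashes, double the backslashes, and apply it to individual candidate strings.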

For even more thorough matching, refer to this answer.
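A sketch of how the fuller expression might be used from Java, assuming the goal is to validate a single candidate string: the literal's slashes are dropped, backslashes are doubled, and the nested `([\/\w \.-]*)*` is written as `[\/\w \.-]*`, a simplification that matches the same strings with less backtracking. `UrlValidator`, `isUrl` and `containsUrl` are hypothetical names. The second method shows why the anchors matter: `find()` cannot locate a URL inside a sentence when `^` and `$` pin the match to the whole input.

```java
import java.util.regex.Pattern;

final class UrlValidator {
    // The JavaScript literal translated to a Java string; the nested group ([\/\w \.-]*)*
    // is simplified to the equivalent [\/\w \.-]*.
    private static final Pattern FULL_URL = Pattern.compile(
            "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})[\\/\\w \\.-]*\\/?$");

    // True only if the whole string looks like a URL.
    static boolean isUrl(String s) {
        return FULL_URL.matcher(s).matches();
    }

    // Fails on a sentence even if it contains a URL, because ^ and $ anchor to the whole input.
    static boolean containsUrl(String text) {
        return FULL_URL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://weathers.com"));                // true
        System.out.println(containsUrl("go to http://weathers.com now")); // false
    }
}
```

Splitting the sentence on whitespace and dropping tokens for which `isUrl` returns true is one way to apply it, but be aware that a token such as `tea...` also matches (host `tea`, "TLD" `..`), so this expression has false positives too.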


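Try the regular expression below to match valid URLs:

    ((http|ftp|https):\/\/)?[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

The following code uses it to remove the URL from your example sentence. The dots after `weathers.com` are left in place, because this pattern does not let a URL end with `.`: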

    String str = "Today,wheather is cold. But I want to out. http://weathers.com..... And I will take a cup of tea";
    String regularExpression = "(((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?)";
    str = str.replaceAll(regularExpression,"");
    System.out.println(str);

Edit:

However, this regular expression will not work for every type of URL. URL syntax is complex enough that it is hard to write a single regular expression that matches all URLs and nothing else.
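The limitation is easy to see by running this answer's regular expression, unchanged, on the sentence exactly as the question wrote it, with no space in `cold.But`. Because the scheme group is optional, any `word.word` token counts as a URL, and the dots after `weathers.com` cannot end a match, so they stay behind. `AnswerRegexDemo` and `strip` are hypothetical names used only for this sketch.

```java
final class AnswerRegexDemo {
    // The regular expression from this answer, copied verbatim as a Java string literal.
    static final String REGEX =
            "(((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?)";

    static String strip(String text) {
        return text.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        // The question's original sentence: "cold.But" has no space after the dot.
        String question = "Today,wheather is cold.But I want to out. "
                + "http://weathers.com..... And I will take a cup of tea...";
        // Prints "Today,wheather is  I want to out. ..... And I will take a cup of tea..."
        System.out.println(strip(question));
    }
}
```

Making the scheme mandatory (removing the `?` after the `((http|ftp|https):\/\/)` group) avoids the `cold.But` false positive, at the cost of no longer matching bare links such as `www.example.com`.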
