If you are looking to remove URLs from your string, you may use:
gsub("(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)
Where x would be:
x <- c("some text http://idontwantthis.com",
"same problem again http://pleaseremoveme.com")
It would be easier to give you a specific answer if you could post a sample of your data, but the following example gives you clean text with no URLs:
> clean_x <- gsub("(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)
> clean_x
[1] "some text " "same problem again "
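One caveat worth checking against your own data: the `(.*)` in the pattern above is greedy, so if more text follows the URL and contains something like `.txt` or another domain, the match can run past the end of the URL. A minimal sketch of an alternative, assuming your URLs contain no whitespace, is to stop at the first space instead. The helper name `remove_urls` and the third sample string are made up for illustration:

```r
# Hypothetical helper (not part of the original answer): remove everything from
# the scheme up to the next whitespace character.
remove_urls <- function(text) {
  gsub("(f|ht)tps?://[^[:space:]]+", "", text)
}

x <- c("some text http://idontwantthis.com",
       "same problem again http://pleaseremoveme.com",
       "see http://example.com first, then read notes.txt later")

# Original pattern: on the third string the greedy (.*) extends to ".txt",
# so the words between the URL and "notes.txt" are removed as well.
greedy_x <- gsub("(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)

# Whitespace-bounded pattern: only the URL itself is removed.
clean_x <- remove_urls(x)

print(greedy_x)
print(clean_x)
```

Note that this variant also removes punctuation attached directly to a URL (for example a trailing full stop), and it leaves the surrounding spaces in place, so you may still want to collapse whitespace afterwards.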
As a side point, I would suggest that it may be worth searching for existing methods to clean text before mining. For example, the clean function discussed here would enable you to do this automatically. Along similar lines, there are functions to strip tweet markup (#, @), punctuation and other undesirable entries from your text.
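If you would rather not pull in a package, the same kind of cleaning can be approximated in base R. The following is only a rough sketch, not the clean function mentioned above; the name `clean_tweet` and the sample tweet are made up, and it assumes you want to drop hashtags and mentions entirely rather than keep the word after the symbol:

```r
# Hypothetical helper: strip URLs, @mentions, #hashtags and punctuation,
# then tidy up the whitespace left behind.
clean_tweet <- function(text) {
  text <- gsub("(f|ht)tps?://[^[:space:]]+", "", text)  # URLs up to the next space
  text <- gsub("[@#][[:alnum:]_]+", "", text)           # @mentions and #hashtags
  text <- gsub("[[:punct:]]+", "", text)                # remaining punctuation
  text <- gsub("[[:space:]]+", " ", text)               # collapse runs of whitespace
  trimws(text)                                          # drop leading/trailing spaces
}

tweet <- "Great post by @someone! http://example.com #rstats, thanks."
cleaned <- clean_tweet(tweet)
print(cleaned)
```

The order matters here: URLs are removed first, because stripping punctuation beforehand would break the `://` that the URL pattern relies on.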