I have text that contains both http and https URLs. I tried the following regex to extract the URLs, but it only works well for http.

url_regex <- "http[^([:blank:]|\\"|<|&|#\n\r)]+"

When I tried the version below, it didn't work.

url_regex <- "(http|https)[^([:blank:]|\\"|<|&|#\n\r)]+"

What should I modify to match URLs starting with either http or https?

P.S. I tried a regex that works in other languages. Which regex flavor does R use?

2 Answers

The problem is the double quotation mark in the middle of your regex: it closes the string that was opened at the beginning. Define both regexes with single quotes at the beginning and end instead. That works and lets you use a double quotation mark inside the regex:

url_regex <- '(http|https)[^([:blank:]|\\"|<|&|#\n\r)]+'
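As a quick check, here is a minimal sketch of how the single-quoted pattern could be used to pull URLs out of a string with base R's `gregexpr` and `regmatches`. The sample text and the variable names `text` and `urls` are made up for illustration and are not from the question.

```r
# Hypothetical sample text containing one http and one https URL.
text <- 'See http://example.com/a and <a href="https://example.org/b">link</a>.'

# Single quotes delimit the R string, so the double quote inside the
# pattern no longer terminates it.
url_regex <- '(http|https)[^([:blank:]|\\"|<|&|#\n\r)]+'

# gregexpr() finds every match; regmatches() extracts the matched text.
urls <- regmatches(text, gregexpr(url_regex, text))[[1]]
print(urls)
```

In this sample, each match stops at the first excluded character: the space after the first URL and the closing double quote after the second.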
avidalvi

Check out this post. It uses the rex package to build a regex that you can modify easily if you have other extensions to consider. It is well documented, too.

mupierrix