I am trying to extract all URLs from a text. Recognize Entities works reasonably well with "Entity type" = "URL", but it fails when there are certain special characters in the URL, such as ' or ç.

With this:

<url><loc>https://www.example.com/what's-this-long-text-willbetruncated</loc></url>
<url><loc>https://www.example.com/françois-is-here-and-not-there</loc></url>

the results are:

https://www.example.com/what
https://www.example.com/fran

I tried changing the "language" setting of the Recognize Entities function, but that didn't help at all.

Do I have to fall back to a find with regex? I have learned that this can be a bit of a pain for URLs. Thank you.
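
For reference, this is roughly the regex fallback I have in mind. It is only a minimal sketch in Python, not the tool's own find action, and it assumes that a URL always starts with http:// or https:// and ends at the first whitespace, angle bracket, or double quote. The names `URL_PATTERN` and `extract_urls` are made up for illustration.

```python
import re

# Assumption: a URL runs from "http://" or "https://" up to the first
# whitespace, '<', '>' or double quote. Apostrophes and non-ASCII letters
# such as "ç" are therefore kept as part of the URL.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_urls(text: str) -> list[str]:
    """Return every substring of `text` that matches URL_PATTERN, in order."""
    return URL_PATTERN.findall(text)


sample = (
    "<url><loc>https://www.example.com/what's-this-long-text-willbetruncated</loc></url>\n"
    "<url><loc>https://www.example.com/françois-is-here-and-not-there</loc></url>"
)

for url in extract_urls(sample):
    print(url)
```

With the two sample lines above, this keeps the part after the ' and the ç instead of cutting the URL there. A pattern this loose would also pick up trailing punctuation in free-running prose, which is part of why regex for URLs can get painful.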

marc_s
