I'm using a non-greedy regex on a group of email address strings, trying to clean them up so that each one ends after the .com (or .co.uk, etc.). Some of them continue with `'`, `"` or `<` after the desired cutoff, so I used `x = re.findall(r'([A-Za-z0-9]+@\S+.co\S*?)[\'"<]', finalSoup2)`.
The problem is that some of the strings look like `misc@misc.misc'misc''misc'` (or similar with `<` `>`), so even with the non-greedy regex I'm still left with results like `enquiries@smart-traffic.com.au">enquiries@smart-traffic.com.au`.
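A minimal reproduction, assuming the input looks like an HTML mailto link (the `sample` string below is hypothetical). It shows why the lazy `\S*?` does not help: the greedy `\S+` before `.co` has already matched through `">` to the end of the string, because `"` and `>` are not whitespace, and it only backtracks as far as the *last* `.co`. The laziness applies only to the few characters after that.

```python
import re

# Hypothetical input shaped like the problem string in the question:
# a mailto link whose link text repeats the address.
sample = '<a href="mailto:enquiries@smart-traffic.com.au">enquiries@smart-traffic.com.au</a>'

# The question's pattern, written as a raw string (same regex, but no
# invalid-escape warning for \S in newer Python versions).
pattern = r'([A-Za-z0-9]+@\S+.co\S*?)[\'"<]'

result = re.findall(pattern, sample)
print(result)  # ['enquiries@smart-traffic.com.au">enquiries@smart-traffic.com.au']
```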
I've tried putting two `?`s together (`??`), but that obviously doesn't work either. What's the proper way to get clean addresses in this situation?
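One possible fix, sketched under the assumption that the addresses never legitimately contain whitespace, quotes or angle brackets: swap `\S` for the negated class `[^\s'"<>]`, so the match cannot cross those delimiters at all, and escape the dot so `.co` only matches a literal `.co`. Non-greedy quantifiers are then unnecessary. The names `SAFE`, `ADDRESS_RE` and `sample` are illustrative, and the input string is hypothetical, combining both cases from the question.

```python
import re

# Any character except whitespace and the delimiters ' " < > that follow
# the addresses in the scraped HTML. Unlike \S, this cannot run past a quote.
SAFE = r"""[^\s'"<>]"""

# Same overall shape as the question's pattern, but with \S replaced by SAFE
# and the dot before "co" escaped so it only matches a literal ".".
ADDRESS_RE = re.compile(r"[A-Za-z0-9]+@" + SAFE + r"+\.co" + SAFE + "*")

# Hypothetical input containing both problem cases.
sample = (
    "<p>Contact misc@example.co.uk'misc''misc' or "
    '<a href="mailto:enquiries@smart-traffic.com.au">'
    "enquiries@smart-traffic.com.au</a></p>"
)

matches = ADDRESS_RE.findall(sample)
# dict.fromkeys drops the repeated mailto/link-text address, keeping order.
unique = list(dict.fromkeys(matches))
print(unique)  # ['misc@example.co.uk', 'enquiries@smart-traffic.com.au']
```

Because the class stops at a delimiter on its own, the trailing `[\'"<]` is no longer required, so an address at the very end of the string also matches. Like the original, `\.co` only picks up .com, .co.uk, .com.au-style domains.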