I am creating an email scraper. For one particular URL, matcher.find() never returns a boolean result; as far as I can see, it freezes. For other URLs the code works fine.
Here is my code:
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

private Matcher matcher;
private Pattern pattern = null;
private final String emailPattern = "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})";

public void scrape() {
    pattern = Pattern.compile(emailPattern);
    Document documentTwo = null;
    try {
        documentTwo = Jsoup.connect("https://www.mercurynews.com/2020/03/21/how-can-i-get-tested-for-covid-19-in-the-bay-area/")
                .ignoreHttpErrors(true)
                .userAgent(RandomUserAgent.getRandomUserAgent())
                .header("Content-Language", "en-US")
                .get();
    } catch (IOException ex) {
        // 'break' is only valid inside a loop or switch; leave the method instead
        return;
    }
    // Search the whole serialized HTML of the page
    String pageBody = documentTwo.toString();
    matcher = pattern.matcher(pageBody);
    while (matcher.find()) {
        // this is never reached for the URL above
    }
}
To check, I added System.out.println(matcher.find()); above the while loop, and it gets stuck there without printing any value. What am I doing wrong? I have tried many different email regex patterns, but the one above is the one that works.
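To take the network call out of the picture, here is a minimal sketch that runs the same pattern on a synthetic string. It rests on an unconfirmed assumption: that the page contains a long run of word characters with no '@' (for example inline script or encoded data), which would force ([\.\w])+ and [\w]+ to retry many overlapping splits of the same text. The class and method names (RegexHangDemo, firstEmail, millisForRunOfLength) are hypothetical and not part of my scraper.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; not part of the scraper itself.
class RegexHangDemo {
    // Same pattern as in the question.
    static final Pattern EMAIL = Pattern.compile(
            "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})");

    // Returns the first match in the text, or null if there is none.
    static String firstEmail(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Times one find() call on n copies of 'a' with no '@' anywhere
    // (the assumed problematic input shape).
    static long millisForRunOfLength(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        String input = new String(chars);
        long start = System.nanoTime();
        firstEmail(input);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Sanity check that the pattern still finds a normal address.
        System.out.println(firstEmail("contact john.doe@example.com"));
        // Doubling the input length shows how the search time grows.
        for (int n : new int[] {200, 400, 800}) {
            System.out.println(n + " chars: " + millisForRunOfLength(n) + " ms");
        }
    }
}
```

If the reported times climb much faster than the input length, heavy backtracking on that kind of input would be a plausible explanation for the apparent freeze; if they stay flat, the cause is probably elsewhere.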