A little background: I'm trying to give customers an option to add HTML directly and publish a single-page website (like Blogspot). This attracted scammers, so I created a microservice that blocks publishing a website based on its HTML content.
Initially I used jsoup to fetch the HTML from the website. Now the scammers have adapted: they load a script from an external website, and it is loaded asynchronously:
<script src="https://yolologroyopuedo.us/?api=1&lan=fbcacaroto" type="text/javascript" async="true"></script>
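Tags like this can at least be spotted without running anything. Below is a minimal sketch of such a static check, assuming a regex scan is acceptable as a rough first pass (it is not a real HTML parser and will miss or mis-match unusual markup). `ExternalScriptFinder`, `externalScriptHosts` and the `ownHost` parameter are names made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Lists the hosts of script tags that load code from another site. */
final class ExternalScriptFinder {

    // Rough approximation: finds src="..." or src='...' inside a <script ...> tag.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * @param html    the submitted page markup
     * @param ownHost hypothetical host of the publishing platform itself
     * @return hosts of all script sources that are absolute and not ownHost
     */
    static List<String> externalScriptHosts(String html, String ownHost) {
        List<String> hosts = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            String host;
            try {
                host = URI.create(m.group(2).trim()).getHost();
            } catch (IllegalArgumentException e) {
                continue; // malformed URL: skipped in this sketch
            }
            // Relative URLs have no host and are treated as local.
            if (host != null && !host.equalsIgnoreCase(ownHost)) {
                hosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
        return hosts;
    }
}
```

This only reveals that a third-party script is present, not what it injects, so it does not replace inspecting the rendered page.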
Because the scam content is injected only after the external script runs, the initially rendered HTML contains none of it and evades the blocker. I'm trying to scrape the website content after the script has loaded completely, or after some fixed time.
I tried the following, but I always get the HTML as it is before the scam script has loaded:
// jsoup only fetches and parses the markup the server returns; it does not execute JavaScript
Document doc = Jsoup.connect("http://example.com")
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(3000)
        .post();
I also tried HtmlUnit:
WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(false); // JavaScript is off here, so the injected script never runs
webClient.getOptions().setCssEnabled(false);
HtmlPage page = webClient.getPage("http://example.com");
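To make the goal concrete, here is a rough sketch of the waiting behaviour I have in mind: take snapshots of the page until the content stops changing or a fixed timeout expires. It uses only the JDK. `StableContentWaiter`, `awaitStable` and the `source` supplier are names made up for illustration, and the supplier would have to be backed by something that actually executes JavaScript, which is the part I'm missing.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Polls a content source until it stops changing or a deadline passes. */
final class StableContentWaiter {

    /**
     * @param source       hypothetical hook returning the current DOM as a string
     * @param pollInterval pause between two snapshots
     * @param stablePolls  consecutive identical snapshots needed to call it settled
     * @param timeout      upper bound on the total wait, checked between polls
     * @return the last snapshot taken, whether it settled or not
     */
    static String awaitStable(Supplier<String> source, Duration pollInterval,
                              int stablePolls, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String last = source.get();
        int unchanged = 0;
        while (unchanged < stablePolls && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return last;
            }
            String current = source.get();
            // Count matching snapshots in a row; any change resets the count.
            unchanged = current.equals(last) ? unchanged + 1 : 0;
            last = current;
        }
        return last;
    }
}
```

For example, `awaitStable(source, Duration.ofMillis(500), 3, Duration.ofSeconds(10))` would return once three consecutive snapshots match or roughly ten seconds have passed, whichever comes first.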
Is there an elegant way to scrape a website in Java after all of its scripts have loaded?