
Before anyone points out that this question has already been asked here: I have tried practically every example I could find.

The URL I'm trying to download has the type 'audio/wav' and is embedded in a video tag, or at least that is what I see in Chrome's element inspector.

The catch is that the URL (which I can't post here) does not point to a .wav file or any other file, but to an ASP page that seems to generate the audio.

The problem is that I can't actually download the audio.

My WebClient is created like this:

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38); // Also tried Chrome here.
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = (HtmlPage)webClient.getPage(URL);

I've tried creating an anchor element that links to the page containing the audio file:

HtmlElement createdElement = (HtmlElement) page.createElement("a");
createdElement.setAttribute("id", "link_som");
createdElement.setAttribute("href", "../sound.asp?app=audio");
page.appendChild(createdElement);

HtmlAnchor anc = (HtmlAnchor) page.getElementById("link_som", true); // tried this just to make sure it was returning the right anchor

InputStream inputStream = anc.click().getWebResponse().getContentAsStream();
// Writing the inputStream to a file produces a 0 KB file.

I also tried opening the new URL by running JavaScript through HtmlUnit:

ScriptResult resultado = page.executeJavaScript("window.open('../sound.asp?app=audio');");
webClient.waitForBackgroundJavaScript(5000);
HtmlPage paginaRes = (HtmlPage) resultado.getNewPage();

InputStream inputStream = paginaRes.getWebResponse().getContentAsStream(); // here the inputStream also produces a 0 KB file

Interestingly, in every case I tried, writing the inputStream to the console shows the main page's source rather than audio data, for example:

int binary = 0;
while ((binary = inputStream.read()) != -1)
{
   System.out.print((char) binary); // prints the old page source; in some other tests it prints nothing
}
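
Printing each byte as a char makes it hard to judge what actually arrived. A minimal sketch of an alternative check, using only the JDK: save the stream as raw bytes, report the byte count, and test whether the data starts with the RIFF/WAVE header that WAV files carry. The class and method names here (AudioStreamCheck, looksLikeWav, saveTo) are hypothetical helpers for illustration and are not part of HtmlUnit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AudioStreamCheck {

    /** Returns true if the data starts with "RIFF" and has "WAVE" at offset 8. */
    public static boolean looksLikeWav(byte[] data) {
        if (data.length < 12) {
            return false; // too short to hold a RIFF/WAVE header
        }
        String riff = new String(data, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(data, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && wave.equals("WAVE");
    }

    /** Copies the stream to the target file as raw bytes and returns the byte count. */
    public static long saveTo(InputStream in, Path target) {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the snippets above, passing the stream from getContentAsStream() to saveTo would show how many bytes were received, and running looksLikeWav on the saved bytes would distinguish real audio from an HTML page returned in its place.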

P.S.: When I open the URL manually, Chrome shows an embedded player, while Firefox asks for QuickTime.

Bruno Brs
  • In Firefox the audio is in the tag , while in Chrome it seems to use its own Shadow DOM to play the file. – Bruno Brs Jul 29 '15 at 16:36
  • I figured out that the main page generates a cookie, which is then used to generate the audio. In HtmlUnit, however, that cookie does not exist, or at least it does not appear in the list returned by webClient.getCookieManager().getCookies(). Is there any reason for this? – Bruno Brs Jul 29 '15 at 18:17
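
The cookie dependency described in the comment above can be reproduced in isolation. The following is only a sketch under stated assumptions: it swaps HtmlUnit for plain java.net classes, and a local HttpServer with the made-up paths /main and /sound and a made-up cookie named session stands in for the real site, whose URL and cookie are not known. It shows the audio endpoint returning HTML until the main page has been visited with a shared cookie jar.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class CookieCarryDemo {

    // Stand-in payloads; the real site's responses are unknown.
    static final byte[] HTML = "<html>main page</html>".getBytes(StandardCharsets.US_ASCII);
    static final byte[] WAV = "RIFF\0\0\0\0WAVEfmt ".getBytes(StandardCharsets.US_ASCII);

    static void respond(HttpExchange ex, String type, byte[] body) throws IOException {
        ex.getResponseHeaders().add("Content-Type", type);
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(body);
        }
    }

    /** GETs the URL and returns the raw body; cookies go through the default CookieHandler. */
    static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    static String kind(byte[] body) {
        return new String(body, StandardCharsets.US_ASCII).startsWith("RIFF") ? "wav" : "html";
    }

    /** Returns what /sound served before and after visiting /main, joined by a comma. */
    public static String runDemo() {
        CookieHandler previous = CookieHandler.getDefault();
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/main", ex -> {
                // The main page hands out the (hypothetical) session cookie.
                ex.getResponseHeaders().add("Set-Cookie", "session=abc; Path=/");
                respond(ex, "text/html", HTML);
            });
            server.createContext("/sound", ex -> {
                // Audio is served only when the cookie is present; otherwise HTML comes back.
                String cookie = ex.getRequestHeaders().getFirst("Cookie");
                boolean hasSession = cookie != null && cookie.contains("session=abc");
                if (hasSession) respond(ex, "audio/wav", WAV); else respond(ex, "text/html", HTML);
            });
            server.start();
            String base = "http://127.0.0.1:" + server.getAddress().getPort();

            CookieHandler.setDefault(new CookieManager()); // one cookie jar for every request
            String before = kind(fetch(base + "/sound"));  // no cookie stored yet
            fetch(base + "/main");                         // response stores the cookie
            String after = kind(fetch(base + "/sound"));   // cookie is sent automatically
            return before + "," + after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            CookieHandler.setDefault(previous);
            if (server != null) server.stop(0);
        }
    }
}
```

If the real site behaves like this stand-in, a request for the audio URL that lacks the cookie would receive the page source instead of audio, which would match the output seen in the question. Whether that is what happens inside HtmlUnit is not confirmed by the thread.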

2 Answers


I am able to get the audio element using HtmlUnit. FYI, my version is 2.15.

Feng

I solved this a long time ago, so this is just to let others know: the solution was to give up on HtmlUnit and use Selenium with PhantomJS.

Bruno Brs