I am trying to use Java to submit a captcha to decaptcher.com. Decaptcher doesn't do a good job of explaining how to use their APIs, so I am trying to figure out how to submit a captcha with an HTTP POST request. Here is the example code I got from their website:

<form 
 method="post" 
 action="http://poster.decaptcher.com/" 
 enctype="multipart/form-data">
 <input type="hidden" name="function"  value="picture2">
 <input type="text"   name="username"  value="client">
 <input type="text"   name="password"  value="qwerty">
 <input type="file"   name="pict">
 <input type="text"   name="pict_to"   value="0">
 <input type="text"   name="pict_type" value="0">
 <input type="submit" value="Send">
</form>

I am supposed to send a POST request like that to the web server and get a string back. Here is my attempt to implement that in Java:

public String getDecaptcherAnswer(String username, String password){
        try{
            URL decaptcherPostURL = new URL("http://poster.decaptcher.com/");
            WebRequestSettings request = new WebRequestSettings(decaptcherPostURL, HttpMethod.POST);
            request.setEncodingType(FormEncodingType.MULTIPART);
            ArrayList<NameValuePair> params = new ArrayList<NameValuePair>();
            params.add(new NameValuePair("function", "picture2"));
            params.add(new NameValuePair("username", username));
            params.add(new NameValuePair("password", password));

            //I added this block in 
            File file = new File("captcha.png");
            params.add(new KeyDataPair("pict", file, "png", "utf-8"));
            //----------------------

            params.add(new NameValuePair("pict_to", "0"));
            params.add(new NameValuePair("pict_type", "0"));
            request.setRequestParameters(params);
            request.setUrl(decaptcherPostURL);

            HtmlPage page = webClient.getPage(request);
            System.out.println(page.asText());
            System.out.println("--------------------------------------");
            System.out.println(page.asXml());

            return page.asText();
        }catch (Exception e){
            e.printStackTrace();
            return null;
        }
}

Am I supposed to set the value of pict to a File object instead of the String pointing to where the captcha is stored? (captcha.png is the name of the image I am trying to submit).

Dylan

3 Answers

There is a higher-level mechanism for sending that file, so you don't need to create a WebRequestSettings and set its individual values.

Host that static HTML form somewhere and do something like the code below.

If you still have an issue, please submit a bug report in the HtmlUnit bug tracker.

BTW, HtmlUnit 2.8 is about to be released, so give it a try.

WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://some_host/test.html");
HtmlForm form = page.getForms().get(0);
form.getInputByName("username").setValueAttribute(username);
form.getInputByName("password").setValueAttribute(password);
form.getInputByName("pict_to").setValueAttribute("0");
form.getInputByName("pict_type").setValueAttribute("0");
form.getInputByName("pict").setValueAttribute("full_path_to_captcha_png");
form.<HtmlFileInput>getInputByName("pict").setContentType("image/png");//optional
HtmlPage page2 = form.getInputByValue("Send").click();
Ahmed Ashour

You should not use a NameValuePair for this but its subclass, KeyDataPair. This way you can specify a file to upload.

The following should work:

new KeyDataPair("pict", new File(fileName), "image/png", "utf-8");

The content type parameter is the MIME type of the file. Since you are uploading a PNG file, it should be image/png.
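
To see where that MIME type ends up, here is a rough sketch of what a multipart/form-data body looks like on the wire, built by hand with only the JDK. This is not how HtmlUnit does it internally; MultipartSketch, buildBody and the boundary string are hypothetical names made up for illustration, and the file part is written last for simplicity even though the form lists pict before pict_to.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of HtmlUnit): builds a multipart/form-data body by hand. */
final class MultipartSketch {

    static byte[] buildBody(String boundary, Map<String, String> fields,
                            String fileField, String fileName,
                            String mimeType, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Each plain form field becomes its own part.
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        // The file part carries the MIME type in its own Content-Type header.
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: " + mimeType + "\r\n\r\n");
        out.write(fileBytes, 0, fileBytes.length); // raw bytes, no charset conversion
        write(out, "\r\n--" + boundary + "--\r\n"); // closing boundary
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("function", "picture2");
        fields.put("username", "client");
        fields.put("password", "qwerty");
        fields.put("pict_to", "0");
        fields.put("pict_type", "0");
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'}; // stand-in for the real captcha.png bytes
        byte[] body = buildBody("sketch-boundary", fields, "pict", "captcha.png", "image/png", fakeImage);
        // ISO-8859-1 maps every byte to one char, so the dump shows the exact layout.
        System.out.println(new String(body, StandardCharsets.ISO_8859_1));
    }
}
```

Printing the body this way makes it easy to compare against what a browser sends when you submit the HTML form. The request would also need a header of the form Content-Type: multipart/form-data; boundary=sketch-boundary.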

Ronald Wildenberg
  • Would I declare the KeyDataPair as: – Dylan Aug 03 '10 at 01:57
  • Pretend I create a File object called file from "captcha.png": new KeyDataPair("pict", file, "png", "utf-8"). Are PNG files encoded with UTF-8? – Dylan Aug 03 '10 at 02:00
  • I added an example that I think should work. I'm not sure about the utf-8 charset, maybe you should experiment a little with that. – Ronald Wildenberg Aug 03 '10 at 10:11
  • For the charset, you can use htmlPage.getPageEncoding(); – Ahmed Ashour Aug 03 '10 at 13:03
  • I made the necessary changes but now I am getting a timeout error returned from the site. I think this means that my request is working properly, since I wasn't getting anything back before, but I don't know why the request is timing out. – Dylan Aug 03 '10 at 22:24
  • Also, @Ahmed, doesn't charset refer to the file encoding? – Dylan Aug 03 '10 at 22:26
  • @Dylan, the current implementation of HtmlUnit sends the charset as set by the page. If you believe otherwise, please submit a bug report with a minimal test case in the HtmlUnit tracker. – Ahmed Ashour Aug 04 '10 at 09:27
  • If you create an html page with a form as described in the documentation (http://decaptcher.com/client/) and in your question, does it work then? If it does, there's still a difference between your code and the expected form data. If it doesn't, there is something wrong with the documentation. – Ronald Wildenberg Aug 04 '10 at 10:42

Here's what I was trying to type:

File file = new File("captcha.png");
params.add(new KeyDataPair("pict", file, "png", "utf-8"));

Are PNG files encoded with UTF-8? Is that how I would specify the KeyDataPair for the file input? I think I am either specifying the wrong contentType or the wrong charSet, or both. Am I supposed to put them in all caps?
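
On the UTF-8 part of the question: a PNG is binary data, not text, so it has no character encoding of its own. The small sketch below (plain JDK, with the hypothetical names PngCheckSketch, looksLikePng and survivesUtf8RoundTrip) checks the fixed 8-byte PNG signature and shows that pushing those bytes through a UTF-8 String changes them, which is why the image bytes need to be sent untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of HtmlUnit or the Decaptcher API. */
final class PngCheckSketch {

    // Every PNG file starts with these 8 bytes.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** Returns true if the data begins with the PNG signature. */
    static boolean looksLikePng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    /** Returns true if decoding as UTF-8 and encoding again gives back the same bytes. */
    static boolean survivesUtf8RoundTrip(byte[] data) {
        byte[] roundTripped = new String(data, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, roundTripped);
    }

    public static void main(String[] args) {
        byte[] header = PNG_SIGNATURE.clone();
        System.out.println("looks like PNG: " + looksLikePng(header));
        // 0x89 is not valid UTF-8 on its own, so the decoder replaces it and the bytes change.
        System.out.println("survives UTF-8 round trip: " + survivesUtf8RoundTrip(header));
    }
}
```

In practice you could run looksLikePng over the bytes of captcha.png before uploading, just to rule out a bad file as the cause of a failed request.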

Dylan