
I am working on a program that interacts with users on a social network. I would like a tool that handles most or all of the web communication.

My work involves logging into the site, reading posts, and sending replies and personal messages.

I initially thought I could handle the necessary interactions with raw socket code. My individual test cases run successfully (I'm able to log in, post content, respond to inquiries, etc.); however, running everything together does not work (the server does not recognize my requests, among various other errors). I do some very poor cookie tracking, I switch sockets from port 80 to port 443 for SSL communication, and I generate my own packets to send to the social network's servers. Since this is my first attempt at web interaction, I am pretty far out of my depth.

I would prefer an integrated solution that tracks cookies, handles SSL communication, handles general communication problems, etc. Essentially, I wish I could give the X,Y coordinates of a button (or get the list of buttons from the page and select the one I want) along with the text to type into a text box, and have the mock browser send all the packets necessary to make the web interaction happen.

I would like to know if there is a Java mock browser I can use, i.e., one that lets me get a handle on the text boxes on a page, enter my login info, and execute the login procedure (the mock browser would then handle all the cookies, the individual packets, etc.).

My goal is to have a program I can run on my computer that interacts with users on a social network without requiring any significant input on my part. (I don't want the program monopolizing Firefox, which would prevent me from using Firefox myself while it runs.)

For context, I am sending typical HTML page requests and also custom packet calls (and parsing the results).

Thank you for your assistance.

  • Have you looked into Selenium yet? – mkoryak Jul 05 '12 at 21:35
  • Or HtmlUnit (http://htmlunit.sourceforge.net)? – JB Nizet Jul 05 '12 at 21:36
  • At a glance, Selenium looks like it requires Firefox to run, if I'm not mistaken. HtmlUnit looks pretty close to what I'm looking for. Do you by chance know if it can handle Difi requests? Something that runs standalone is desirable to me (i.e., it doesn't require other software such as Firefox). – Parallel Logic Jul 05 '12 at 21:52
  • Please provide a link to Difi. – opyate Jul 05 '12 at 22:00
  • Difi was incorrect. I meant to say that the social network uses a custom protocol in certain instances, e.g., to get more content in the message center, a custom packet is sent requesting just the pertinent content rather than the entire new page. The reply isn't a typical HTML page, so in some cases I need to be able to send custom packets to the network and parse the response myself (though I'd still prefer the mock browser to handle the cookies). Alternatively, I need the mock browser to understand JavaScript moderately well (I think Firefox uses JS to parse the packets, IIRC). – Parallel Logic Jul 05 '12 at 22:10
  • Be aware that automated access to a social network is most likely violating its Terms of Service. Using an official API will usually be less trouble, both from a legal and technical perspective. – Philipp Reichart Jul 05 '12 at 23:10

3 Answers


I would recommend that you go one of two routes with this:

Option 1: Use the Apache HttpComponents library. I found this very easy to use for sending form data to a web server. It supports SSL and cookies, although I haven't used it for that. The only issue I have with it is that I cannot seem to get it to communicate through a proxy server. See the question I posted about this. But as long as you are not going through a proxy, I give the library a glowing recommendation, and the code I posted in the above link shows how easy it is to use. Here's an example of code that sends form data to a web server:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.SystemDefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

// The class name is arbitrary; it only exists so the method compiles on its own.
public class FormSender {
    public static void sendForm(String user, String val) throws IOException {
        // Collect the form fields and URL-encode them as the request body.
        List<NameValuePair> formparams = new ArrayList<NameValuePair>();
        formparams.add(new BasicNameValuePair("user", user));
        formparams.add(new BasicNameValuePair("message", val));
        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, "UTF-8");

        String uri = "http://theServer.com";
        HttpPost httppost = new HttpPost(uri);
        httppost.setEntity(entity);  // attach the form data to the POST

        HttpClient httpclient = new SystemDefaultHttpClient();
        HttpResponse response = httpclient.execute(httppost);

        // The response entity can only be read once, so store the body first.
        String body = EntityUtils.toString(response.getEntity());
        System.out.println(response.getStatusLine() + "\n" + body);
    }
}
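
On the cookie side of Option 1, here is a minimal sketch of what automatic cookie tracking can look like. Note that it does not use HttpComponents: it swaps in the JDK's built-in java.net.CookieManager so that it runs without extra libraries. The class name, the URLs, and the cookie value are all made up for illustration, and accepting every cookie is an assumption you may want to tighten.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of any library mentioned in this answer.
final class CookieSketch {

    /** Creates a cookie jar that accepts every cookie (an assumption for this sketch). */
    static CookieManager newCookieJar() {
        return new CookieManager(null, CookiePolicy.ACCEPT_ALL);
    }

    /** Stores one Set-Cookie header value as if it had arrived in a response from uri. */
    static void remember(CookieManager jar, URI uri, String setCookieValue) {
        Map<String, List<String>> responseHeaders = new HashMap<>();
        responseHeaders.put("Set-Cookie", Collections.singletonList(setCookieValue));
        try {
            jar.put(uri, responseHeaders);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cookies the jar would attach to a request for uri. */
    static List<String> cookiesFor(CookieManager jar, URI uri) {
        try {
            Map<String, List<String>> requestHeaders =
                    jar.get(uri, new HashMap<String, List<String>>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = newCookieJar();
        // Once installed as the default handler, the jar is consulted by every
        // HttpURLConnection (http or https), so no manual cookie headers are needed.
        CookieHandler.setDefault(jar);

        URI login = URI.create("https://social.example.com/login");    // made-up URL
        remember(jar, login, "SESSION=abc123; Path=/");                // made-up cookie

        URI inbox = URI.create("https://social.example.com/messages"); // made-up URL
        System.out.println(cookiesFor(jar, inbox));
    }
}
```

In real use you would not call remember yourself: after CookieHandler.setDefault(jar), connections opened with URL.openConnection() should store and replay cookies on their own, and an https:// URL takes care of the SSL side without any manual socket switching.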

Option 2: There is a fully functional web browser component that comes with JavaFX, called WebView. You can interact with it programmatically, and this was discussed in a recent installment of the Java Spotlight Podcast.

Thorn
  • Thank you. I'm working on a client-side application, so I'm not currently concerned with proxies or sending forms. I've dabbled in the HttpClient code somewhat, but found that a lot of the examples refer to code that has been moved inside the library in more recent versions and no longer works, and considering I'd still be tracking cookies, I'm not too sure how far it would get me. However, JavaFX sounds interesting. I know the buttons I want to press. I see that JavaFX is purportedly bundled with Java 7.0_05, but after upgrading, imports like "import javafx.application.*;" are not found. – Parallel Logic Jul 06 '12 at 06:36
  • I don't suppose you have worked with JavaFX, have you? Do you have any tips on working with the classes? – Parallel Logic Jul 06 '12 at 06:39
  • I have limited experience with it, but NetBeans 7.1 supports it (you can create a JavaFX 2 project) and 7.2 RC1 integrates with Scene Builder, a GUI design tool. – Thorn Jul 06 '12 at 07:30

You could embed env.rhino.js in your Java app.

env.js is "a highly portable javascript implementation of the Browser as a scripting environment (often referred to as a 'headless' browser)."

The Rhino implementation uses the Rhino JavaScript engine, which is a Java runtime for JavaScript and ships with the Oracle Java implementation.

env.js is reasonably capable in that it uses a cross-compiled version of a reference HTML5 parser and can process JavaScript that makes full use of the jQuery library and the HTML DOM.


Additionally, I do like Thorn's suggestion of the JavaFX web component. Though, if you don't need to display any visuals, you may only need the WebEngine and not WebView.

jewelsea

The industry standard is Selenium. It's usually used to create automated system tests, but it could be used wherever you need an in-code browser.

I will caution you that there can be a steep learning curve to get it working: considerable arcane glue code is required, but once you get it up and running it's pretty good.

Bohemian