
I have OpenRefine (a webapp hosted by Jetty) running on:

http://127.0.0.1:3333

Which looks like this:

[Screenshot: Web OK]

Everything works perfectly.

Now I would like to tunnel this through Apache2 (for security and renaming reasons), so I added the following to my httpd.conf file:

ProxyPass /refine http://127.0.0.1:3333
ProxyPassReverse /refine http://127.0.0.1:3333
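A simplified Python model of what these two directives do may help show which requests they actually cover. It assumes plain prefix substitution and ignores much of what the real mod_proxy handles (encoding, cookies, other headers); the helper names and the example paths are hypothetical.

```python
from typing import Optional

# Simplified model of the two directives above (illustration only).
PREFIX = "/refine"                 # first argument of ProxyPass/ProxyPassReverse
BACKEND = "http://127.0.0.1:3333"  # second argument


def proxy_pass(request_path: str) -> Optional[str]:
    """Return the backend URL a request path is forwarded to, or None."""
    if request_path.startswith(PREFIX):
        # Plain prefix substitution: /refine/index.html -> BACKEND + /index.html
        return BACKEND + request_path[len(PREFIX):]
    # Not under the prefix: Apache serves it itself (often a 404)
    return None


def proxy_pass_reverse(location: str) -> str:
    """Map a backend redirect (Location header) back under the proxy prefix."""
    if location.startswith(BACKEND):
        # Real Apache emits a full URL on its own host; only the path is shown here
        return PREFIX + location[len(BACKEND):]
    return location


if __name__ == "__main__":
    print(proxy_pass("/refine/index.html"))  # http://127.0.0.1:3333/index.html
    print(proxy_pass("/scripts/index.js"))   # None
    print(proxy_pass_reverse("http://127.0.0.1:3333/project?project=1"))  # /refine/project?project=1
```

In this model, any page under /refine that asks for a root-relative path such as /scripts/index.js produces a request that the ProxyPass rule never sees.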

Now if I try to open the page through the proxy this is what I see:

[Screenshot: Web Bad]

It looks like none of the dynamic content is working. How can I solve this?

Notes:

  • I made sure mod_proxy is up to date and working; I tested it with other webapps running on Tomcat.
Jerry
Jesus

2 Answers


You can use mod_proxy with OpenRefine without using virtual hosts.

I needed to do this exact same thing today. I have an SSL portal through which users must authenticate with some complicated PKI and LDAP tracking, and I need OpenRefine to be hosted behind it because of some data it has access to. The answers to this problem given in this thread and elsewhere simply weren't acceptable, so I went through the source code expecting to patch this behavior in, but I didn't have to!

I noticed that because OpenRefine runs out of a WEB-INF directory, it is probably built as a typical Java web app. And sure enough, when I looked for how the context was being set on the server, I found this in Refine.java:

final String contextPath = Configurations.get("refine.context_path","/");

So this is what you do:

1) In refine.ini, make sure that JAVA_OPTIONS includes "-Drefine.context_path=/refine". (It should go without saying that you have also changed refine.host from 127.0.0.1 to 0.0.0.0 and set refine.headless=true.) When you restart OpenRefine, you'll access it at http://your.refine.server:3333/refine (with your own server hostname in that URL).

2) Now, for a simple example, we will make https://your.apache.server/refine proxy to http://your.refine.server:3333/refine.

In one of your httpd config files (for example, a new openrefine.conf in /etc/httpd/conf.d), put the following lines after enabling mod_proxy:

ProxyPass /refine http://your.refine.server:3333/refine
ProxyPassReverse /refine http://your.refine.server:3333/refine

The difference here is that OpenRefine is kept out of the root context ("/"), so the application's own root (/refine) can be proxied as a whole. OpenRefine requests resources by absolute path, based on how the context is set. So if you don't do this, OpenRefine will request JavaScript files from paths that fall outside your proxy location, which is what everyone else in this thread was experiencing.
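A minimal Python sketch of why the context path matters, assuming (as described above) that the app emits root-relative paths built from its context path. The page URL reuses the placeholder hostname from step 2, and "scripts/app.js" is a hypothetical resource name, not an actual OpenRefine file.

```python
from urllib.parse import urljoin

# Placeholder page URL behind the proxy (hostname from the steps above).
PAGE = "https://your.apache.server/refine/"


def resource_url(context_path: str, resource: str) -> str:
    """Resolve the root-relative path an app with `context_path` would emit."""
    # Build an absolute path from the context, e.g. "/" -> "/scripts/app.js"
    absolute_path = context_path.rstrip("/") + "/" + resource
    # The browser resolves it against the page URL, dropping the page's own path
    return urljoin(PAGE, absolute_path)


if __name__ == "__main__":
    for ctx in ("/", "/refine"):
        url = resource_url(ctx, "scripts/app.js")
        print(ctx, url, "proxied" if url.startswith(PAGE) else "outside proxy")
```

With the default context "/", the resource resolves to https://your.apache.server/scripts/app.js, outside the proxied location; with "/refine" it stays under /refine/ and is forwarded.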

In real life, you might want to have mod_proxy use a load balancer over multiple OpenRefine instances and you might want to put some logic about which users are allowed to use this proxy tunnel.

Hope this helps someone else!

I also recommend that you review the undocumented Refine server properties which are also in Refine.java.

Brian Blackburn

You changed the app location from http://your.server:3333/ to http://your.server/refine.

You can see that a link with, for example, href="/resource.css" would no longer be valid, since that resource has now moved to "/refine/resource.css". If you dig through the HTML source, I think you will find dozens of these links with absolute paths.

This configuration will break any absolute path references. The complicated way to resolve this issue is called URL rewriting, and there are in-depth tutorials on how to set up mod_proxy and ProxyPassReverse with URL rewriting. It is complicated to explain and easy to get wrong; instead, add a VirtualHost so that absolute path links don't need rewriting.

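<VirtualHost *:80>
    ServerName refine

    # Act only as a reverse proxy, never as an open forward proxy
    ProxyRequests Off
    <Proxy *>
        # Apache 2.4 syntax; on 2.2 use "Order deny,allow" and "Allow from all"
        Require all granted
    </Proxy>

    ProxyPass / http://127.0.0.1:3333/
    ProxyPassReverse / http://127.0.0.1:3333/
    <Location />
        # Apache 2.4 syntax; on 2.2 use "Order allow,deny" and "Allow from all"
        Require all granted
    </Location>
</VirtualHost>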

It's unlikely that you'll find any absolute links to localhost:3333, so this will probably work for you. Change your /etc/hosts so that refine resolves to 127.0.0.1 and you'll be golden: you can then use OpenRefine without problems at http://refine/.

127.0.0.1    localhost refine

If you're trying to enable access from outside hosts, a slightly more complicated setup will involve a new DNS record and should be easy to imagine from here.
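For comparison, the URL rewriting that this answer advises against boils down to something like the following deliberately naive Python sketch. In Apache the job is done by modules such as mod_proxy_html, not by code like this; the function name and the /refine prefix are assumptions for illustration.

```python
import re

# Root-relative href/src values in double quotes, skipping protocol-relative
# "//host/..." URLs. Single-quoted attributes and URLs built by JavaScript at
# runtime are NOT handled, which is part of why this approach is fragile.
_ROOT_RELATIVE = re.compile(r'\b(href|src)="/(?!/)')


def rewrite_root_relative(html: str, prefix: str = "/refine") -> str:
    """Prefix root-relative links so they stay under the proxied location."""
    return _ROOT_RELATIVE.sub(lambda m: f'{m.group(1)}="{prefix}/', html)


if __name__ == "__main__":
    page = '<link href="/resource.css"><script src="//cdn.example/x.js"></script>'
    print(rewrite_root_relative(page))
```

Only the stylesheet link is rewritten (to /refine/resource.css). Every response body would have to pass through such a filter, which is the complexity the VirtualHost approach avoids.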

Kingdon
  • This solves the problem locally, but I want to access it from an outside host. I really don't see a DNS record as a solution: my machine is already running a couple of services on ports 80 and 8089, and that would be messy. But now I know what is failing (absolute paths) and how to deal with it (mod_rewrite). Thanks – Jesus Sep 25 '14 at 08:52
  • The point of VirtualHost is that you can run multiple services on port 80, letting Apache handle routing based on the request details. Each service gets its own hostname, and the server decides which service to ProxyPass your request to based on the hostname you give it. If you want your service to be securely available remotely, use HTTPS and add authentication to the VirtualHost. I have dozens of services listening on various ports on various hosts behind a firewall, and one Apache server sits in front of them all, managing access for outside visitors based on VirtualHost. – Kingdon Sep 25 '14 at 12:48
  • Set up a default virtual host as well. Check out /etc/apache2/sites-available and sites-enabled if you are on a Debian or Ubuntu system – Kingdon Sep 25 '14 at 12:56
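The name-based routing described in these comments can be pictured with a toy Python model. Everything in it is hypothetical (hostnames, ports, the `route` helper); in practice Apache does this matching itself from each VirtualHost's ServerName/ServerAlias, and when no name matches, the first-listed virtual host for that address and port is used, which is why a deliberate default virtual host is worth setting up.

```python
from typing import Dict

# Hypothetical routing table in the spirit of name-based VirtualHosts:
# hostname -> backend. Hostnames and ports are placeholders.
VHOSTS: Dict[str, str] = {
    "default.example": "http://127.0.0.1:8080/",  # listed first: acts as the default
    "refine": "http://127.0.0.1:3333/",
    "other.example": "http://127.0.0.1:8089/",
}


def route(host_header: str) -> str:
    """Choose a backend from the Host header, as name-based virtual hosting does."""
    # Drop an optional ":port" and compare case-insensitively (IPv6 literals ignored)
    hostname = host_header.split(":", 1)[0].strip().lower()
    # Like Apache, fall back to the first-listed host when no name matches
    return VHOSTS.get(hostname, next(iter(VHOSTS.values())))


if __name__ == "__main__":
    for host in ("refine", "REFINE:80", "unknown.example"):
        print(host, "->", route(host))
```

All three services share port 80; only the Host header decides where a request goes, so adding OpenRefine does not disturb the services already running there.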