53

I have fairly simple HttpClient 4 code that calls HttpGet to get HTML output. The HTML comes back with script and image locations all set to local paths (e.g. <img src="/images/foo.jpg"/>), so I need the calling URL to turn them into absolute ones (<img src="http://foo.com/images/foo.jpg"/>). Now comes the problem: during the call there may be one or two 302 redirects, so the original URL no longer reflects the location of the HTML.

How do I get the final URL of the returned content, given all the redirects I may (or may not) have had?

I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() but couldn't find anything.

Edited: HttpGet#getURI() returns the original calling address.
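
For reference, once the final URL is known, the rewriting step described above can be done with plain java.net.URI. This is only a minimal sketch: the class LinkResolver and the method toAbsolute are hypothetical names, not part of HttpClient, and the URLs are made-up examples.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of HttpClient.
final class LinkResolver {
    private LinkResolver() {}

    // Resolves a link found in the HTML against the URL the page was finally served from.
    // Links that are already absolute are returned unchanged.
    static String toAbsolute(String finalPageUrl, String link) {
        return URI.create(finalPageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://foo.com/app/index.html"; // assumed final URL after redirects
        System.out.println(toAbsolute(page, "/images/foo.jpg"));
        System.out.println(toAbsolute(page, "css/site.css"));
    }
}
```

Resolving against the full final URL rather than only the host part also handles links that are relative to the page's directory.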

lfurini
Bostone

8 Answers

63

That would be the current URL, which you can get by calling

  HttpGet#getURI();

EDIT: You didn't mention how you are handling redirects. That works for us because we handle the 302 ourselves.

It sounds like you are using DefaultRedirectHandler. We used to do that. Getting the current URL is somewhat tricky: you need to use your own context. Here are the relevant code snippets:

        HttpGet httpget = new HttpGet(url);
        HttpContext context = new BasicHttpContext(); 
        HttpResponse response = httpClient.execute(httpget, context); 
        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
            throw new IOException(response.getStatusLine().toString());
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute( 
                ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute( 
                ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());

The default redirect handling didn't work for us, so we changed it, but I forget what the problem was.

Mohsen
ZZ Coder
  • Alas, it will not - getURI() returns the original calling URL – Bostone Sep 21 '09 at 22:17
  • I don't do anything special - very basic HttpGet code. I googled my problem, and I think I need to disable auto-redirect and "follow the trail" until I get a 200 – Bostone Sep 22 '09 at 01:19
  • "Follow the trail" is much more flexible but it's not trivial. You have to watch for relative URLs, circular redirects etc. If the default redirect works for you, my code will get the URL for you. – ZZ Coder Sep 22 '09 at 01:58
  • Indeed - I ended up using currentHost.toURI() since I only need the http://host part. Thank you for nailing this for me! – Bostone Sep 23 '09 at 06:18
  • It seems pretty silly that they made accomplishing this so complicated in HttpClient 4. In v3, there was a `getPath()` method that did the trick. – stevevls Sep 05 '11 at 15:50
  • ExecutionContext is now deprecated, use HttpCoreContext instead. – Mark McLaren Aug 03 '15 at 13:43
  • Unfortunately the attributes ExecutionContext.HTTP_TARGET_HOST and ExecutionContext.HTTP_REQUEST are deprecated – Jakob Alexander Eichler Aug 08 '16 at 17:32
44

In HttpClient 4, if you are using LaxRedirectStrategy or any subclass of DefaultRedirectStrategy, this is the recommended way (see the source code of DefaultRedirectStrategy):

HttpContext context = new BasicHttpContext();
HttpResult<T> result = client.execute(request, handler, context);
URI finalUrl = request.getURI();
RedirectLocations locations = (RedirectLocations) context.getAttribute(DefaultRedirectStrategy.REDIRECT_LOCATIONS);
if (locations != null) {
    finalUrl = locations.getAll().get(locations.getAll().size() - 1);
}

Since HttpClient 4.3.x, the above code can be simplified as:

HttpClientContext context = HttpClientContext.create();
HttpResult<T> result = client.execute(request, handler, context);
URI finalUrl = request.getURI();
List<URI> locations = context.getRedirectLocations();
if (locations != null) {
    finalUrl = locations.get(locations.size() - 1);
}
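
The fallback logic in the snippets above (use the last redirect location, otherwise the original request URI) can be sketched in isolation. This is a defensive sketch under the assumption that the redirect list may be either null or empty when no redirect happened; FinalUrl and lastOrOriginal are hypothetical names, and the code deliberately uses only java.net.URI so it does not depend on a particular HttpClient version.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of HttpClient.
final class FinalUrl {
    private FinalUrl() {}

    // Returns the last redirect location, or the original URI when the
    // list is null or empty (assumed to mean "no redirect took place").
    static URI lastOrOriginal(URI original, List<URI> redirectLocations) {
        if (redirectLocations == null || redirectLocations.isEmpty()) {
            return original;
        }
        return redirectLocations.get(redirectLocations.size() - 1);
    }

    public static void main(String[] args) {
        URI original = URI.create("http://foo.com/"); // made-up example URLs
        List<URI> hops = Arrays.asList(
                URI.create("http://foo.com/a"),
                URI.create("http://www.foo.com/home"));
        System.out.println(lastOrOriginal(original, hops));
        System.out.println(lastOrOriginal(original, null));
    }
}
```

With HttpClient 4.3+, the second argument would presumably be the result of context.getRedirectLocations() and the first request.getURI().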
Haozhun
david_p
  • Your answer should've received the checkmark. This is how Apache actually intended this! Great job! – Martijn Sep 11 '14 at 18:53
  • Plain and simple. And this solution works better than all others mentioned here! – korpe May 08 '15 at 12:33
  • The response I am getting has a status code of 204, which means no content. However, there is a Location header in the response, but Apache HttpClient is not getting the Location header in this case, I think because of the 204 response. Is there a way around this? – Arya Oct 09 '16 at 15:30
  • Thanks a lot for this! In the latest version `DefaultRedirectStrategy.REDIRECT_LOCATIONS` is deprecated, `HttpClientContext.REDIRECT_LOCATIONS` can/should be used instead. – dav1d Dec 05 '16 at 09:46
  • Is there any way to get the redirect status for the very first redirect, i.e. 301 or 302? – srchulo Feb 14 '17 at 19:13
  • If you're doing a POST query you SHOULD set the redirect strategy to `LaxRedirectStrategy` or `getRedirectLocations` will return null. – Hugodby May 29 '17 at 16:25
  • Much appreciated, @Hugodby! I'm trying to do this with a POST request and had no clue why it didn't work until I saw your comment. – birgersp Aug 26 '19 at 06:26
15
    // The declared type and the constructor must match; HttpHead is not an HttpGet.
    HttpGet httpGet = new HttpGet("<put your URL here>");
    HttpClient httpClient = HttpClients.createDefault();
    HttpClientContext context = HttpClientContext.create();
    httpClient.execute(httpGet, context);
    List<URI> redirectURIs = context.getRedirectLocations();
    if (redirectURIs != null && !redirectURIs.isEmpty()) {
        for (URI redirectURI : redirectURIs) {
            System.out.println("Redirect URI: " + redirectURI);
        }
        URI finalURI = redirectURIs.get(redirectURIs.size() - 1);
    }
Atharva
  • Something else to be aware of (with all these answers) is the concept of "[Atomic HTTP redirect handling](https://fetch.spec.whatwg.org/#atomic-http-redirect-handling)", which suggests that clients (at least of some types - web apps) shouldn't be able to see any except the last of the redirection URLs, for security purposes. (However, in Java it might be hard to completely prevent it). – Martin Pain Oct 02 '15 at 12:19
8

I found this in the HttpComponents Client documentation:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext context = HttpClientContext.create();
HttpGet httpget = new HttpGet("http://localhost:8080/");
CloseableHttpResponse response = httpclient.execute(httpget, context);
try {
    HttpHost target = context.getTargetHost();
    List<URI> redirectLocations = context.getRedirectLocations();
    URI location = URIUtils.resolve(httpget.getURI(), target, redirectLocations);
    System.out.println("Final HTTP location: " + location.toASCIIString());
    // Expected to be an absolute URI
} finally {
    response.close();
}
AmirHossein
6

An improved approach (IMHO), based on ZZ Coder's solution, is to use a response interceptor that simply tracks the last redirect location. That way you don't lose information such as the fragment after a hash (#); without the response interceptor the fragment is lost. Example: http://j.mp/OxbI23

private static HttpClient createHttpClient() throws NoSuchAlgorithmException, KeyManagementException {
    SSLContext sslContext = SSLContext.getInstance("SSL");
    TrustManager[] trustAllCerts = new TrustManager[] { new TrustAllTrustManager() };
    sslContext.init(null, trustAllCerts, new java.security.SecureRandom());

    SSLSocketFactory sslSocketFactory = new SSLSocketFactory(sslContext);
    SchemeRegistry schemeRegistry = new SchemeRegistry();
    schemeRegistry.register(new Scheme("https", 443, sslSocketFactory));
    schemeRegistry.register(new Scheme("http", 80, new PlainSocketFactory()));

    HttpParams params = new BasicHttpParams();
    ClientConnectionManager cm = new org.apache.http.impl.conn.SingleClientConnManager(schemeRegistry);

    // some pages require a user agent
    AbstractHttpClient httpClient = new DefaultHttpClient(cm, params);
    HttpProtocolParams.setUserAgent(httpClient.getParams(), "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1");

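    // RedirectStrategy is an interface and cannot be instantiated; use the default implementation.
    httpClient.setRedirectStrategy(new DefaultRedirectStrategy());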

    httpClient.addResponseInterceptor(new HttpResponseInterceptor() {
        @Override
        public void process(HttpResponse response, HttpContext context)
                throws HttpException, IOException {
            if (response.containsHeader("Location")) {
                Header[] locations = response.getHeaders("Location");
                if (locations.length > 0)
                    context.setAttribute(LAST_REDIRECT_URL, locations[0].getValue());
            }
        }
    });

    return httpClient;
}

private String getUrlAfterRedirects(HttpContext context) {
    String lastRedirectUrl = (String) context.getAttribute(LAST_REDIRECT_URL);
    if (lastRedirectUrl != null)
        return lastRedirectUrl;
    else {
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute(ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());
        return currentUrl;
    }
}

public static final String LAST_REDIRECT_URL = "last_redirect_url";

Use it just like ZZ Coder's solution:

HttpResponse response = httpClient.execute(httpGet, context);
String url = getUrlAfterRedirects(context);
Michael Pollmeier
4

I think an easier way to find the last URL is to subclass DefaultRedirectHandler.

package ru.test.test;

import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.ProtocolException;
import org.apache.http.impl.client.DefaultRedirectHandler;
import org.apache.http.protocol.HttpContext;

public class MyRedirectHandler extends DefaultRedirectHandler {

    public URI lastRedirectedUri;

    @Override
    public boolean isRedirectRequested(HttpResponse response, HttpContext context) {

        return super.isRedirectRequested(response, context);
    }

    @Override
    public URI getLocationURI(HttpResponse response, HttpContext context)
            throws ProtocolException {

        lastRedirectedUri = super.getLocationURI(response, context);

        return lastRedirectedUri;
    }

}

Code to use this handler:

  DefaultHttpClient httpclient = new DefaultHttpClient();
  MyRedirectHandler handler = new MyRedirectHandler();
  httpclient.setRedirectHandler(handler);

  HttpGet get = new HttpGet(url);

  HttpResponse response = httpclient.execute(get);

  HttpEntity entity = response.getEntity();
  // Fall back to the original URL when no redirect happened.
  String lastUrl = url;
  if(handler.lastRedirectedUri != null){
      lastUrl = handler.lastRedirectedUri.toString();
  }
ydanila
2

As of version 2.3, Android still does not support following redirects (HTTP code 302). I just read the Location header and download again:

if (statusCode != HttpStatus.SC_OK) {
    Header[] headers = response.getHeaders("Location");

    if (headers != null && headers.length != 0) {
        String newUrl = headers[headers.length - 1].getValue();
        // call again the same downloading method with new URL
        return downloadBitmap(newUrl);
    } else {
        return null;
    }
}

There is no protection against circular redirects here, so be careful. More on my blog: Follow 302 redirects with AndroidHttpClient
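
One way to add that protection is to remember every URL already visited and to cap the number of hops. The following is only a sketch: RedirectTrail, follow, MAX_HOPS and the locationOf callback are hypothetical names, and the callback stands in for one HTTP request made with automatic redirects disabled (it returns the Location header value, or null when the response is not a redirect).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of HttpClient or Android.
final class RedirectTrail {
    static final int MAX_HOPS = 10; // arbitrary limit chosen for this sketch

    private RedirectTrail() {}

    // Follows redirects manually and returns the URL that finally answered
    // without a Location header. Throws on a loop or on too many hops.
    static URI follow(URI start, Function<URI, String> locationOf) {
        Set<URI> visited = new HashSet<>();
        URI current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            if (!visited.add(current)) {
                throw new IllegalStateException("Circular redirect at " + current);
            }
            String location = locationOf.apply(current);
            if (location == null) {
                return current; // not a redirect: this is the final URL
            }
            // Location may be relative, so resolve it against the current URL.
            current = current.resolve(location);
        }
        throw new IllegalStateException("Gave up after " + MAX_HOPS + " requests");
    }

    public static void main(String[] args) {
        // Simulated responses (made-up URLs) instead of real network calls.
        Map<String, String> responses = new HashMap<>();
        responses.put("http://a.example/start", "/b");
        responses.put("http://a.example/b", "http://b.example/c");
        URI end = follow(URI.create("http://a.example/start"),
                u -> responses.get(u.toString()));
        System.out.println(end);
    }
}
```

In real code the callback would issue the request, check for a 3xx status code, and read the Location header as in the snippet above.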

Nikola
0

This is how I managed to get the redirect URL:

Header[] arr = httpResponse.getHeaders("Location");
for (Header head : arr){
    // Read the value from the current header, not from the array.
    String whatever = head.getValue();
}

Or, if you are sure that there is only one redirect location, do this:

httpResponse.getFirstHeader("Location").getValue();
Salman