
I am getting HTTP status codes for URLs using jsoup as follows:

    Connection.Response response = null;
    Document doc = Jsoup.connect(url).ignoreContentType(true).get();
    response = Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
            .timeout(10000)
            .execute();
    int statusCode = response.statusCode();
    if (statusCode == 200)
        urlExists = true;
    else
        urlExists = false;

Basically, I want to check whether the specified URL returns a 200 status code or not, i.e. if it's an HTML page, does it exist; if it's a PDF file, does it exist; and so on. It does not work for URLs ending in .jpg, because jsoup cannot parse JPG files. I am using jsoup in conjunction with crawler4j. Is there any other way I can get the HTTP status code for all of these URLs? My URLs end in the following extensions:

.css, .js, .pdf, .zip, .rar, .tar, .png, .gif, .html

clever_bassi

1 Answer


Can't you just use

    int responseCode = new URL(url).openConnection().responseCode
tim_yates
  • I tried that, but it showed me status code 404 for files that do exist. For example, for http://www.icidigital.com/wp-content/themes/i-cubed-eu/assets-ici/images/clients/t rowe price_logo.png it returns 404 even though the URL exists – clever_bassi Aug 27 '14 at 19:37
  • That link gives me a 404 when I click on it – tim_yates Aug 27 '14 at 19:40
  • That's because the link I posted in the comment somehow got trimmed; it's not what I posted. – clever_bassi Aug 28 '14 at 19:49
  • I am accepting your answer because my own research turned up the same approach. Somehow it doesn't work for me, but this should be correct. Thanks. – clever_bassi Aug 29 '14 at 04:35
  • Shouldn't it be int responseCode = new URL(url).openConnection().getResponseCode()? – clever_bassi Sep 03 '14 at 05:09
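A minimal JDK-only sketch of the answer's approach in Java, assuming the URL is already split into scheme, host, port and path. It reflects two points raised in the comments. First, in Java `openConnection()` is declared to return a `URLConnection`, so it has to be cast to `HttpURLConnection` before `getResponseCode()` can be called. Second, the file name quoted in the comments contains literal spaces, which are not legal in a URL; percent-encoding them (`%20`) is one plausible reason an existing file came back as 404, though the thread does not confirm it. The request uses `HEAD`, so only headers are fetched and nothing is parsed, whatever the file type. The class name, the `statusCode` and `demo` helpers, the `/images/...` paths and the throwaway local server are hypothetical stand-ins, not part of the original thread.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class StatusCheck {

    // Hypothetical file name containing spaces, modeled on the one in the comments.
    static final String EXISTING = "/images/t rowe price_logo.png";

    /** Returns the HTTP status code for a URL without reading or parsing its body. */
    static int statusCode(String scheme, String host, int port, String path)
            throws IOException, URISyntaxException {
        // The multi-argument URI constructor percent-encodes characters that are
        // illegal in a URI, so a space in the path is sent as %20.
        URL url = new URI(scheme, null, host, port, path, null, null).toURL();
        // openConnection() is declared to return URLConnection; the cast exposes
        // getResponseCode().
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD"); // ask for headers only
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            return conn.getResponseCode(); // 4xx/5xx codes are returned, not thrown
        } finally {
            conn.disconnect();
        }
    }

    /** Runs statusCode() against a throwaway local server: {existing file, missing file}. */
    static int[] demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                // getPath() returns the decoded path, so %20 is back to a space here.
                String path = exchange.getRequestURI().getPath();
                int code = path.equals(EXISTING) ? 200 : 404;
                exchange.sendResponseHeaders(code, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort(); // port 0 above: the OS picks a free port
                return new int[] {
                    statusCode("http", "127.0.0.1", port, EXISTING),
                    statusCode("http", "127.0.0.1", port, "/images/missing.png")
                };
            } finally {
                server.stop(0);
            }
        } catch (IOException | URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] codes = demo();
        System.out.println(codes[0] + " " + codes[1]);
    }
}
```

Running `main` should print `200 404`: the name with spaces resolves once it is encoded, and the other path does not exist on the stand-in server. If a real server answers `HEAD` with 405 (Method Not Allowed), retrying with `GET` is an option. To stay with jsoup instead, its `Connection` has `ignoreContentType(true)`, which lets `execute()` accept images, PDFs and archives instead of throwing `UnsupportedMimeTypeException`, and `ignoreHttpErrors(true)`, which returns the `Response` for 4xx/5xx codes instead of throwing `HttpStatusException`; `statusCode()` can then be read without calling `parse()`.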