I have the following lines of Java code:

        d = Jsoup
                .connect(getUrl)
                .timeout(5000)
                .userAgent(
                        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
                .followRedirects(true)
                .get();

They run inside a class that implements Runnable and is started by a thread executor. My problem is that Jsoup keeps throwing exceptions like this:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=502, URL=http://sub.domain.de:8080/ws/Codes/Texts-Listen;Stud-Sets;name;AAFF-B?template=UNEinzelGru&weeks=39&days=&periods=3-64&Width=0&Height=0

    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:449)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:424)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:178)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:167)
    at com.noncomercial.parsingthread.Threads.VParserThread.run(VThread.java:62)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

If I open the URL in my browser, everything works fine. A simple standalone Jsoup connect like this also works:

Document d = Jsoup.connect("http://sub.domain.de:8080/ws/Codes/Texts-Listen;Stud-Sets;name;AAFF-B?template=UNEinzelGru&weeks=39&days=&periods=3-64&Width=0&Height=0").get();

It doesn't throw an exception either. I really have no idea what's wrong with my threads or my connections.

Any ideas?

EDIT: Okay, this is interesting: when I started debugging my code with a breakpoint on this connect call, the error no longer occurred.

So I suspected that the server has a problem with multiple connections at the same time.

I set my ExecutorService to: executor = Executors.newFixedThreadPool(1); so only one thread is used at a time. It has been running for 10 minutes and no error has occurred.

I think there are more than 500 pages to parse, so a thread-based solution would be great. Any ideas how to get rid of these errors?

Smoki

2 Answers

As a suggestion:

Maybe there is an encoding problem in your URL.
Try the following:

URLEncoder.encode(queryString, "UTF-8") 

Note: you probably only need to encode the query string, not the entire URL.
(Not tested.)

See here for a similar problem: https://stackoverflow.com/a/12476622/3887073
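
To make the note above concrete, here is a minimal sketch of encoding only the query parameters. The class and method names (`QueryEncodingSketch`, `enc`, `buildQuery`) are made up for illustration, and the parameter values are taken from the URL in the question. Keep in mind that `URLEncoder` produces form encoding (a space becomes `+`), so it should be applied to individual keys and values, never to the whole URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jsoup.
class QueryEncodingSketch {

    // Encodes a single query key or value as UTF-8 form data.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM must support, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters as key=value pairs separated by '&',
    // encoding each key and each value separately.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("template", "UNEinzelGru");
        params.put("weeks", "39");
        params.put("periods", "3-64");
        // prints template=UNEinzelGru&weeks=39&periods=3-64
        System.out.println(buildQuery(params));
    }
}
```

For the parameters in the question nothing changes after encoding, which fits the observation that the URLs were already well formed.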

EDIT:
Maybe this is a server problem.
See HTTP Error 502 Service Temporarily Overloaded.
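
If the 502 really comes from a temporarily overloaded server, one common way to cope is to retry the request after a short pause. The following is only a sketch under that assumption: `RetrySketch` and `withRetry` are hypothetical names, and the attempt count and delays are arbitrary. It retries on any `IOException`, which covers Jsoup's `HttpStatusException` because that class extends `IOException`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Jsoup.
class RetrySketch {

    // Calls fetch up to maxAttempts times. After each failed attempt except
    // the last one it sleeps, doubling the delay every time. Only IOException
    // triggers a retry; any other exception is propagated immediately.
    static <T> T withRetry(Callable<T> fetch, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        // All attempts failed: rethrow the most recent IOException.
        throw last;
    }
}
```

With Jsoup on the classpath, the call from the question could be wrapped as `RetrySketch.withRetry(() -> Jsoup.connect(getUrl).timeout(5000).get(), 3, 500)`. Retrying does not lower the load on the server, so it works best together with a smaller number of worker threads.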

Ben
  • Hey Ben, thanks for your idea. I tried that, but the URLs are well formed. Did you see my edit? I followed the rabbit hole a bit further and I think it may be a problem with the Tomcat on the other side. If I reduce my worker threads to 7 or 8, it works fine; any number above 8 threads breaks everything. I still don't know whether it's a small problem on my side or on the server, but it works for now and I will research a little more. – Smoki Sep 08 '14 at 13:30
  • @Smoki - I agree with Ben's update. IMO, the most likely explanation is that you were *overloading* the server by trying to hit it with too many requests at the same time. Solution: DON'T. Restrict yourself to one page at a time, and be patient. – Stephen C Sep 08 '14 at 13:43

As Ben's edit says, it looks like a 502 Service Temporarily Overloaded error. Sadly, Jsoup doesn't give any more information about the error, otherwise I could have figured that out earlier, but now I've got it.

If I don't set the maximum number of threads above 8, it works fine. Thanks a lot for the input, guys :)
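
For reference, here is a rough sketch of what capping the thread count can look like. `BoundedFetchSketch`, `Fetcher` and `fetchAll` are names invented for this example, and the limit of 8 is just the value that happened to work against this particular server, not a general rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Jsoup.
class BoundedFetchSketch {

    // Stand-in for the real download step, e.g. url -> Jsoup.connect(url).get().
    interface Fetcher<T> {
        T fetch(String url) throws Exception;
    }

    // Fetches all URLs with at most maxThreads requests in flight at once
    // and returns the results in the same order as the input list.
    static <T> List<T> fetchAll(List<String> urls, int maxThreads, Fetcher<T> fetcher)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<T> task = () -> fetcher.fetch(url);
                futures.add(executor.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                // get() blocks until the task is done and wraps any failure
                // of the task in an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

With Jsoup on the classpath, a call could look like `BoundedFetchSketch.fetchAll(urls, 8, url -> Jsoup.connect(url).timeout(5000).get())`. The fixed pool queues the remaining pages, so all 500+ URLs can be submitted up front while the server never sees more than 8 requests at a time from this client.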

Smoki