
So, I have several jobs on Hudson that take 3-6 hours to run. The slave machines run a variety of Windows versions on VMs. Occasionally a minor hiccup apparently causes a socket that has been open for 6 hours to close (which doesn't seem crazy, even with perfect networking), and I end up with a stack trace pointing to this:

hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: socket closed

Is there any plugin or other way I could fix this extremely annoying problem? When you're 3 hours into a 4-hour build and it fails because of this, it's a bit infuriating.

Earlz
  • The number of bug reports and mailing list postings I've seen with this exact issue is staggering, and not a single reply offers a fix or even a hint as to what the problem is. Just "me too." I'm hoping Stack Overflow can give me some kind of answer and serve as a reference for future googlers – Earlz Oct 18 '13 at 15:33

2 Answers


Maybe off topic, but have you considered an alternative CI server, like JetBrains TeamCity? I've used it for 4 years on .NET projects and highly recommend them.

Will Green
  • Primarily can't, due to existing infrastructure: probably 10 Jenkins build slaves and about 40 separate builds relying extensively on some Jenkins plugins. It'd just be too expensive to do. – Earlz Oct 18 '13 at 17:53

If a build runs for 6 hours, it will fail whenever the connection between master and slave breaks down. So the solution lies in writing custom logic to connect to the slave, and Hudson provides an option for this. Check this link: http://wiki.hudson-ci.org/display/HUDSON/Distributed+builds#Distributedbuilds-WriteyourownscripttolaunchHudsonslaves

A custom script with retry logic should be the way out.
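A minimal sketch of such retry logic, written in Python purely as an assumption (the launch script can be anything your setup allows). It wraps an arbitrary slave-launch command, which is not shown here and depends on how your slaves are started, and re-runs it with exponential backoff when it exits with a non-zero status. The function name `run_with_retries`, its parameters, and the assumption that a dropped connection shows up as a non-zero exit code are all hypothetical; check what your launcher actually returns.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=5, initial_delay=10.0, sleep=time.sleep):
    """Run cmd until it exits with status 0 or max_attempts is reached.

    Returns (exit_code, attempts_used). The wait between attempts doubles
    after each failure. stdin/stdout/stderr are inherited from this wrapper
    (subprocess.run's default), so the child uses the same pipes it was given.
    ASSUMPTION: a non-zero exit status means the launch/connection failed.
    """
    delay = initial_delay
    result = None
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)  # blocks until the launched process exits
        if result.returncode == 0:
            return 0, attempt
        # Log to stderr so stdout stays free in case it is the remoting channel.
        print(f"attempt {attempt} exited with {result.returncode}", file=sys.stderr)
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    return result.returncode, max_attempts
```

A hypothetical entry point would pass the real launch command as arguments and exit with the final status, e.g. `sys.exit(run_with_retries(sys.argv[1:])[0])`. Note that relaunching the slave does not resume a build that has already been aborted; it only brings the slave back online for subsequent builds.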

Lokesh