I get a socket bind error on Java 6 / WebSphere 8.5 (Liberty profile, a cut-down, usable version of WebSphere). When I kill the app server and immediately start it again, I get:

[ERROR ] CWWKO0221E: TCP Channel defaultHttpEndpoint initialization did not succeed. The socket bind did not succeed for host * and port 9988. The port might already be in use.

This is because either Java or WAS has not released its IPv6 sockets properly.

But, here's the snag: when I run WLP via strace (with -f option to track child processes), the bind error does not happen.

What is going on? Why can't I catch this via strace?

I can work around the problem by setting soReuseAddr, but what worries me is why the error does not reproduce under strace, and how I could catch it with strace without relying on dumb luck.

Paul Roub
LetMeSOThat4U
  • WebSphere is nothing but heisenbugs and inexplicable errors. Which is why you'll see so comparatively few answers to WebSphere-related questions on stackoverflow - no one really understands it to begin with. – pap Dec 18 '12 at 14:45
  • I think it would be more fair to say there are comparatively few questions (and thus answers) because IBM has its own forum for answering questions about its products. http://www.ibm.com/developerworks/forums/forum.jspa?forumID=266&cat=9 – Nick Roth Dec 18 '12 at 15:12
  • @pap: I do understand that WAS is what you said and more. However, regardless of what the process X is (WLP/Java in this case) I cannot understand why using strace to catch that error makes it disappear. It would be fair to say that behavior of strace here worries me more than Websphere itself. That's what I want to know: why using strace makes this bug disappear. – LetMeSOThat4U Dec 19 '12 at 10:20
  • How much slower is the server to start with `strace`? Is it possible that it causes enough of a delay for the old socket to be fully closed? – Brett Kail Dec 19 '12 at 15:42
  • What platform are you running on? Platform behaviour is different in this area. – Holly Cummins Dec 20 '12 at 12:19
  • @bkail: hmm this might be since startup with strace is like tens of seconds. It's hard to nail it. – LetMeSOThat4U Dec 20 '12 at 18:01
  • @HollyCummins: RHEL 5.4, RHEL 6 also has this problem. Strangely, no other UNIX* has this problem. – LetMeSOThat4U Dec 20 '12 at 18:46

1 Answer

You may find adding the soReuseAddr option to your httpEndpoint configuration helps, particularly on Linux platforms. For example,

<httpEndpoint id="defaultHttpEndpoint"
              host="*"
              httpPort="9080">
    <tcpOptions soReuseAddr="true" />
</httpEndpoint>

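It can take a while for the OS to release ports, despite the best attempts of the server. On Linux, connections from the previous process can linger in the TIME_WAIT state after it exits, and without SO_REUSEADDR a new bind to the same port is refused until they expire. This is particularly noticeable with Liberty, since it tends to bounce quickly.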
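For anyone who wants to see the same socket option outside Liberty, here is a minimal Java sketch of what SO_REUSEADDR looks like on a plain `ServerSocket`. It is only an illustration under assumptions: the class name `ReuseAddrSketch` and the helper `bindWithReuse` are made up for this example, port 9988 is taken from the error message in the question, and nothing here claims to be how Liberty sets the option internally.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddrSketch {

    // Hypothetical helper (not a Liberty API): opens a listening socket
    // with SO_REUSEADDR enabled before the bind.
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket server = new ServerSocket();   // created unbound
        server.setReuseAddress(true);               // has to be set before bind()
        server.bind(new InetSocketAddress(port));   // wildcard address, like host="*"
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 9988 is the one from the question; pass another as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9988;
        ServerSocket server = bindWithReuse(port);
        try {
            System.out.println("Bound to port " + server.getLocalPort()
                    + ", SO_REUSEADDR=" + server.getReuseAddress());
        } finally {
            server.close();
        }
    }
}
```

The option is set on an unbound socket because changing it after the bind has no defined effect. Commenting out the `setReuseAddress(true)` line and restarting the program right after killing a copy that had served connections is one way to try to reproduce the bind failure on the affected RHEL hosts, though whether it reproduces depends on sockets actually being left in TIME_WAIT.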

Holly Cummins