1

I apologise, I'm relatively new to Java. I'm writing a library to send logs from Java applications to Logstash using zeromq (jeromq).

One of my test servers is a busy Jenkins master running in Tomcat. My library, "Logit" (https://github.com/stuart-warren/logit), sprays its JSON formatted logs (JUL) over a configured list of Logstash endpoints (for some basic load-balancing) using the PUSHPULL sockets for speed.

Unfortunately, after a few days it hits an exception. Tomcat/Jenkins continue working and writing logs to their normal files, but Logit stops sending messages over the network.

18-Nov-2013 14:45:24 hudson.model.Run execute
INFO: EmployeeManagementSystems » Project Phase 2 branch » Int Tests » Sync Int Tests (Project - Prl) #326 aborted
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at hudson.remoting.Request.call(Request.java:146)
        at hudson.remoting.Channel.call(Channel.java:714)
        at hudson.maven.ProcessCache$MavenProcess.call(ProcessCache.java:156)
        at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:815)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:565)
        at hudson.model.Run.execute(Run.java:1592)
        at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:508)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:237)
logit:WARN IOException thrown, will try sending log again shortly.
zmq.ZError$IOException: java.nio.channels.ClosedByInterruptException
        at zmq.Signaler.send(Signaler.java:108)
        at zmq.Mailbox.send(Mailbox.java:90)
        at zmq.Ctx.send_command(Ctx.java:351)
        at zmq.ZObject.send_command(ZObject.java:364)
        at zmq.ZObject.send_activate_read(ZObject.java:217)
        at zmq.Pipe.flush(Pipe.java:284)
        at zmq.LB.send(LB.java:120)
        at zmq.Push.xsend(Push.java:64)
        at zmq.SocketBase.send(SocketBase.java:598)
        at org.jeromq.ZMQ$Socket.send(ZMQ.java:932)
        at com.stuartwarren.logit.zmq.ZmqTransport.appendString(ZmqTransport.java:115)
        at com.stuartwarren.logit.jul.ZmqAppender.publish(ZmqAppender.java:77)
        at java.util.logging.Logger.log(Logger.java:481)
        at java.util.logging.Logger.doLog(Logger.java:503)
        at java.util.logging.Logger.log(Logger.java:592)
        at hudson.model.Run.execute(Run.java:1610)
        at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:508)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:237)
Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
        at sun.nio.ch.SinkChannelImpl.write(SinkChannelImpl.java:154)
        at zmq.Signaler.send(Signaler.java:106)
        ... 18 more
logit:ERROR Logit got interrupted while waiting to send failed message again.
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at com.stuartwarren.logit.zmq.ZmqTransport.appendString(ZmqTransport.java:121)
        at com.stuartwarren.logit.jul.ZmqAppender.publish(ZmqAppender.java:77)
        at java.util.logging.Logger.log(Logger.java:481)
        at java.util.logging.Logger.doLog(Logger.java:503)
        at java.util.logging.Logger.log(Logger.java:592)
        at hudson.model.Run.execute(Run.java:1610)
        at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:508)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:237)
18-Nov-2013 14:45:35 hudson.model.Run execute

I'm using the same version of my library logging from a test 6 node Cassandra cluster without issue (log4j), but I guess that could just be luck?

My question is, how should one handle exceptions like above when sending messages? https://github.com/stuart-warren/logit/blob/master/src/main/java/com/stuartwarren/logit/zmq/ZmqTransport.java#L115.

public void appendString(final String line) {
    final String log = line.substring(0, line.length() - 1);
    if (LogitLog.isTraceEnabled()) {
        LogitLog.trace("Sending log: [" + log + "].");
    }
    try {
        socket.send(log, ZMQ.NOBLOCK);
        // Has occasionally been known to throw a java.nio.channels.ClosedByInterruptException
    } catch (IOException e) {
        LogitLog.warn("IOException thrown, will try sending log again shortly.", e);
        // Try again after sleeping for a second
        try {
            Thread.sleep(1000);
            socket.send(log, ZMQ.NOBLOCK);
        } catch (InterruptedException i) {
            LogitLog.error("Logit got interrupted while waiting to send failed message again.", i);
        } catch (IOException e2) {
            LogitLog.error("Could not send following log on the second attempt: [" + log + "].", e2);
        }
    } catch (Exception g) {
        LogitLog.error("Something threw an exception that wasn't IOException.", g);
    }
}

Should I close and reopen the sockets/context, or should this possibly be handled in the Jeromq library? Perhaps Tomcat is more likely to throw these and it should be handled in my JUL appender?

Thanks.

stuart-warren
  • 605
  • 5
  • 14

1 Answers1

0

Looks like your answer could be found HERE in this thread. #cafebabe added some more information to their answer in a comment below it, that more directly answers your question here:

"If the application is multi-threaded, you should look for #interrupt()-calls that might interrupt the thread doing the IO operations on the channel. If this is a web-application or some other kind of managed environment, where thread management isn't up to your application (like a Servlet / EJB Container), you should look for thread-safety violations. Another place to look is when the application is shutting down or when thread pooling is used (Servlet/EJB Container!). Then be aware of dynamic management of the pool size!"

Community
  • 1
  • 1
S.Richmond
  • 11,412
  • 6
  • 39
  • 57