
I'm implementing a logging system that needs to compress each log message with GZIP and send it over UDP.

What I've got so far is:

Initialization:

DatagramSocket sock = new DatagramSocket(); 
baos = new ByteArrayOutputStream();
printStream = new PrintStream(new GZIPOutputStream(baos));

This printStream is then handed out by the logger, and log messages are written to it.

Then every time a message arrives:

byte[] d = baos.toByteArray();
DatagramPacket dp = new DatagramPacket(d,d.length,host,port);
sock.send(dp);

What stumps me is that I can't find a way to remove already-sent data from the ByteArrayOutputStream (toByteArray() only returns a copy), and I'm afraid that recreating all three stream objects for every message will be inefficient.

Is there some way to remove sent data from the stream? Or should I look in another direction entirely?

Kristaps Baumanis


You must create a new stream for each message; otherwise, every call to toByteArray() will send all previous messages again.
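A minimal sketch of the per-message approach, assuming the host and port are known up front and each compressed message fits in a single datagram. The class name `GzipUdpSender` is hypothetical. Each call builds a fresh ByteArrayOutputStream and GZIPOutputStream, closes the GZIP stream so its trailer is written, and sends the result as one datagram; only the DatagramSocket is reused.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses each log message on its own and sends it as one UDP datagram.
class GzipUdpSender implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;

    GzipUdpSender(InetAddress host, int port) throws IOException {
        this.socket = new DatagramSocket(); // created once and reused for every message
        this.host = host;
        this.port = port;
    }

    // Fresh streams per message, so previously sent data is never repeated.
    static byte[] compress(String message) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the GZIP stream and writes the trailer
        return baos.toByteArray();
    }

    void send(String message) throws IOException {
        byte[] data = compress(message);
        socket.send(new DatagramPacket(data, data.length, host, port));
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Creating the two small stream objects per message is cheap compared with the compression itself; the socket is the part worth keeping open.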

A better approach is probably to wrap the OutputStream of a TCP socket with a GZIPOutputStream (here sock is a java.net.Socket, since DatagramSocket has no getOutputStream()):

printStream = new PrintStream(new GZIPOutputStream(sock.getOutputStream()));

Also don't forget to flush the PrintStream after every message or nothing will happen.
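One caveat on flushing: with the default constructor, GZIPOutputStream.flush() does not push out data still buffered inside its Deflater. Since Java 7 there is a `GZIPOutputStream(OutputStream, boolean syncFlush)` constructor; with `syncFlush = true`, each flush() emits everything written so far. The sketch below compares the two, using a ByteArrayOutputStream as a stand-in for the socket's stream (an assumption made so it runs without a network connection); `SyncFlushDemo` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.zip.GZIPOutputStream;

class SyncFlushDemo {
    // Returns how many bytes reached the underlying stream after a single flush().
    static int bytesAfterFlush(boolean syncFlush) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        PrintStream out = new PrintStream(new GZIPOutputStream(sink, syncFlush), false, "UTF-8");
        out.println("log message");
        out.flush(); // with syncFlush = true this performs a Deflater SYNC_FLUSH
        int size = sink.size();
        out.close();
        return size;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("syncFlush=false: " + bytesAfterFlush(false) + " bytes");
        System.out.println("syncFlush=true:  " + bytesAfterFlush(true) + " bytes");
    }
}
```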

If speed is really that important, you should consider using a DatagramChannel instead of the old (slow) stream API. This should get you started:

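ByteBuffer buffer = ByteBuffer.allocate(1000);
// ByteBufferOutputStream is not a JDK class; an adapter that writes into the ByteBuffer is assumed
ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream(buffer);
GZIPOutputStream output = new GZIPOutputStream(bufferOutput);
Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);
writer.write("log message\n");
writer.close(); // also finishes the GZIP stream (writes the trailer)
buffer.flip(); // switch the buffer from writing to reading

DatagramChannel channel = DatagramChannel.open(); // do this once
channel.send(buffer, new InetSocketAddress(host, port)); // send compressed data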

Note: You can reuse the buffer by calling clear() on it before the next message, but all the streams must be created once per message.
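A self-contained sketch of the buffer-based variant. The JDK has no `ByteBufferOutputStream`, so the small adapter below is an assumption, as is the class name `ChannelLogSender`. The 1000-byte buffer is allocated once and cleared before each message, while the GZIP and writer streams are new each time. The capacity is arbitrary: a message whose compressed form does not fit makes `put` throw a BufferOverflowException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Minimal OutputStream adapter that writes into a ByteBuffer (hypothetical, not part of the JDK).
class ByteBufferOutputStream extends OutputStream {
    private final ByteBuffer buffer;

    ByteBufferOutputStream(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public void write(int b) {
        buffer.put((byte) b); // throws BufferOverflowException if the buffer is full
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.put(b, off, len);
    }
}

class ChannelLogSender {
    private final ByteBuffer buffer = ByteBuffer.allocate(1000); // reused for every message

    // Compresses one message into the shared buffer and returns it ready for reading.
    ByteBuffer encode(String message) throws IOException {
        buffer.clear(); // position = 0, limit = capacity
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(new ByteBufferOutputStream(buffer)), StandardCharsets.UTF_8)) {
            writer.write(message);
        } // closing finishes the GZIP stream and writes the trailer
        buffer.flip(); // limit = bytes written, position = 0
        return buffer;
    }

    // Sends one message as a single datagram; the channel is opened once by the caller.
    void send(DatagramChannel channel, SocketAddress target, String message) throws IOException {
        channel.send(encode(message), target);
    }
}
```

Usage: open the channel once with `DatagramChannel.open()` and call `send(channel, new InetSocketAddress(host, port), message)` per log entry, where `host` and `port` are the same values as in the question. Because `encode` returns the shared buffer, its contents are only valid until the next call.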

Aaron Digulla

If speed is important, it is worth checking that GZIP actually helps: it adds some latency, and for short messages the fixed GZIP header and trailer can make the output larger than the input.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class Main {
    public static void main(String... args) throws IOException {
        test("Hello World");
        test("Nov 20, 2012 4:55:11 PM Main main\n" +
                "INFO: Hello World log message");
    }

    // Compresses s with GZIP and prints the raw and compressed sizes.
    private static void test(String s) throws IOException {
        byte[] bytes = s.getBytes("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream outputStream = new GZIPOutputStream(baos);
        outputStream.write(bytes);
        outputStream.close();
        byte[] bytes2 = baos.toByteArray();
        System.out.println("'" + s + "' raw.length=" + bytes.length + " gzip.length=" + bytes2.length);
    }
}

prints

'Hello World' raw.length=11 gzip.length=31
'Nov 20, 2012 4:55:11 PM Main main
INFO: Hello World log message' raw.length=63 gzip.length=80
Peter Lawrey
  • GZIP is not my decision here. I'm sending the log messages off to Graylog which forces both GZIP and UDP on me. – Kristaps Baumanis Nov 21 '12 at 08:52
  • GZIP for compatibility is reasonable. If you can find a way to combine log entries in batches, that will improve efficiency dramatically. BTW, DeflaterOutputStream has a much smaller header/footer if this is an option in the future. – Peter Lawrey Nov 21 '12 at 09:23
  • I would be concerned with running out of heap space if you call `toByteArray()` too many times. – heez Sep 04 '18 at 21:10
  • @heez it's only a problem if you retain the arrays. Creating short-lived objects is relatively cheap compared with compression. – Peter Lawrey Sep 05 '18 at 08:48
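To illustrate the header/footer point from the comments: a GZIP member wraps the deflate data in a 10-byte header and an 8-byte trailer, while the zlib format written by `DeflaterOutputStream` with its default Deflater uses a 2-byte header and a 4-byte Adler-32 trailer. A rough sketch comparing the two for one message (the class name `HeaderOverhead` is hypothetical); switching formats is only useful if the receiving side accepts zlib-compressed data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

class HeaderOverhead {
    // Compresses the message as GZIP (gzip = true) or as zlib (gzip = false).
    static byte[] compress(String message, boolean gzip) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = gzip ? new GZIPOutputStream(sink) : new DeflaterOutputStream(sink)) {
            out.write(message.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the stream and writes the format's trailer
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String msg = "INFO: Hello World log message";
        System.out.println("gzip.length=" + compress(msg, true).length
                + " zlib.length=" + compress(msg, false).length);
    }
}
```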

The answers were helpful with other aspects of my problem, but for the actual question: there is a way to clear the data from a ByteArrayOutputStream, its reset() method. It doesn't actually clear the buffer; it resets the internal count field to 0, so any data already in the buffer is ignored and later writes overwrite it.

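Note that the GZIPOutputStream cannot be reused the same way: once it has been finished or closed to complete a message, further writes to it fail with an error, even after the underlying ByteArrayOutputStream has been reset. So I have not yet found a way to reuse everything.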
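A sketch of how far reuse can go with reset(), assuming messages are encoded from a single thread: the ByteArrayOutputStream and its internal array are kept and reset() before each message, while a new GZIPOutputStream is still created per message because a finished GZIP stream cannot accept more data. The class name `ReusableGzipEncoder` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that reuses one ByteArrayOutputStream across messages (not thread-safe).
class ReusableGzipEncoder {
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);

    byte[] encode(String message) throws IOException {
        baos.reset(); // count = 0; the existing internal array is kept and overwritten
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) { // writes a fresh GZIP header
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the stream and frees its Deflater; closing baos has no effect
        return baos.toByteArray(); // copy of just this message's bytes
    }
}
```

Closing the GZIPOutputStream also calls close() on the ByteArrayOutputStream, but that method does nothing, so the same instance can keep being reset and written to.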

Kristaps Baumanis