If I am understanding the GzipCodec class correctly, its purpose is to create various compressor and decompressor streams and return them to the caller. It is not responsible for closing those streams. That is the responsibility of the caller.
How to close a GzipOutputStream?
You simply call close() on the object. If saveAsHadoopFile is using GzipCodec to create a GzipOutputStream, then that method is responsible for closing it.
Or other stream should also be closed?
The same as for a GzipOutputStream. Call close() on it.
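As a minimal sketch of what "the caller closes it" looks like in practice, the following uses java.util.zip.GZIPOutputStream from the JDK as a stand-in for the stream a codec would hand back, so that it runs without Hadoop on the classpath. The class ExplicitCloseSketch and the factory method createOutputStream are hypothetical names made up for illustration; they are not part of Hadoop or Spark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class ExplicitCloseSketch {

    // Hypothetical factory playing the role of a codec: it creates the
    // stream and hands it back, but never closes it.
    static OutputStream createOutputStream(OutputStream sink) throws IOException {
        return new GZIPOutputStream(sink);
    }

    // The caller owns the stream it asked for, so the caller closes it.
    static byte[] writeCompressed(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream gz = createOutputStream(sink);
        try {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } finally {
            // close() finishes the gzip data, releases the compressor,
            // and closes the underlying stream, even if write() threw.
            gz.close();
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeCompressed("hello gzip");
        System.out.println("compressed bytes: " + data.length);
    }
}
```

The try / finally is there so that close() still runs when the write fails part-way through.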
Is there a good alternative?
To calling close explicitly?
As an alternative, you could manage a stream created by GzipCodec using try-with-resources, which calls close() for you when the block exits.
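A rough sketch of the try-with-resources form, again using the JDK's java.util.zip streams as a stand-in for codec-created streams (an assumption made so that the example is self-contained). The class name TryWithResourcesSketch and its methods are hypothetical. Any stream that implements AutoCloseable can be managed the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TryWithResourcesSketch {

    // The stream declared in the try header is closed automatically when
    // the block exits, whether normally or because of an exception.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // Read the bytes only after the block: the gzip trailer is
        // written when the stream is closed.
        return sink.toByteArray();
    }

    // The same pattern applies to the decompressing side.
    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "hello gzip";
        String roundTrip = decompress(compress(original));
        System.out.println(original.equals(roundTrip));
    }
}
```

This is still explicit resource management; the compiler simply generates the close() call and the surrounding finally logic.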
But if you are asking whether there is a way to avoid managing the streams properly, then the answer is no.
If you are actually encountering a storage leak that is (you think) due to saveAsHadoopFile not closing the streams that it opens, please provide a minimal reproducible example that we can look at. It could be a bug in Hadoop ... or you could be using it incorrectly.