In the following usage example of scalaz-stream (taken from the documentation), what do I need to change if the input and/or output is a gzipped file? In other words, how do I use `compress`?

```scala
import scalaz.stream._
import scalaz.concurrent.Task

// Helper defined in the scalaz-stream README example
def fahrenheitToCelsius(f: Double): Double =
  (f - 32.0) * (5.0 / 9.0)

val converter: Task[Unit] =
  io.linesR("testdata/fahrenheit.txt")
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("testdata/celsius.txt"))
    .run

// at the end of the universe...
val u: Unit = converter.run
```
mitchus

Compressing the output is easy. Since `compress.deflate()` is a `Process1[ByteVector, ByteVector]`, you need to plug it into your pipeline where you are emitting `ByteVector`s, that is, right after `text.utf8Encode`, which is a `Process1[String, ByteVector]`:

```scala
val converter: Task[Unit] =
  io.linesR("testdata/fahrenheit.txt")
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)     // String -> ByteVector
    .pipe(compress.deflate())  // ByteVector -> compressed ByteVector (zlib format, not a ZIP archive or gzip file)
    .to(io.fileChunkW("testdata/celsius.zip"))
    .run
```
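
Note that `compress.deflate()` with its default arguments writes a zlib-wrapped DEFLATE stream, so `celsius.zip` above is neither a ZIP archive nor a gzip file. If the output really has to be gzip, one option is to let Java's `GZIPOutputStream` do the compression and feed it through `io.chunkW`. This is a minimal sketch, assuming scalaz-stream 0.7.x on scalaz 7.1 (where `Task#run` exists, as in the question); `gzipFileW`, `gzConverter` and the paths are hypothetical names, and `fahrenheitToCelsius` is the helper from the scalaz-stream README example.

```scala
import java.io.FileOutputStream
import java.util.zip.GZIPOutputStream
import scodec.bits.ByteVector
import scalaz.concurrent.Task
import scalaz.stream._

// Helper from the scalaz-stream README example
def fahrenheitToCelsius(f: Double): Double = (f - 32.0) * (5.0 / 9.0)

// Hypothetical helper: a Sink that gzips every ByteVector it receives.
// io.chunkW takes the OutputStream by name, opens it when the process starts
// and closes it when the process ends; GZIPOutputStream.close() writes the
// gzip trailer, so the file is complete only after the Task has finished.
def gzipFileW(path: String): Sink[Task, ByteVector] =
  io.chunkW(new GZIPOutputStream(new FileOutputStream(path)))

// Same pipeline as in the question, only the sink is different
def gzConverter(in: String, out: String): Task[Unit] =
  io.linesR(in)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(gzipFileW(out))
    .run

// Hypothetical paths; nothing is read or written until the Task is run
val gzipConverter: Task[Unit] =
  gzConverter("testdata/fahrenheit.txt", "testdata/celsius.txt.gz")
```

The compression is left to the JDK here because `GZIPOutputStream` adds the gzip header and CRC-32 trailer that a plain deflate stage does not produce.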

For `inflate` you can't use `io.linesR` to read the compressed file: you need a process that produces `ByteVector`s instead of `String`s so that you can pipe them into `inflate` (`io.fileChunkR` works for that). The next step is to decode the uncompressed data to `String`s (with `text.utf8Decode`, for example) and then use `text.lines()` to emit the text line by line. Something like this should do the trick:

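```scala
val converter: Task[Unit] =
  // Keep requesting chunks of up to 4096 bytes until the file is exhausted
  Process.constant(4096).toSource
    .through(io.fileChunkR("testdata/fahrenheit.zip"))
    .pipe(compress.inflate())  // compressed ByteVector -> ByteVector
    .pipe(text.utf8Decode)     // ByteVector -> String chunks
    .pipe(text.lines())        // String chunks -> individual lines
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("testdata/celsius.txt"))
    .run
```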
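
The pipeline above can only read what `compress.deflate()` wrote: a gzip file wraps the same DEFLATE data in a different header. The following stdlib-only sketch makes the difference visible with `java.util.zip` directly. It assumes, as noted in the comments on this answer, that `compress.inflate` is built on `java.util.zip.Inflater`, and that `inflate(true)` corresponds to `new Inflater(true)` (raw DEFLATE, no header). `payload`, `zlibBytes`, `gzipBytes` and `tryInflate` are hypothetical names.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{DataFormatException, Deflater, GZIPOutputStream, Inflater}

// Hypothetical sample payload
val payload: Array[Byte] = "32\n212\n".getBytes(UTF_8)

// zlib-wrapped DEFLATE (2-byte header, Adler-32 trailer), the format a default Deflater produces
def zlibBytes(data: Array[Byte]): Array[Byte] = {
  val deflater = new Deflater() // nowrap = false
  deflater.setInput(data)
  deflater.finish()
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](1024)
  while (!deflater.finished()) {
    val n = deflater.deflate(buf)
    out.write(buf, 0, n)
  }
  deflater.end()
  out.toByteArray
}

// gzip: 10-byte header starting with 0x1f 0x8b, DEFLATE data, CRC-32 and length trailer
def gzipBytes(data: Array[Byte]): Array[Byte] = {
  val out = new ByteArrayOutputStream()
  val gz = new GZIPOutputStream(out)
  gz.write(data)
  gz.close() // writes the trailer
  out.toByteArray
}

// Decompress with a plain Inflater; Left if the input is not in the expected format
def tryInflate(data: Array[Byte], nowrap: Boolean): Either[DataFormatException, Array[Byte]] = {
  val inflater = new Inflater(nowrap)
  inflater.setInput(data)
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](1024)
  try {
    var stuck = false
    while (!inflater.finished() && !stuck) {
      val n = inflater.inflate(buf)
      out.write(buf, 0, n)
      // no progress possible: input exhausted or a preset dictionary is required
      stuck = n == 0 && (inflater.needsInput() || inflater.needsDictionary())
    }
    Right(out.toByteArray)
  } catch {
    case e: DataFormatException => Left(e)
  } finally {
    inflater.end()
  }
}
```

With `nowrap = false`, `tryInflate(zlibBytes(payload), false)` returns the original bytes, while `tryInflate(gzipBytes(payload), false)` fails because the zlib header check rejects gzip's `0x1f 0x8b` magic bytes. With `nowrap = true` the header bytes are read as DEFLATE block data, and the low bits of `0x1f` encode the reserved block type 3, so that call fails too. This matches the two `DataFormatException` messages reported in the comments.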
Frank S. Thomas
  • Thanks for the reply. If I use the above on a gzip file, I get `java.util.zip.DataFormatException: incorrect header check`, and if I use `inflate(true)` instead I get `java.util.zip.DataFormatException: invalid block type` – mitchus May 05 '15 at 09:27
  • A gzip file contains extra headers which `inflate` (or `java.util.zip.Inflater`, which is used by `inflate`) does not understand, see http://en.wikipedia.org/wiki/Gzip. `inflate` can only handle the DEFLATE-compressed payload – Frank S. Thomas May 05 '15 at 09:40
  • If you want to read from a gzip file, you are better off with `io.linesR(in: => InputStream)` and Java's `GZIPInputStream` – Frank S. Thomas May 05 '15 at 09:45
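
The `io.linesR(in: => InputStream)` route can be sketched as follows, assuming scalaz-stream 0.7.x on scalaz 7.1 and UTF-8 text. `GZIPInputStream` parses the gzip header and trailer itself, so the rest of the pipeline from the question stays unchanged. `gzipLinesR`, `gzConverter` and the paths are hypothetical names; `fahrenheitToCelsius` is the helper from the scalaz-stream README example.

```scala
import java.io.FileInputStream
import java.util.zip.GZIPInputStream
import scala.io.Codec
import scalaz.concurrent.Task
import scalaz.stream._

// Helper from the scalaz-stream README example
def fahrenheitToCelsius(f: Double): Double = (f - 32.0) * (5.0 / 9.0)

// io.linesR needs an implicit Codec to decode the decompressed bytes (UTF-8 assumed)
implicit val codec: Codec = Codec.UTF8

// Hypothetical helper: GZIPInputStream strips the gzip framing and inflates the
// payload; io.linesR takes the stream by name and closes it when the process ends
def gzipLinesR(path: String): Process[Task, String] =
  io.linesR(new GZIPInputStream(new FileInputStream(path)))

// Same pipeline as in the question, only the source is different
def gzConverter(in: String, out: String): Task[Unit] =
  gzipLinesR(in)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW(out))
    .run

// Hypothetical paths; the file is opened only when the Task is run
val gunzipConverter: Task[Unit] =
  gzConverter("testdata/fahrenheit.txt.gz", "testdata/celsius.txt")
```

Because the input stream is passed by name, nothing is opened until the `Task` runs, and closing the `GZIPInputStream` at the end also closes the underlying `FileInputStream`.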