In Spark Streaming, I want to use fileStream to monitor a directory. But the files in that directory are compressed using LZ4, so the new .lz4 files are not detected by the following code. How can I detect these new files?

val list_join_action_stream = ssc.fileStream[LongWritable, Text, TextInputFormat](gc.input_dir, (t: Path) => true, false).map(_._2.toString)

I know the textFile function can read .lz4 data, but I'm using Spark Streaming with the fileStream function...
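One thing worth checking (a sketch under assumptions, not a confirmed fix): Hadoop's `TextInputFormat` only decompresses input transparently when a codec matching the file extension is registered via `io.compression.codecs`. The sketch below registers Hadoop's `Lz4Codec` through the Spark conf before building the stream; the application name, directory path, and batch interval are placeholders standing in for `gc.input_dir` and the rest of your setup.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("lz4-filestream")  // placeholder app name
  // Register the LZ4 codec so TextInputFormat maps the .lz4 extension to it.
  .set("spark.hadoop.io.compression.codecs",
       "org.apache.hadoop.io.compress.DefaultCodec," +
       "org.apache.hadoop.io.compress.Lz4Codec")

val ssc = new StreamingContext(conf, Seconds(30))  // placeholder batch interval

// The filter accepts every path, so .lz4 files are not excluded here;
// newFilesOnly = false also processes files already present in the directory.
val list_join_action_stream = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    "hdfs:///path/to/input",        // placeholder for gc.input_dir
    (t: Path) => true,
    false)
  .map(_._2.toString)
```

One caveat: Hadoop's `Lz4Codec` uses its own block framing, so files produced by the standalone `lz4` command-line tool may not decompress with it even once the codec is registered; files would need to be written with a Hadoop-compatible LZ4 writer.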

user2848932
  • Are the files in the input directory named with the `.lz4` extensions? – vanekjar May 13 '15 at 17:22
  • possible duplicate of [Decompressing LZ4 compressed data in Spark](http://stackoverflow.com/questions/24985704/decompressing-lz4-compressed-data-in-spark) – vanekjar May 13 '15 at 17:23
  • Yes, the files in the input dir are named with the .lz4 extension. – user2848932 May 14 '15 at 02:51
  • @vanekjar I'm using fileStream in Spark Streaming; the question you linked uses textFile. – user2848932 May 14 '15 at 02:54
  • Spark uses Hadoop input format for reading files. So `.textFile` and `.fileStream` with `TextInputFormat` should be the same. Hadoop should handle the input compression transparently. What is your Hadoop version? – vanekjar May 14 '15 at 09:37
  • @vanekjar My Hadoop version is 2.6, and my experiment shows they behave differently... – user2848932 May 14 '15 at 12:26
  • @user2848932, did you find solution to spark stream .lz4 files? If so, can you please share the details. I'm having the similar streaming challenge, but with .ORC files. – Sudheer Palyam Apr 25 '17 at 06:04

0 Answers