I have some data in TSV format compressed with LZO. I would like to use this data in a Java Spark program.
At the moment I am able to decompress the files first and then load them in Spark as text using
SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("MyName")
        .getOrCreate();

Dataset<Row> input = spark.read()
        .option("sep", "\t")
        .csv(args[0]);

input.show(5); // check visually that the data was imported correctly
where the first argument is the path to the decompressed file. If I instead pass the LZO file as the argument, the output of show is illegible garbage.
Is there a way to read the LZO files directly, without decompressing them first? I use IntelliJ as my IDE and the project is set up with Maven.
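For reference, I suspect the fix involves putting an LZO codec library on the classpath and registering it with Hadoop, so that Spark's text input can decompress the files on read. Something like the sketch below is what I had in mind, but the codec class name and config key are my assumptions and I haven't gotten it to work:

```java
// Assumption: an LZO codec implementation (e.g. the hadoop-lzo library) is
// available on the classpath via a Maven dependency; the class name below
// is my guess at the codec to register.
SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("MyName")
        // register the codec so Hadoop's input format can decompress .lzo files
        .config("spark.hadoop.io.compression.codecs",
                "com.hadoop.compression.lzo.LzopCodec")
        .getOrCreate();

Dataset<Row> input = spark.read()
        .option("sep", "\t")
        .csv(args[0]); // path to the .lzo file

input.show(5);
```

Is this the right approach, or is there a simpler way to do it?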