
I'm using Spark with Java. I receive messages from a Kafka topic, and each message contains the path to a zip file. I want to take that zip file and extract it to HDFS.

I already have code that reads the messages from Kafka with Spark Structured Streaming.

How can I extract the files to HDFS?

I'm using ZipFile from net.lingala.zip4j.core.ZipFile as follows:

ZipFile zipFile = new ZipFile(pathFromKafka);
zipFile.extractAll("?");//What should I write here?
Ya Ko

1 Answer

ZipFile doesn't allow you to extract files directly to HDFS. You can extract the files to the local file system and then copy them into HDFS:

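// Required imports
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside some class or method ...
Configuration conf = new Configuration();
// Point the client at the HDFS endpoint to write to
conf.set("fs.defaultFS", <hdfs write endpoint>);
FileSystem fs = FileSystem.get(conf);
// <src> is the local Path of the extracted files, <dst> is the target Path in HDFS
fs.copyFromLocalFile(<src>, <dst>);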
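The local extraction step could look roughly like the sketch below. It is only an illustration: it uses the JDK's java.util.zip instead of zip4j, and the class name LocalUnzip and its extractAll method are hypothetical helpers, not part of zip4j or Hadoop. It assumes the zip path from Kafka is readable on the machine that runs the code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of zip4j or Hadoop.
final class LocalUnzip {

    // Extracts every entry of zipPath under targetDir on the local file system
    // and returns the regular files that were written.
    static List<Path> extractAll(Path zipPath, Path targetDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream in = Files.newInputStream(zipPath);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Reads until the end of the current entry; does not close the zip stream.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    written.add(out);
                }
            }
        }
        return written;
    }
}
```

The directory passed as targetDir can then serve as the source of the fs.copyFromLocalFile call above, and it can be deleted once the copy has finished.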
Yehor Krivokon