
I am getting this error: Can not read value at 0 in block -1 in file file: localdirectory/samplefile.parquet

I have to read a directory of Parquet files from an S3 bucket. To do this, I download the directory from S3 to local disk and read it in a standalone Java project. The code below is used for the download:

MultipleFileDownload multipleFileDownload = transferManager.downloadDirectory(bucketName, keyPrefix, directoryToSaveParquetFiles);
            

I then tried to read the Parquet files from the local directory. This is the code used to read a Parquet file:

Configuration conf = new Configuration();
// Read INT96 columns as 12-byte fixed values
conf.set("parquet.avro.readInt96AsFixed", "true");
// try-with-resources closes the reader even if read() throws
try (ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(new Path(filePath)).withConf(conf).build()) {
  GenericRecord obj;
  while ((obj = reader.read()) != null) {
    // read attributes
  }
}
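
A possible diagnostic, not part of the original post: since the same code works locally and fails only on ECS, one thing worth ruling out is that the files on disk are incomplete when the reader opens them. `TransferManager.downloadDirectory` returns immediately and the transfer continues in the background, so `waitForCompletion()` has to be called on the returned `MultipleFileDownload` before the files are read. The sketch below uses only the JDK. The class and method names (`ParquetMagicCheck`, `looksComplete`, `incompleteParquetFiles`) are hypothetical. It relies on the Parquet format rule that a file begins and ends with the 4-byte magic `PAR1`, and it uses `java.nio.file.Path`, not the Hadoop `Path` from the code above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: flags downloaded files that cannot be complete Parquet files.
class ParquetMagicCheck {

    // Parquet files start and end with these four ASCII bytes.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file starts and ends with the Parquet magic bytes. */
    public static boolean looksComplete(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            // Lower bound: leading magic + 4-byte footer length + trailing magic.
            if (length < 12) {
                return false;
            }
            byte[] head = new byte[4];
            byte[] tail = new byte[4];
            raf.readFully(head);
            raf.seek(length - 4);
            raf.readFully(tail);
            return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        }
    }

    /** Walks dir and returns every *.parquet file that fails the magic-byte check. */
    public static List<Path> incompleteParquetFiles(Path dir) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Skips non-Parquet entries such as _SUCCESS markers or .crc files.
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .collect(Collectors.toList());
        }
        List<Path> incomplete = new ArrayList<>();
        for (Path p : candidates) {
            if (!looksComplete(p)) {
                incomplete.add(p);
            }
        }
        return incomplete;
    }

    public static void main(String[] args) throws IOException {
        // "localdirectory" is a placeholder taken from the error message.
        Path dir = Paths.get(args.length > 0 ? args[0] : "localdirectory");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        for (Path p : incompleteParquetFiles(dir)) {
            System.out.println("Incomplete or not Parquet: " + p);
        }
    }
}
```

Running this against the download directory right before building the reader would show whether the failing file is truncated or still being written. A passing check does not prove the file is readable, only that both ends are in place.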

Imports used (from the Maven dependencies):

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.commons.io.FileUtils;

SDK used in the project:

Amazon Corretto 11 (version 11.0.11)

It works perfectly fine on my local machine (I'm using IntelliJ IDEA), but when deployed to AWS ECS I get this error:

Can not read value at 0 in block -1 in file file: localdirectory/samplefile.parquet
mold_9580
  • It is not a file... A `java.io.File` has to be a physical resource on the filesystem. It isn't; it is loaded from a jar. Hence you need to use `getResourceAsStream` or `getResource` to load it. – M. Deinum Sep 26 '22 at 14:29
  • I suggest you use Spark/Flink for reading parquet data from S3 – OneCricketeer Sep 27 '22 at 13:30

0 Answers