I am writing to a Parquet file using protobuf (or Avro). My proto file looks like this:

message Log  {
    optional string date = 1;
    optional string url = 2;
}

This is a reduced version of my problem. When writing to a Parquet file (without Hadoop), I need to handle the case where one column has no data at all (I am working on schema evolution), so I write a value only for the 'date' column. The file is created successfully, but when I query it with Apache Drill, it throws a NullPointerException. If I set the 'url' field on at least one of the written records, everything works: all other 'url' values can be null and Drill can still query the file. But I need the case in which an entire column is null in one Parquet file, while other Parquet files have values for that column. Please help.

My Parquet version is 1.6.0rc7 and my Apache Drill version is 0.8.0. Here is the code; LogClass is the class compiled from the proto file:

MessageType parquetSchema = new ProtoSchemaConverter().convert(LogClass.Log.class);

ProtoWriteSupport writeSupport = new  ProtoWriteSupport(LogClass.Log.class);

CompressionCodecName compressionCodecName = CompressionCodecName.SNAPPY;

int blockSize = 128 * 1024 * 1024;
int pageSize = 64 * 1024;

Path outputPath = new Path("./my.parquet");

ParquetWriter parquetWriter = new ParquetWriter(outputPath, writeSupport, compressionCodecName, blockSize, pageSize);

LogClass.Log.Builder log = LogClass.Log.newBuilder();
log.setUrl("www.x.com");

for (int i=0; i < 20; i++)
    parquetWriter.write(log);
parquetWriter.close();
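
To make the failing condition concrete, here is a minimal sketch in plain Java, independent of Parquet and Drill, that reports which columns are never set across a batch of records. It only illustrates the two cases described above (no record sets 'url' versus at least one record sets it); it does not reproduce or fix the exception. The class name `AllNullColumnCheck`, the helper `unsetColumns`, and the map-based rows are hypothetical stand-ins for the protobuf messages, and the date value is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: rows are modeled as maps instead of protobuf messages.
class AllNullColumnCheck {

    // Returns the columns that no row in the batch sets (null or absent in every row).
    static Set<String> unsetColumns(List<String> columns, List<Map<String, String>> rows) {
        Set<String> unset = new LinkedHashSet<>(columns);
        for (Map<String, String> row : rows) {
            // A column stops being "all null" as soon as one row has a value for it.
            unset.removeIf(column -> row.get(column) != null);
        }
        return unset;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("date", "url");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            Map<String, String> row = new HashMap<>();
            row.put("date", "2015-01-01"); // placeholder value, 'url' left unset
            rows.add(row);
        }

        // Case 1: no row sets 'url' (the file that Drill fails to query).
        System.out.println(unsetColumns(columns, rows)); // prints [url]

        // Case 2: a single row sets 'url' (the file that Drill can query).
        rows.get(0).put("url", "www.x.com");
        System.out.println(unsetColumns(columns, rows)); // prints []
    }
}
```

A check like this could be run over a batch before writing, to tell in advance which files will contain an entirely null column.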
Dev
Masood