I am writing to a Parquet file using protobuf (or Avro). My proto file looks like this:
message Log {
  optional string date = 1;
  optional string url = 2;
}
This is a reduced version of my problem. I am writing the Parquet file without Hadoop, and because I am working on schema evolution I need a case in which one of the columns has no data at all. So I only write values for the 'date' column. The file is created successfully, but when I query it with Apache Drill, it throws a NullPointerException. If I set the 'url' field for at least one of the written records, everything is fine: all the other 'url' values can be null and Drill can still query the file. But I need the case in which a whole column is null in one Parquet file (while other Parquet files have values for that column). Please help.

My Parquet version is 1.6.0rc7 and my Apache Drill version is 0.8.0.

Here is the code (LogClass is the class compiled from the proto file):
// Parquet 1.6.x still uses the pre-Apache parquet.* package names
import org.apache.hadoop.fs.Path;
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.proto.ProtoSchemaConverter;
import parquet.proto.ProtoWriteSupport;
import parquet.schema.MessageType;
MessageType parquetSchema = new ProtoSchemaConverter().convert(LogClass.Log.class);
ProtoWriteSupport writeSupport = new ProtoWriteSupport(LogClass.Log.class);
CompressionCodecName compressionCodecName = CompressionCodecName.SNAPPY;
int blockSize = 128 * 1024 * 1024;
int pageSize = 64 * 1024;
Path outputPath = new Path("./my.parquet");
ParquetWriter parquetWriter = new ParquetWriter(outputPath, writeSupport, compressionCodecName, blockSize, pageSize);
LogClass.Log.Builder log = LogClass.Log.newBuilder();
log.setUrl("www.x.com");
for (int i = 0; i < 20; i++) {
    parquetWriter.write(log);
}
parquetWriter.close();