
I am trying to convert a JSON string into a generic Java object using an Avro schema.

Below is my code.

String json = "{\"foo\": 30.1, \"bar\": 60.2}";
String schemaLines = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\",\"fields\":[{\"name\":\"foo\",\"type\":[\"null\",\"double\"],\"default\":null},{\"name\":\"bar\",\"type\":[\"null\",\"double\"],\"default\":null}]}";

InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);

Schema schema = Schema.parse(schemaLines);

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);

I get this exception: "org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_FLOAT".

The same code works if the schema has no unions. Can someone please explain why, and suggest a solution?

Princey James
  • From http://avro.apache.org/docs/1.7.6/spec.html#json_encoding, I understand that the JSON encoding for unions is different, but I am trying to figure out whether there is any way to convert the JSON string to an object. – Princey James Dec 19 '14 at 04:17
  • FYI, an overload of `jsonDecoder()` accepts a JSON String; there is no need to convert it into a stream. – jaco0646 Jul 11 '16 at 19:44

6 Answers


For anyone using Avro 1.8.2: JsonDecoder can no longer be instantiated directly from outside the package org.apache.avro.io. Use DecoderFactory instead, as shown in the following code:

String schemaStr = "<some json schema>";
String genericRecordStr = "<some json record>";
// Parse the Avro schema from its JSON definition.
Schema schema = new Schema.Parser().parse(schemaStr);
// Obtain the decoder from the factory; JsonDecoder cannot be constructed directly.
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, genericRecordStr);
DatumReader<GenericData.Record> reader =
            new GenericDatumReader<>(schema);
GenericRecord genericRecord = reader.read(null, decoder);
Raman

Thanks to Reza, I found this webpage, which shows how to convert a JSON string into an Avro object.

http://rezarahim.blogspot.com/2013/06/import-org_26.html

The key part of his code is:

static byte[] fromJsonToAvro(String json, String schemastr) throws Exception {
  InputStream input = new ByteArrayInputStream(json.getBytes());
  DataInputStream din = new DataInputStream(input);

  Schema schema = Schema.parse(schemastr);

  Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

  DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
  Object datum = reader.read(null, decoder);

  GenericDatumWriter<Object>  w = new GenericDatumWriter<Object>(schema);
  ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

  Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);

  w.write(datum, e);
  e.flush();

  return outputStream.toByteArray();
}

String json = "{\"username\":\"miguno\",\"tweet\":\"Rock: Nerf paper, scissors is fine.\",\"timestamp\": 1366150681 }";

String schemastr ="{ \"type\" : \"record\", \"name\" : \"twitter_schema\", \"namespace\" : \"com.miguno.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\"  : \"Name of the user account on Twitter.com\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\"  : \"The content of the user's Twitter message\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\"  : \"Unix epoch time in seconds\" } ], \"doc\" : \"A basic schema for storing Twitter messages\" }";

byte[] avroByteArray = fromJsonToAvro(json,schemastr);

Schema schema = Schema.parse(schemastr);
DatumReader<GenericRecord> reader1 = new GenericDatumReader<GenericRecord>(schema);

Decoder decoder1 = DecoderFactory.get().binaryDecoder(avroByteArray, null);
GenericRecord result = reader1.read(null, decoder1);
Derlin
Liang
  • This code won't solve the problem: it doesn't work when the schema contains unions. – Deepak Dec 14 '16 at 08:48
  • Any solution for when my schema contains unions? I get `Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING`... – anon Jul 27 '18 at 11:50

With Avro 1.4.1, this works:

private static GenericData.Record parseJson(String json, String schema)
    throws IOException {
  Schema parsedSchema = Schema.parse(schema);
  Decoder decoder = new JsonDecoder(parsedSchema, json);

  DatumReader<GenericData.Record> reader =
      new GenericDatumReader<>(parsedSchema);
  return reader.read(null, decoder);
}

It might need some tweaks for later Avro versions.

Valloric

As already mentioned in the comments here, the JSON that the Avro libraries understand differs slightly from a plain JSON object. Specifically, a non-null value of a union type is wrapped in a nested object keyed by its type name: "union_field": {"type": value}.

So if you want to convert "plain" JSON to Avro, you'll have to use a third-party library, at least for now.
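To make the wrapping concrete, here is a minimal sketch that needs no Avro dependency. It assumes every field has the schema type ["null","double"], as in the question, and that field names are valid Avro names (so they need no JSON escaping). The class UnionJsonSketch and the method toAvroJson are hypothetical names used only for illustration; they are not part of Avro.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Avro): illustrates Avro's JSON encoding of unions. */
final class UnionJsonSketch {

    /**
     * Renders fields of assumed type ["null","double"] in Avro's JSON encoding:
     * a null value stays a bare null, any other value is wrapped as {"double": value}.
     * Assumes field names need no escaping and values are finite numbers.
     */
    static String toAvroJson(Map<String, Double> fields) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Double> e : fields.entrySet()) {
            Double v = e.getValue();
            String encoded = (v == null) ? "null" : "{\"double\":" + v + "}";
            out.add("\"" + e.getKey() + "\":" + encoded);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> fields = new LinkedHashMap<>();
        fields.put("foo", 30.1);
        fields.put("bar", null);
        // Prints the string form that a JSON decoder built for the union schema expects.
        System.out.println(toAvroJson(fields));
    }
}
```

The resulting string is the kind of input the JSON decoder accepts for a union schema. A general converter would have to walk the schema to pick the right branch name for each union, which is why the answer points to a third-party library.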

user3468781

Your schema does not match the structure of the JSON string. You need a different schema that has a plain decimal number (double), rather than a union, at the place where the error occurs. Use that schema as the writer schema; you can then freely use the original one as the reader schema.

miljanm
  • Alternatively, tell Avro which one you're using, like this: `String json = "{\"foo\":{\"double\":30.1},\"bar\":{\"double\":60.2}}";` – Keegan Jan 09 '15 at 19:25
  • That is how Avro would serialize the record with the given schema. – miljanm Jan 12 '15 at 09:05
  • Thanks, miljanm and Keegan. Yes, I understand from avro.apache.org/docs/1.7.6/spec.html#json_encoding that the JSON encoding for unions is different. But I was looking for an open source library that can internally convert my JSON string to the Avro-specific form and then parse it. Is something like that available? – Princey James Jan 13 '15 at 06:04
  • I'm not aware of any such tool. Why don't you want to make a different schema? It seems like a much easier solution to this problem, and the schema would be compatible with the current one. – miljanm Jan 13 '15 at 12:06

The problem is not the code, but the wrong format of the JSON. It should be:

String json = "{\"foo\": {\"double\": 30.1}, \"bar\": {\"double\": 60.2}}";

Shakti Garg