
I need to read an Avro file in Scalding but have no idea how to work with it. I have worked with straightforward Avro files before, but this one is a little more complicated. The schema looks like this:

{"type":"record",
 "name":"features",
 "namespace":"OurCode",
 "fields":[{"name":"key","type":"long"},
       {"name":"features",
        "type":{"type":"map","values":"double"}}]
}

I am not sure how to read this data when the second field is a map that holds multiple entries, and when each record may contain a different set of keys in that map.

I initially tried to read it in using UnpackAvroSource and wrote to a Tsv, but I ended up with data that looked like:

key1   {var1=4, var2 = 3, var4 = 10}
key2   {var3 = 15, var4 = 9, var5 = 22}

I also tried creating a case class:

case class FileType(var key:Long, var features:Map[String,Double])

and then tried to read it in with:

PackedAvroSource[FileType](args("input"))

I got an error that says: could not find implicit value for evidence parameter of type com.twitter.scalding.avro.AvroSchemaType[FileReader.this.FileType], where FileReader is the name of the class where the data is being read in.

Ultimately, I need to turn the above data into something that looks like:

         Var1   Var2   Var3   Var4   Var5
Key1        4      3      0     10      0
Key2        0      0     15      9     22

If there is a better way to get to that layout, that would work too.
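To make the target layout concrete, here is a minimal sketch of the reshaping step in plain Scala collections, with no Scalding or Avro involved. The names Record, columns, and pivot are hypothetical, and the numeric keys 1 and 2 stand in for key1 and key2. It assumes the records are already available as a key plus a Map[String, Double], and that a missing feature should become 0.0:

```scala
object PivotSketch {
  // Hypothetical in-memory shape of one record: a long key plus feature name -> value.
  final case class Record(key: Long, features: Map[String, Double])

  // Pass 1: collect every feature name seen in any record, sorted for a stable column order.
  def columns(records: Seq[Record]): Seq[String] =
    records.flatMap(_.features.keys).distinct.sorted

  // Pass 2: emit one dense row per record, filling absent features with 0.0.
  def pivot(records: Seq[Record], cols: Seq[String]): Seq[(Long, Seq[Double])] =
    records.map(r => (r.key, cols.map(c => r.features.getOrElse(c, 0.0))))

  def main(args: Array[String]): Unit = {
    // Sample values copied from the Tsv output shown earlier.
    val records = Seq(
      Record(1L, Map("var1" -> 4.0, "var2" -> 3.0, "var4" -> 10.0)),
      Record(2L, Map("var3" -> 15.0, "var4" -> 9.0, "var5" -> 22.0))
    )
    val cols = columns(records)
    // Header line, then one tab-separated line per record.
    println(("key" +: cols).mkString("\t"))
    pivot(records, cols).foreach { case (k, row) =>
      println((k.toString +: row.map(_.toString)).mkString("\t"))
    }
  }
}
```

The two passes matter because the full set of columns is not known until every record has been seen. In a distributed job the first pass would presumably become its own aggregation step before the rows are written out.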

Not very experienced with scalding or avro files so any help here is appreciated. Let me know what other info I might need to provide.

Thanks.

J Calbreath
  • It might help if you provided some code to go with this so we could see what you're actually trying to do to read the Avro file. – Reid Spencer Jan 19 '15 at 14:40
  • The above code is all that I have been able to come up with. I have no idea where to really start. I don't really understand how to create the arguments that go into PackedAvroSource (assuming that's the right way to go), so I haven't been able to diagnose the error message I am getting. – J Calbreath Jan 19 '15 at 15:20
  • What is the error message you're getting? – Reid Spencer Jan 19 '15 at 18:07
  • See above in the original post. – J Calbreath Jan 19 '15 at 18:17
  • Okay, you need to provide an implicit AvroSchemaType object in the context where you're doing the reading. The reader takes an implicit argument and Scala is looking for an implicit declaration to satisfy it. Alternatively you can just instantiate one and pass it as an additional argument list. – Reid Spencer Jan 19 '15 at 20:28
  • Can you provide a specific code example on creating the AvroSchemaType object? Is PackedAvroSource the right way to try and read this file? If so, which parameter there is the AvroSchemaType object? – J Calbreath Jan 21 '15 at 15:49
  • Sorry, I don't have time for that. – Reid Spencer Jan 22 '15 at 20:54
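
The implicit-evidence mechanism described in the comments can be illustrated with a minimal, self-contained sketch. SchemaType and describe are hypothetical stand-ins invented for this example; they are not Scalding's AvroSchemaType or PackedAvroSource, and whether the real library types have the same shape is not confirmed here. The sketch only shows why the compiler reports a missing evidence parameter and the two ways of supplying one:

```scala
object ImplicitEvidenceSketch {
  // Hypothetical stand-in for a schema type class; NOT Scalding's real AvroSchemaType.
  trait SchemaType[T] { def schemaJson: String }

  final case class FileType(key: Long, features: Map[String, Double])

  // The context bound [T: SchemaType] desugars to an extra implicit parameter list:
  //   def describe[T](path: String)(implicit ev: SchemaType[T]): String
  def describe[T: SchemaType](path: String): String =
    s"$path -> ${implicitly[SchemaType[T]].schemaJson}"

  // Without an implicit SchemaType[FileType] in scope, describe[FileType](...) does not
  // compile: "could not find implicit value for evidence parameter of type SchemaType[FileType]".
  implicit val fileTypeSchema: SchemaType[FileType] = new SchemaType[FileType] {
    val schemaJson: String =
      """{"type":"record","name":"features","namespace":"OurCode",
        |"fields":[{"name":"key","type":"long"},
        |{"name":"features","type":{"type":"map","values":"double"}}]}""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Option 1: the compiler finds fileTypeSchema in implicit scope.
    println(describe[FileType]("input.avro"))
    // Option 2: pass the evidence explicitly as an additional argument list.
    println(describe[FileType]("input.avro")(fileTypeSchema))
  }
}
```

Both calls resolve to the same evidence value; the explicit form is mainly useful when several candidates exist or when the implicit lives somewhere the compiler does not search.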
