
How can I put tweets into Avro files and save them in HDFS using Spring XD? The documentation only tells me to do the following:

xd:>stream create --name mydataset --definition "time | hdfs-dataset --batchSize=20" --deploy

This works fine for the "time" source, but if I want to store tweets as Avro, it only puts the raw JSON strings into the Avro files, which is pretty dumb.

I could not find any detailed information on how to tell Spring XD to apply a specific Avro schema (.avsc) or to convert the JSON string into a Tweet object.

Do I have to build a custom converter?

Can somebody please help? This is driving me insane...

Thanks.

Tim

1 Answer


According to the hdfs-dataset documentation, the Kite SDK is used to infer the Avro schema from the object you pass into it. From its perspective, you passed in a String, which is why it behaves the way it does. Since there is no mechanism to explicitly pick a schema for hdfs-dataset to use, you'll have to create a Java class that represents the tweet (or use the Twitter4J API), convert the tweet JSON into an instance of that class (this requires a custom processor), and send that object to your sink. The hdfs-dataset sink will then use a schema based on your class.
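A minimal sketch of the processor logic described above, in plain Java. Everything here is an assumption for illustration: `Tweet`, `TweetTransformer` and `JsonReader` are hypothetical names, and only three fields (`id`, `text` and `user.screen_name`, as they appear in Twitter's tweet JSON) are mapped. To keep the sketch dependency-free, a tiny hand-written JSON reader stands in for a real JSON library such as Jackson, which you would more likely use in practice. Packaging the transformer as a Spring XD custom processor module is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO holding only the tweet fields to store (one Avro field each).
class Tweet {
    long id;
    String text;
    String screenName;
    Tweet() { } // no-arg constructor, which reflection-based readers typically need
}

// Hypothetical transformer for a custom processor: raw tweet JSON in, Tweet out.
class TweetTransformer {
    Tweet transform(String json) {
        Map<?, ?> root = (Map<?, ?>) new JsonReader(json).readValue();
        Tweet tweet = new Tweet();
        tweet.id = ((Number) root.get("id")).longValue();
        tweet.text = (String) root.get("text");
        Map<?, ?> user = (Map<?, ?>) root.get("user"); // nested object in the tweet JSON
        tweet.screenName = user == null ? null : (String) user.get("screen_name");
        return tweet;
    }
}

// Minimal JSON reader (objects, arrays, strings, numbers, true/false/null), no dependencies.
class JsonReader {
    private final String s;
    private int pos;
    JsonReader(String s) { this.s = s; }

    Object readValue() {
        char c = peek();
        if (c == '{') return readObject();
        if (c == '[') return readArray();
        if (c == '"') return readString();
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        return readNumber();
    }

    private Map<String, Object> readObject() {
        Map<String, Object> map = new LinkedHashMap<>();
        expect('{');
        if (peek() == '}') { pos++; return map; }
        do {
            String key = readString();
            expect(':');
            map.put(key, readValue());
        } while (more('}'));
        return map;
    }

    private List<Object> readArray() {
        List<Object> list = new ArrayList<>();
        expect('[');
        if (peek() == ']') { pos++; return list; }
        do { list.add(readValue()); } while (more(']'));
        return list;
    }

    private String readString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos++);
            if (e == 'u') { sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16)); pos += 4; continue; }
            int k = "nrtbf".indexOf(e); // any other escaped char (quote, backslash, slash) is kept as-is
            sb.append(k >= 0 ? "\n\r\t\b\f".charAt(k) : e);
        }
    }

    private Object readNumber() {
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
        String num = s.substring(start, pos);
        if (num.matches("-?\\d+")) return Long.valueOf(num); // tweet ids need an exact long
        return Double.valueOf(num);
    }

    private boolean more(char close) { // after an element: ',' means more, `close` ends it
        if (peek() == ',') { pos++; return true; }
        expect(close);
        return false;
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }

    private char peek() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
        return s.charAt(pos); // throws StringIndexOutOfBoundsException on truncated input
    }
}
```

Once such a processor is registered as a custom module (say, a hypothetical `tweet-to-pojo`), the stream would look roughly like `twitterstream | tweet-to-pojo | hdfs-dataset --batchSize=20`, and hdfs-dataset would then derive its schema from `Tweet` instead of `String`.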

Brandon McKenzie