I have a question about using Kudu with nested fields.

I have JSON from Kafka like this:

{
  "ts": 32,
  "status": "success",
  "uid": "3232",
  "url": "http://some_url",
  "syncpixel": "http://some_url",
  "dfp": {
    "DFP_UABrowser": "Chrome 61",
    "DFP_UAOperatingSystem": "Windows 7 ver.7.0",
    "JavascriptDisplayData_Screen_W_x_H": "1440 x 900",
    "Native_client": true
  }
}

The dfp field contains a nested object, and I want to insert this object into Kudu through Flume.

I know that Kudu does not support nested fields, but it does support binary columns. What should I do?

  1. Convert the dfp field to a binary format and read it back later with, for example, Spark in Scala?
  2. Flatten the JSON (but in many cases this is not the best solution, for example a stream of product purchases with product ID, name, and other fields, or product views on a page).
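
To make option 1 concrete, here is a minimal sketch in plain Python of what I mean: the top-level fields stay as ordinary columns and dfp is serialized to bytes for a binary column. The helper names (to_kudu_row, read_nested) are hypothetical, and the actual write to Kudu through Flume is not shown.

```python
import json

# Shortened sample message with the same shape as the one above.
message = """
{
  "ts": 32,
  "status": "success",
  "uid": "3232",
  "dfp": {"DFP_UABrowser": "Chrome 61", "Native_client": true}
}
"""

def to_kudu_row(record, nested_key="dfp"):
    """Keep scalar fields as they are and pack the nested object into bytes."""
    row = {k: v for k, v in record.items() if k != nested_key}
    # Serialize the nested object as UTF-8 JSON; this is the value that
    # would go into the binary column.
    row[nested_key] = json.dumps(record[nested_key], sort_keys=True).encode("utf-8")
    return row

def read_nested(row, nested_key="dfp"):
    """Decode the binary column back into a dict on the reader side."""
    return json.loads(row[nested_key].decode("utf-8"))

record = json.loads(message)
row = to_kudu_row(record)
nested = read_nested(row)
```

The downside is that Kudu only sees opaque bytes, so any filtering on the fields inside dfp has to happen in the reader after decoding.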
Mika Sundland

1 Answer

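If you use Spark with Scala, streaming will not be an issue as long as the cluster is set up properly. Read the entire JSON with Spark and flatten it. Note that the "explode" function applies to array or map columns; a nested struct such as dfp can be flattened by selecting "dfp.*". This will make life easier.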
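
As a rough illustration of what the flattened row looks like, here is a small plain-Python sketch that does not depend on Spark. The flatten helper and the underscore-joined column names are assumptions made for the example, not something Spark or Kudu produces by default.

```python
import json

def flatten(record, parent_key="", sep="_"):
    """Recursively turn nested objects into top-level keys like dfp_DFP_UABrowser."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object, prefixing its keys with the parent name.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Shortened version of the message from the question.
message = '{"ts": 32, "status": "success", "dfp": {"DFP_UABrowser": "Chrome 61", "Native_client": true}}'
flat_row = flatten(json.loads(message))
```

Each key of flat_row would then map to one Kudu column, so the table schema has to list the dfp fields up front.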

swapnil shashank