0

Similar to KSQL streams - Get data from Array of Struct, my input JSON looks like:

{
  "Obj1": {
    "a": "abc",
    "b": "def",
    "c": "ghi"
  },
  "ArrayObj": [
    {
      "key1": "1",
      "key2": "2",
      "key3": "3"
    },
    {
      "key1": "4",
      "key2": "5",
      "key3": "6"
    },
    {
      "key1": "7",
      "key2": "8",
      "key3": "9"
    }
  ]
}

I have created a stream with:

CREATE STREAM Example1(Obj1 STRUCT<a VARCHAR, b VARCHAR, c VARCHAR>, ArrayObj ARRAY<STRUCT<key1 VARCHAR, key2 VARCHAR, key3 VARCHAR>>) WITH (kafka_topic='sample_topic', value_format='JSON', partitions=1);

However, I would like only a single row of output from each input JSON document, with the data from each column in the array flattened into arrays, like:

 a    b   key1      key2      key3

 abc  def [1, 4, 7] [2, 5, 8] [3, 6, 9]

Is this possible with KSQL?

2 Answers2

0

At present you can only flatten ArrayObj in the way you want if you know up front how many elements it will have:

CREATE STREAM flatten AS
  SELECT
    Obj1.a AS a,
    Obj1.b AS b,
    ARRAY[ArrayObj[1]['key1'], ArrayObj[2]['key1'], ArrayObj[3]['key1']] as key1,
    ARRAY[ArrayObj[1]['key2'], ArrayObj[2]['key2'], ArrayObj[3]['key2']] as key2,
    ARRAY[ArrayObj[1]['key3'], ArrayObj[2]['key3'], ArrayObj[3]['key3']] as key3,
  FROM Example1;

I guess if you new the array was going to be up to a certain size you could just a case statement to selectively extract the elements, e.g.

-- handles arrays of size 2 or 3 elements, i.e. third element is optional.
CREATE STREAM flatten AS
  SELECT
    Obj1.a AS a,
    Obj1.b AS b,
    ARRAY[ArrayObj[1]['key1'], ArrayObj[2]['key1'], ArrayObj[3]['key1']] as key1,
    ARRAY[ArrayObj[1]['key2'], ArrayObj[2]['key2'], ArrayObj[3]['key2']] as key2,
    CASE
      WHEN ARRAY_LENGTH(ArrayObj) >= 3)
        THEN ARRAY[ArrayObj[1]['key3'], ArrayObj[2]['key3'], ArrayObj[3]['key3']] 
      ELSE
        null
      as key3,
  FROM Example1;

If that doesn't suit your needs then the design discussion going on at the moment around lambda function support in ksqlDB may be of interest: https://github.com/confluentinc/ksql/pull/5661

Andrew Coates
  • 1,775
  • 1
  • 10
  • 16
0

this should do the trick.

SELECT 
  `Obj1`->`a`, 
  `Obj1`->`b`, 
  transform(`ArrayObj`, (x) => x->`key1`) AS KEY1,
  transform(`ArrayObj`, (x) => x->`key2`) AS KEY2,
  transform(`ArrayObj`, (x) => x->`key3`) AS KEY3
FROM your_original_stream;
wallstalk
  • 61
  • 3