1

I am working with Flink 1.15.2. Should I use Row or GenericRowData (which implements RowData) for my own data type? I mostly use the streaming API. Thanks.

erich
  • 71
  • 6

2 Answers

3

In general, the DataStream API is very flexible when it comes to record types. POJO types might be the most convenient ones. Basically any Java class can be used, but you need to check which TypeInformation is extracted via reflection. Sometimes it is necessary to override it manually.
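A minimal sketch of how one might check the extracted TypeInformation, assuming `flink-core` is on the classpath. The `SensorReading` class and the `isFlinkPojo` helper are hypothetical names used only for illustration. The idea is that a class satisfying Flink's POJO rules should yield a `PojoTypeInfo`, while anything else typically falls back to a generic (Kryo-based) type.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.PojoTypeInfo;

public class PojoCheck {

    // Hypothetical record type: public class, public no-arg constructor, public fields.
    public static class SensorReading {
        public String sensorId;
        public double value;

        public SensorReading() {}
    }

    // Returns true if reflection-based extraction treats the class as a POJO.
    public static boolean isFlinkPojo(Class<?> clazz) {
        TypeInformation<?> info = TypeInformation.of(clazz);
        return info instanceof PojoTypeInfo;
    }

    public static void main(String[] args) {
        // Print the extracted type so a fallback to a generic type is easy to spot.
        System.out.println(TypeInformation.of(SensorReading.class));
        System.out.println("POJO: " + isFlinkPojo(SensorReading.class));
    }
}
```

If the check comes back false for a class you expected to be a POJO, that is usually the point where the type information has to be supplied by hand.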

For Row you will always have to provide the types manually as reflection cannot do much based on class signatures.
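As a rough illustration of providing the types manually for Row, the sketch below builds a named row type with `Types.ROW_NAMED`. The field names `name` and `age` and the class name `RowTypeExample` are assumptions made for the example, not something from the question.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.types.Row;

public class RowTypeExample {

    // Explicit type information: reflection cannot infer field types from Row itself.
    public static final TypeInformation<Row> ROW_TYPE =
            Types.ROW_NAMED(new String[] {"name", "age"}, Types.STRING, Types.INT);

    // Builds a positional Row whose fields match ROW_TYPE.
    public static Row makeRow(String name, int age) {
        return Row.of(name, age);
    }

    public static void main(String[] args) {
        Row row = makeRow("alice", 30);
        System.out.println(row + " has type " + ROW_TYPE);
    }
}
```

In a job, this type information would typically be attached to the operator producing the rows, for example with `.returns(RowTypeExample.ROW_TYPE)` after a `map`.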

GenericRowData should be avoided. It is rather an internal class with many caveats (strings must be StringData, and array handling is not straightforward). Also, GenericRowData becomes BinaryRowData after deserialization. TL;DR: this type is meant for the SQL engine.
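To make the string caveat concrete, here is a small sketch, assuming `flink-table-common` is on the classpath. The class and method names are hypothetical. It shows that string fields have to be wrapped in `StringData` when writing and unwrapped again when reading.

```java
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.StringData;

public class GenericRowDataCaveat {

    // Strings must be stored as StringData, not java.lang.String.
    public static GenericRowData build(String name, int age) {
        return GenericRowData.of(StringData.fromString(name), age);
    }

    // Reading goes through the internal StringData representation as well.
    public static String readName(GenericRowData row) {
        return row.getString(0).toString();
    }

    public static void main(String[] args) {
        GenericRowData row = build("alice", 30);
        System.out.println(readName(row) + " / " + row.getInt(1));
    }
}
```

Storing a plain `String` in the row compiles fine, since `GenericRowData.of` accepts any `Object`, but a later `getString` call would be expected to fail with a cast error. That is part of why this type is awkward to use directly in DataStream code.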

twalthr
  • 2,584
  • 16
  • 15
  • We do use Row for now, but we saw a lot of built-in serializers for RowData, e.g. Parquet and ORC. That's why I was asking. In fact, we are building a generic solution where the user only provides a function. For POJOs we observe overhead compared to Row. Thx – erich Aug 28 '22 at 17:47
1

The docs are actually helpful here. I was confused too.

The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.

I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData], and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute.

Milimetric
  • 13,411
  • 4
  • 44
  • 56
  • (btw, it looks like `BoxedWrapperRowData` allows setting non-primitive types like String, but in Flink 1.15 it's in the docs and in the source [1] but for some reason IntelliJ can't find it... Flink is so weird...) [1] https://github.com/apache/flink/blob/release-1.15/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/data/BoxedWrapperRowData.java – Milimetric Nov 01 '22 at 21:45
  • @Milimetric, in your case, how are you specifying the TypeInformation or registering your serializer? In my case, when I provide a TypeHint, it gets serialized as a generic type. – Aman Vaishya May 12 '23 at 09:15
  • 1
    I tried a few different ways and I gave up that route, so I'm sorry I can't help further. Flink is a few versions later now and my environment changed so I have new problems. Maybe GPT 4 will find a way to help us make sense of unstable environments :) – Milimetric May 23 '23 at 20:12