Questions tagged [flink-sql]

Apache Flink features two relational APIs:

  1. SQL (via Apache Calcite)
  2. Table API, a language-integrated query (LINQ) interface

Both APIs are unified APIs for stream and batch processing: a query returns the same result regardless of whether it is applied to a static data set or to a data stream. Queries from both APIs are parsed and optimized by Apache Calcite.

Both APIs are tightly integrated with Flink's DataStream and DataSet APIs.
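
A minimal sketch of the SQL side in Java, assuming a Flink 1.10-style StreamTableEnvironment; the Orders view and its fields are invented for illustration:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public class FlinkSqlExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // An in-memory stream standing in for any real source (Kafka, files, ...).
            DataStream<Tuple2<String, Integer>> orders =
                    env.fromElements(Tuple2.of("books", 12), Tuple2.of("games", 3));

            // Register the stream as a table and query it with standard SQL.
            tEnv.createTemporaryView("Orders", tEnv.fromDataStream(orders, "product, amount"));
            Table result = tEnv.sqlQuery("SELECT product, SUM(amount) FROM Orders GROUP BY product");

            // Aggregates over a stream update over time, hence a retract stream.
            tEnv.toRetractStream(result, Row.class).print();
            env.execute("flink-sql example");
        }
    }
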

667 questions
0
votes
2 answers

Does the Flink 1.7.2 DataSet API not support a Kafka sink?

Does the Flink 1.7.2 DataSet API not support a Kafka sink? After the batch operation I need to publish the messages to Kafka, meaning my source is Postgres and my sink is Kafka. Is it possible?
MadProgrammer
  • 513
  • 5
  • 18
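
For the question above: Flink 1.7's DataSet API ships no Kafka sink. One common workaround, sketched here and not an official API, is a custom OutputFormat wrapping a plain KafkaProducer from kafka-clients; the topic, brokers, and String records are assumptions:

    import java.util.Properties;
    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.configuration.Configuration;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Usage sketch: dataSet.output(new KafkaOutputFormat("broker:9092", "my-topic"));
    public class KafkaOutputFormat implements OutputFormat<String> {
        private final String brokers;
        private final String topic;
        private transient KafkaProducer<String, String> producer;

        public KafkaOutputFormat(String brokers, String topic) {
            this.brokers = brokers;
            this.topic = topic;
        }

        @Override
        public void configure(Configuration parameters) {}

        @Override
        public void open(int taskNumber, int numTasks) {
            Properties props = new Properties();
            props.put("bootstrap.servers", brokers);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
        }

        @Override
        public void writeRecord(String record) {
            producer.send(new ProducerRecord<>(topic, record));
        }

        @Override
        public void close() {
            producer.close();
        }
    }
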
0
votes
1 answer

How to check whether a DataStream in Flink is empty or has data

I am new to Apache Flink. I have a DataStream that goes through a process function; if certain conditions are met the record is valid, and if it does not meet the conditions I write it to a side output. I am able to print the DataStream; is it…
YRK
  • 153
  • 1
  • 1
  • 22
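
For the side-output question above, a minimal sketch: a stream cannot be asked whether it is empty, but records can be routed, with valid ones on the main output and the rest behind an OutputTag (input and isValid(...) are assumed):

    final OutputTag<String> invalidTag = new OutputTag<String>("invalid") {};

    SingleOutputStreamOperator<String> valid =
        input.process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (isValid(value)) {              // isValid(...) stands in for the real checks
                    out.collect(value);            // main output: valid records
                } else {
                    ctx.output(invalidTag, value); // side output: everything else
                }
            }
        });

    valid.getSideOutput(invalidTag).print("invalid");
    valid.print("valid");
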
0
votes
1 answer

Need some examples for the flink-sql streaming process

I need some examples for the Flink SQL streaming process, both for a Kafka source and for a database source.
MadProgrammer
  • 513
  • 5
  • 18
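
A hedged starting point for the question above, assuming Flink 1.11+ with the SQL Kafka connector and JSON format on the classpath; every name and option shown is illustrative, and a database source can be declared the same way with the JDBC connector ('connector' = 'jdbc'):

    tableEnv.executeSql(
        "CREATE TABLE orders (" +
        "  order_id STRING," +
        "  amount DOUBLE," +
        "  ts TIMESTAMP(3)" +
        ") WITH (" +
        "  'connector' = 'kafka'," +
        "  'topic' = 'orders'," +
        "  'properties.bootstrap.servers' = 'localhost:9092'," +
        "  'scan.startup.mode' = 'earliest-offset'," +
        "  'format' = 'json'" +
        ")");

    // A continuous query over the Kafka-backed table.
    tableEnv.executeSql("SELECT order_id, SUM(amount) FROM orders GROUP BY order_id").print();
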
0
votes
1 answer

Using ROW_NUMBER with Flink SQL

I am trying to run the following SQL statement on Flink version 1.10: select startAreaID, endAreaID from ( select startAreaID, endAreaID, ROW_NUMBER() OVER (ORDER BY cnt DESC) as row_num from ( select startAreaID, endAreaID, count(1) as…
Ahmed Awad
  • 95
  • 8
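
For the ROW_NUMBER question above: with the Blink planner, Flink recognizes this pattern as a Top-N query only when the outer query filters on the row number. A hedged reconstruction, in which the trips table and the limit of 10 are assumptions:

    Table topN = tableEnv.sqlQuery(
        "SELECT startAreaID, endAreaID FROM (" +
        "  SELECT startAreaID, endAreaID," +
        "         ROW_NUMBER() OVER (ORDER BY cnt DESC) AS row_num" +
        "  FROM (" +
        "    SELECT startAreaID, endAreaID, COUNT(1) AS cnt" +
        "    FROM trips GROUP BY startAreaID, endAreaID" +
        "  )" +
        ") WHERE row_num <= 10");  // the outer filter on row_num makes this a valid Top-N query
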
0
votes
1 answer

Why is it bad to execute a Flink job with parallelism = 1?

I'm trying to understand the important factors I need to take into consideration before submitting a Flink job. My question is: what is the parallelism, is there a (physical) upper bound, and how can the parallelism impact the…
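
For the parallelism question above: the practical upper bound is the number of task slots the cluster offers, and parallelism can be set per job or per operator. A small illustrative sketch:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(4);  // default parallelism for every operator in this job

    env.socketTextStream("localhost", 9999)         // this source is single-parallelism by nature
       .map(String::toUpperCase).setParallelism(8)  // per-operator override for a hot stage
       .print().setParallelism(1);                  // funnel results through a single task

    env.execute("parallelism sketch");
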
0
votes
1 answer

How to implement timeWindow() in Apache Flink's StreamTableEnvironment?

I want to use a Flink time window in a StreamTableEnvironment. I have previously used the timeWindow(Time.seconds()) function with a DataStream that comes from a Kafka topic. For external reasons I am converting this DataStream to a Table and…
Danieledu
  • 391
  • 1
  • 4
  • 19
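
For the timeWindow() question above, a hedged sketch: once the stream is a table, the counterpart of timeWindow(Time.seconds(10)) is a TUMBLE group window over a declared rowtime attribute (field names are invented; string-based field syntax as in Flink 1.10):

    // Declare 'ts' as the event-time (rowtime) attribute while converting the stream.
    Table events = tableEnv.fromDataStream(stream, "userId, amount, ts.rowtime");
    tableEnv.createTemporaryView("events", events);

    // The SQL counterpart of timeWindow(Time.seconds(10)) is a TUMBLE group window.
    Table windowed = tableEnv.sqlQuery(
        "SELECT userId, SUM(amount) AS total, TUMBLE_END(ts, INTERVAL '10' SECOND) AS w_end " +
        "FROM events " +
        "GROUP BY userId, TUMBLE(ts, INTERVAL '10' SECOND)");
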
0
votes
2 answers

Flink checkpointing failing in Kubernetes with FsStateBackend

I am getting the error stated below while using Flink in Kubernetes with a per-job state backend of FsStateBackend, like so: env.setStateBackend(new FsStateBackend("file:///data/flink/checkpoints")). I am setting it in my code itself. Error…
Anish Sarangi
  • 172
  • 1
  • 14
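
For the checkpointing question above: a file:/// path is local to each pod, so the JobManager and TaskManagers do not share it; checkpoints need storage every pod can reach. A hedged sketch, in which the S3 bucket is an assumption and requires the matching filesystem plugin:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);  // checkpoint every 60 seconds

    // file:///... is pod-local; point the backend at storage every pod can reach,
    // e.g. an object store (with its filesystem plugin installed) or a shared PV mount.
    env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));
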
0
votes
0 answers

StreamExecutionEnvironment is not serializable with tuple of Table in Apache Flink

I want to know if it is possible to make a DataStream of type DataStream<Tuple2<…, Table>>, with the Table type inside the tuple; the Table comes from Flink's Table API. I'm trying to pass the variable accumulatorTable which…
Jesus Zuñiga
  • 125
  • 6
  • 18
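
For the serialization question above: a Table is only a handle on a query plan and cannot travel inside stream records; a common fix is converting it to a stream of rows first. A minimal hedged sketch:

    // Instead of a Tuple2<..., Table>, materialize the table as a stream of rows...
    DataStream<Row> accumulatorRows = tableEnv.toAppendStream(accumulatorTable, Row.class);

    // ...and combine that stream with the other one (connect, join, union, ...).
    otherStream.connect(accumulatorRows);
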
0
votes
1 answer

Flink stream join a dimension table which might return a large result set

I have a stream of events that needs to be enriched with subscription information. Some of the events are broadcast events, meaning that when such an event is received, I need to go to the database table and find all the subscribers of the event; it can be 10,000…
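
One hedged option for the enrichment question above is a SQL lookup join, which can fan a single event out to many subscriber rows; the syntax assumes Flink 1.11+ with a lookup-capable connector such as JDBC, and all names are invented:

    Table enriched = tableEnv.sqlQuery(
        "SELECT e.event_id, s.subscriber_id " +
        "FROM events AS e " +
        "JOIN subscriptions FOR SYSTEM_TIME AS OF e.proc_time AS s " +  // lookup join
        "ON e.event_id = s.event_id");
    // e.proc_time is a processing-time attribute on the events table; one broadcast
    // event can fan out to all of its (possibly 10,000+) subscriber rows here.
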
0
votes
0 answers

Pass an entire Row as parameter to User-Defined Table Function in Flink Table API

How can I pass an entire Row to my ScalarFunction RowToTupleConverter in the following code? All the examples only address passing single or multiple values by name, but I want the whole result of the SELECT statement to be passed as a Row. My guess…
kopaka
  • 535
  • 4
  • 17
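
A hedged sketch for the Row-to-UDF question above, assuming Flink 1.11+'s type system: spell out the row type with @DataTypeHint and build the row at the call site with the SQL ROW(...) constructor. The field list a INT, b STRING is a placeholder:

    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.functions.ScalarFunction;
    import org.apache.flink.types.Row;

    public class RowToTupleConverter extends ScalarFunction {
        // The ROW type must be spelled out; 'a INT, b STRING' is an assumed schema.
        public String eval(@DataTypeHint("ROW<a INT, b STRING>") Row row) {
            return row.getField(0) + ":" + row.getField(1);
        }
    }

    // Registration and call site: wrap the selected columns into a row with ROW(...).
    //   tableEnv.createTemporarySystemFunction("RowToTupleConverter", RowToTupleConverter.class);
    //   tableEnv.sqlQuery("SELECT RowToTupleConverter(ROW(a, b)) FROM t");
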
0
votes
0 answers

The way to get all columns used in current SQL from source tables

I intend to get the list of all columns used in the current SQL from all source tables. For instance: Table X(int a, String b, String e), Table Y(String c, String d), SELECT X.a, Y.c from X join Y on X.b = Y.c ... ; Then the expected output…
KAs
  • 1,818
  • 4
  • 19
  • 37
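
Since Flink SQL is parsed by Apache Calcite, one hedged approach for the question above is to parse the statement with Calcite and collect the qualified identifiers; this is purely syntactic, so SELECT * would need catalog-based validation instead:

    import org.apache.calcite.sql.SqlIdentifier;
    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.sql.parser.SqlParser;
    import org.apache.calcite.sql.util.SqlBasicVisitor;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class ColumnCollector {
        public static Set<String> columnsOf(String sql) throws Exception {
            SqlNode ast = SqlParser.create(sql).parseQuery();
            Set<String> columns = new LinkedHashSet<>();
            ast.accept(new SqlBasicVisitor<Void>() {
                @Override
                public Void visit(SqlIdentifier id) {
                    if (id.names.size() == 2) {  // qualified column refs like X.a, Y.c
                        columns.add(id.toString());
                    }
                    return null;
                }
            });
            return columns;  // e.g. [X.a, Y.c, X.b] for the query in the question
        }
    }
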
0
votes
0 answers

time-window joins to aggregate data on fixed size tumbling window in flink

I have 2 streams: an Order stream, one for all taxis available for booking after an order was accepted, and one for all taxis booked. Order Stream: Order_id : string, timeCreated : Timestamp, region_id : string. Taxi Available Stream…
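
A hedged DataStream sketch for the taxi question above: key both streams by region, interval-join them around the order time, then aggregate on a tumbling window. Order, TaxiBooked, Match, and CountMatches are assumed types, and event time with watermarks must already be assigned:

    orders.keyBy(o -> o.regionId)
          .intervalJoin(taxisBooked.keyBy(t -> t.regionId))
          .between(Time.minutes(-5), Time.minutes(5))  // match bookings near the order time
          .process(new ProcessJoinFunction<Order, TaxiBooked, Match>() {
              @Override
              public void processElement(Order o, TaxiBooked t, Context ctx, Collector<Match> out) {
                  out.collect(new Match(o.regionId, o.orderId, t.taxiId));
              }
          })
          .keyBy(m -> m.regionId)
          .window(TumblingEventTimeWindows.of(Time.minutes(10)))  // fixed-size tumbling window
          .aggregate(new CountMatches());  // assumed AggregateFunction<Match, Long, Long>
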
0
votes
1 answer

flink count distinct issue

Now we use a tumbling window to count distinct values. The issue we have is that if we extend our tumbling window from a day to a month, we can't get the as-of-now distinct count. That means if we set the tumbling window to 1 month, the number we get is from…
Jeff
  • 117
  • 10
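
One hedged fix for the question above is to drop the window and use a plain, continuously updating GROUP BY on the month, so the distinct count is always current instead of appearing only when the window fires (table and field names are invented; Blink planner assumed):

    Table monthly = tableEnv.sqlQuery(
        "SELECT DATE_FORMAT(event_time, 'yyyy-MM') AS ym, " +
        "       COUNT(DISTINCT user_id) AS uv " +
        "FROM clicks " +
        "GROUP BY DATE_FORMAT(event_time, 'yyyy-MM')");

    // An unwindowed aggregate emits updates, so consume it as a retract stream.
    tableEnv.toRetractStream(monthly, Row.class).print();
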
0
votes
1 answer

Can the DataSet transformation "distinct()" be used on a DataStream in Flink?

I was wondering whether Flink's DataStream API can in any way be used to remove duplicates from incoming records (maybe over a particular time window), just like the DataSet API, which provides a transformation called "distinct". Or in any way if the DataSet…
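
There is no built-in distinct() on DataStream; a common substitute, sketched here, is keyed state with a TTL so each key is emitted once per period (the 24-hour TTL is an assumption):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Emits each key only once per TTL period: a stream-side stand-in for DataSet#distinct().
    public class Distinct extends RichFlatMapFunction<String, String> {
        private transient ValueState<Boolean> seen;

        @Override
        public void open(Configuration parameters) {
            ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("seen", Boolean.class);
            desc.enableTimeToLive(StateTtlConfig.newBuilder(Time.hours(24)).build());  // dedup span
            seen = getRuntimeContext().getState(desc);
        }

        @Override
        public void flatMap(String value, Collector<String> out) throws Exception {
            if (seen.value() == null) {  // first occurrence of this key within the TTL
                seen.update(true);
                out.collect(value);
            }
        }
    }

    // Usage: stream.keyBy(v -> v).flatMap(new Distinct());
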
0
votes
2 answers

Field types of query result and registered TableSink do not match

The query result schema printed by table.printSchema():
 |-- deviceId: BIGINT
 |-- channel: STRING
 |-- schemaId: BIGINT
 |-- productId: BIGINT
 |-- schema: LEGACY('RAW', 'ANY')
but when executing…
xiemeilong
  • 643
  • 1
  • 6
  • 21