Questions tagged [flink-sql]

Apache Flink features two relational APIs, SQL and Table API, as unified APIs for stream and batch processing.

Apache Flink features two relational APIs:

  1. SQL (via Apache Calcite)
  2. Table API, a language-integrated query (LINQ) interface

Both APIs are unified APIs for stream and batch processing. This means that a query returns the same result regardless of whether it is applied to a static data set or a data stream. Queries from both APIs are parsed and optimized by Apache Calcite.

Both APIs are tightly integrated with Flink's DataStream and DataSet APIs.

667 questions
1
vote
1 answer

Adding a column in Flink table

I'm trying to add a new column to a flink table in Java Table table = tEnv.sqlQuery(query.getQuery()); table = table.addColumns($("NewColumn")); but I'm running into this ValidationException: org.apache.flink.table.api.ValidationException: Cannot…
shepherd
  • 33
  • 3
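A likely cause, assuming the intent is to add a genuinely new column: `addColumns` expects an expression to compute, while `$("NewColumn")` is a reference to a column that does not yet exist, hence the ValidationException. In plain Flink SQL the same effect can be sketched like this (table name and constant value are hypothetical):

```sql
-- Add a new constant-valued column alongside all existing ones
SELECT *, 'defaultValue' AS NewColumn
FROM MyTable
```

In the Table API the expression would similarly need to produce a value, e.g. a literal or a computation over existing columns, aliased with `.as("NewColumn")`.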
1
vote
0 answers

Flink listener is not getting called when using statementSet.execute

I'm using flink 1.13, using statementSet.execute but added a listener in Flink stream env, onJobSubmitted is getting called when the job is submitted (no compile issues with plan) but to bug the pipeline, I have a string field in Kafka but int…
1
vote
0 answers

Flink SQL: Unsupported type(ARRAY) to generate hash code

I am trying to use flink sql to load avro data and perform various operations. One field of the original data has the Array type, and no matter what operations I want to do, like very simply Table result = inputTable.where(or($("status").isNull(),…
tottistar
  • 11
  • 3
1
vote
1 answer

Flink CEP sql restrict output

I have a use case where I have 2 input topics in kafka. Topic schema: eventName, ingestion_time(will be used as watermark), orderType, orderCountry Data for first topic: {"eventName": "orderCreated", "userId":123, "ingestionTime": "1665042169543",…
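Restricting CEP output in Flink SQL is usually done inside `MATCH_RECOGNIZE`: the `MEASURES` clause picks which fields come out, and `ONE ROW PER MATCH` limits the result to a single row per matched pattern. A sketch, assuming the two topics are unioned into one table `orders` with a watermark on `ingestionTime`, and assuming a hypothetical second event name `orderCompleted`:

```sql
SELECT *
FROM orders
    MATCH_RECOGNIZE (
        PARTITION BY userId
        ORDER BY ingestionTime
        MEASURES
            A.ingestionTime AS createdAt,   -- only the measured fields are emitted
            B.ingestionTime AS completedAt
        ONE ROW PER MATCH                   -- one output row per matched sequence
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B)
        DEFINE
            A AS A.eventName = 'orderCreated',
            B AS B.eventName = 'orderCompleted'
    )
```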
1
vote
1 answer

Flink Windows - how to emit intermediate results as soon as new event comes in?

Flink 1.14, Java, Table API + DataStream API (toDataStream/toAppendStream). I'm trying to: read events from Kafka, hourly aggregate (sum, count, etc.) and upsert results to Cassandra as soon as new events are coming, in other words — create new…
deeplay
  • 376
  • 3
  • 20
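One common way to get per-event updates instead of waiting for the window to close is to avoid a windowed aggregation entirely and group by a computed hour bucket: the result is an updating table, and an upsert sink (e.g. Cassandra keyed on the bucket) receives a new version on every incoming event. A sketch with hypothetical table and column names:

```sql
-- Emits an updated row for the current hour bucket on every new event
SELECT
  DATE_FORMAT(event_time, 'yyyy-MM-dd HH:00') AS hour_bucket,
  SUM(amount) AS total,
  COUNT(*)    AS cnt
FROM events
GROUP BY DATE_FORMAT(event_time, 'yyyy-MM-dd HH:00')
```

Note this trades away window-based state cleanup, so an idle-state retention TTL is usually configured alongside it.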
1
vote
1 answer

Unforeseeable Tombstones messages when joining with Flink SQL

We've a SQL Flink Job (Table API) that reads Offers from a Kafka topic (8 partitions) as source and sinks it back to another Kafka topic after some aggregations with other data sources to calculate the cheapest one and aggregate extra data over that…
1
vote
0 answers

Flink SQL : How to unpack fields in ROW type as multiple columns?

I call a UDF in such a Flink SQL query: SELECT dvid, rank_name, rank_type, window_start, window_end, RankDif(rank_order,rank_pt) AS rank_cur FROM TABLE( HOP(TABLE UniqueRankTable, DESCRIPTOR(rank_pt), INTERVAL '1' DAY, INTERVAL '2'…
Singleton
  • 11
  • 1
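If the UDF returns a `ROW` type, Flink SQL can address its fields with dot notation, so the unpacking can be done in an outer projection. A sketch, assuming `rank_cur` is `ROW<first_field INT, second_field INT>` (field names are hypothetical):

```sql
-- Project individual ROW fields out as top-level columns
SELECT
  dvid,
  rank_name,
  rank_cur.first_field  AS rank_value,
  rank_cur.second_field AS rank_delta
FROM ranked_results
```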
1
vote
1 answer

How to read data from HDFS with Flink in python

I want to read data from HDFS with Flink in python I found it possible with Java or Scala : https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/dataset/formats/hadoop/ Indeed, Flink HDFS connector provides a Sink that writes…
Zak_Stack
  • 103
  • 8
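The DataSet Hadoop formats linked above are not exposed in PyFlink, but the Table API `filesystem` connector can read from an `hdfs://` path when the Hadoop classpath is available to Flink. A sketch of a source table definition (schema, host, path, and format are hypothetical) that could be registered from PyFlink via `t_env.execute_sql(...)`:

```sql
CREATE TABLE hdfs_source (
  `word` STRING,
  `cnt`  BIGINT
) WITH (
  'connector' = 'filesystem',
  'path'      = 'hdfs://namenode:8020/path/to/data',
  'format'    = 'csv'
)
```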
1
vote
2 answers

Is it better to use Row or GenericRowData with DataStream API?

I am working with Flink 1.15.2. Should I use Row, or GenericRowData (which implements RowData), for my own data type? I mostly use the streaming API. Thanks. Sig.
erich
  • 71
  • 6
1
vote
1 answer

Flink SQL Watermark Strategy After Join Operation

My problem is that I cannot use the ORDER BY clause after the JOIN operation. To reproduce the problem, CREATE TABLE stack ( id INT PRIMARY KEY, ts TIMESTAMP(3), WATERMARK FOR ts AS ts - INTERVAL '1' SECONDS ) WITH ( 'connector' =…
1
vote
1 answer

Is there a Flink Table API equivalent to Window Functions using row_number(), rank(), dense_rank()?

In an attempt to discover the possibilities and limitations of the Flink Table API for use in a current project, I was trying to translate a Flink SQL statement into its equivalent Flink Table API version. For most parts, I am able to translate the…
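For reference, the SQL side of this is the Top-N pattern: a `ROW_NUMBER()` over-window in a subquery, filtered on the row number, which Flink's planner recognizes and optimizes specially. A sketch with hypothetical table and column names:

```sql
-- Top-N pattern: Flink recognizes this subquery + row_num filter shape
SELECT *
FROM (
  SELECT
    product_id,
    price,
    ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY price) AS row_num
  FROM products
)
WHERE row_num <= 3
```

The Table API supports over-windows for aggregates, but the ranking functions are most reliably expressed through SQL as above.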
1
vote
0 answers

Correlated Subquery in Flink

I want to join two tables (left and right) which are generated by below queries. CREATE TABLE left_table ( `experiment_id` BIGINT, `f_sequence` BIGINT, `line_string` STRING, `log_time` TIMESTAMP(3), WATERMARK FOR log_time AS…
akurmustafa
  • 122
  • 10
1
vote
2 answers

How to determine number of task slots in flink

I am trying to determine how to divide the task slots for my Flink job. To be more specific, is there a reason to use 2 task slots (or more) per task manager instead of one task slot per task manager? I read that multiple task slots per task manager…
JoeHills
  • 43
  • 4
1
vote
0 answers

In Flink table API, how do you use postgres timestamps in scan.partition.column scan.partition.lower-bound etc

In Flink 1.13, how do you configure a CREATE TABLE statement to use a postgres timestamp column to partition by? Things I have tried: In postgres, I have a column named 'my_timestamp' of type TIMESTAMP WITHOUT TIME ZONE In my Flink CREATE TABLE…
Jordan Morris
  • 2,101
  • 2
  • 24
  • 41
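The JDBC connector's partitioned scan accepts numeric, date, or timestamp columns. A sketch of the `WITH` options (URL, table, bounds, and partition count are hypothetical; the accepted format of the bound values varies by connector version, so verify against the docs for the JDBC connector version in use):

```sql
CREATE TABLE pg_source (
  id BIGINT,
  my_timestamp TIMESTAMP(3)
) WITH (
  'connector' = 'jdbc',
  'url'       = 'jdbc:postgresql://localhost:5432/mydb',
  'table-name' = 'my_table',
  'scan.partition.column'      = 'my_timestamp',
  'scan.partition.num'         = '10',
  'scan.partition.lower-bound' = '2021-01-01 00:00:00',
  'scan.partition.upper-bound' = '2021-12-31 23:59:59'
)
```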
1
vote
1 answer

use Flink SQL multiple case when

I am using Flink SQL to generate an explain plan: select case when count(*)>1 then '11' end as query, case when src_ip='6' then '22' end as query from table. But I got an exception saying Expression 'src_ip' is not being grouped. When I alter count(*) to…
jd g
  • 13
  • 2
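The error itself is standard SQL semantics: once an aggregate like `COUNT(*)` appears, every non-aggregated column in the SELECT list must appear in `GROUP BY` (or be wrapped in an aggregate such as `MAX`). A sketch of a grouped version, with a hypothetical table name:

```sql
-- src_ip is referenced outside an aggregate, so it must be grouped
SELECT
  CASE WHEN COUNT(*) > 1 THEN '11' END AS query1,
  CASE WHEN src_ip = '6' THEN '22' END AS query2
FROM my_table
GROUP BY src_ip
```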