Questions tagged [flink-streaming]

Apache Flink is an open-source platform for scalable batch and stream data processing. Flink supports batch and streaming analytics in one system. Analytical programs can be written with concise APIs in Java and Scala.

Flink's streaming API provides rich semantics, including processing-time and event-time windows, as well as stateful UDFs. Flink streaming uses a lightweight fault-tolerance mechanism with exactly-once processing guarantees.

Learn more about Apache Flink at the project website: https://flink.apache.org/

3185 questions
7 votes, 1 answer

Apache Flink example job fails to run with "Job not found"

I am attempting to run the SocketWindowWordCount example tutorial found on the Flink site here. I started the Flink cluster, then ran a local socket server: nc -l 9000. After compiling the example source taken from GitHub, I run the job: flink run…
marathon
7 votes, 1 answer

How to fix: java.lang.OutOfMemoryError: Direct buffer memory in flink kafka consumer

We are running a 5-node Flink cluster (1.6.3) on Kubernetes, with a 5-partition Kafka topic as the source. Five jobs read from that topic (each with a different consumer group), each with parallelism = 5. Each task manager is running with 10 GB of RAM and…
7 votes, 2 answers

Apache Flink: My application does not resume from a checkpoint when I restart it

I have a Flink job that reads files from a folder and writes them to a database. New files arrive in that folder daily. I have enabled checkpointing so that if the Flink job stops for any reason and I need to restart it, the job should not…
Ankit
7 votes, 3 answers

Apache Flink on Kubernetes - Resume job if jobmanager crashes

I want to run a Flink job on Kubernetes. With a (persistent) state backend, crashing taskmanagers seem to be no issue, as they can ask the jobmanager which checkpoint they need to recover from, if I understand correctly. A crashing jobmanager…
7 votes, 2 answers

Flink: How to convert the deprecated fold to aggregate?

I am following the Flink quick start example, Monitoring the Wikipedia Edit Stream. The example is in Java, and I am implementing it in Scala, as follows: /** * Wikipedia Edit Monitoring */ object WikipediaEditMonitoring { def main(args:…
fluency03
7 votes, 1 answer

Akka version collision between Flink and Play 2.5

In our project we have a Flink (1.1.3) streaming job that reads from one Kafka queue, applies a map transformation, and writes to another queue. This was working well until we introduced an outgoing REST request as part of the flow. To do…
7 votes, 1 answer

Flink Streaming: How to implement windows which are defined by a start and end element?

I have data in the following format, SIP|2405463430|4115474257|8.205142580136622E12|Tue Nov 08 16:58:58 IST 2016|INVITE RTP|2405463430|4115474257|8.205142580136622E12|Tue Nov 08 16:58:58 IST 2016|0…
7 votes, 2 answers

ClassNotFoundException: org.apache.flink.streaming.api.checkpoint.CheckpointNotifier while consuming a kafka topic

I am using the latest Flink-1.1.2-Hadoop-27 and flink-connector-kafka-0.10.2-hadoop1 jars. The Flink consumer is as follows: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); if (properties == null) { …
mrinal
7 votes, 2 answers

Kafka & Flink duplicate messages on restart

First of all, this is very similar to Kafka consuming the latest message again when I rerun the Flink consumer, but it's not the same. The answer to that question does NOT appear to solve my problem. If I missed something in that answer, then please…
mbarlocker
7 votes, 1 answer

How to Handle Application Errors in Flink

I am currently wondering how to handle application errors in Apache Flink streaming applications. In general, I see two cases: transient errors, where you want the input data to be replayed and processing might succeed on a second try. An example…
F30
7 votes, 1 answer

Get JSON elements from a web with Apache Flink

After reading several documentation pages of Apache Flink (official documentation, dataartisans) as well as the examples provided in the official repository, I keep seeing examples that use, as the data source for streaming, a file already…
Alvaro Gomez
6 votes, 0 answers

RuntimeError: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed

I am using PyFlink 1.17.1 and I am getting this error: "RuntimeError: java.lang.UnsupportedOperationException: A serializer has already been registered for the state; re-registration is not allowed". I need your help with this. When I try to sink data…
6 votes, 1 answer

Jobs stuck while trying to restart from a checkpoint

Context: We are using Flink to run a number of streaming jobs that read from Kafka, perform some SQL transformation, and write the output to Kafka. It runs on Kubernetes with two jobmanagers and many taskmanagers. Our jobs use checkpointing with…
Colin Smetz
6 votes, 1 answer

How to convert a Table to a DataStream containing array types (Flink)?

I have issues with the Table API of Flink (1.13+). I have a POJO containing several fields, one of them being: List my_list; I create my table using the following declaration for this field: "CREATE TABLE my_table ( ... my_list…
Fray
6 votes, 4 answers

Flink 1.13.2: NoResourceAvailableException

This is with Flink 1.13.2 running in Amazon's Kinesis Data Analytics Flink environment. The application reads from Kafka topics. When the topics had smaller traffic volumes, the application ran fine; with larger volumes, I'm getting this error.…
clay