Questions tagged [flink-batch]

158 questions
0 votes · 1 answer

The configuration does not specify the checkpoint directory 'state.checkpoints.dir'

While submitting the Flink job to the Dataproc cluster, I get the error below. Please find the code base and the error. I am using Flink version 1.9.3. The program finished with the following…
0 votes · 1 answer

Flink: TaskManager cannot connect to the JobManager - Could not resolve ResourceManager address

I'm using the Apache Flink Kubernetes operator to deploy a standalone job on an Application cluster setup. I have set up the following files using the official Flink documentation:…
user1386101
0 votes · 0 answers

How exactly are tasks distributed among threads/task slots in Apache Flink?

I am new to Flink. As part of my research, I am trying to figure out: 1) How exactly does Flink (I am using the DataSet API on a single machine) distribute tasks among the available threads/slots, and which algorithms or techniques are used? 2) Does…
Mahmoud
0 votes · 0 answers

Two Flink jobs running in one application: the first completes and the second fails with an NPE

I have two Flink jobs in one application: 1) The first is a Flink batch job that sends events to Kafka, which someone else then writes to S3. 2) The second is a Flink batch job that checks the generated data (reads from S3). Considerations: these two jobs work fine…
0 votes · 1 answer

Finding missing records from 2 data sources with Flink

I have two data sources: an S3 bucket and a Postgres database table. Both sources have records in the same format, with a unique identifier of type UUID. Some of the records present in the S3 bucket are not part of the Postgres table, and the intent…
davyjones
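Finding records that exist in one source but not in the other is an anti-join on the UUID. The logic can be sketched in plain Java with in-memory lists standing in for the S3 and Postgres inputs (the class and method names are hypothetical). In a Flink job the same idea would typically be expressed as a left outer join or a coGroup keyed on the UUID, keeping only the records with no match on the Postgres side.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: IDs present in the "S3" list but absent from the
// "Postgres" collection. Plain in-memory data stands in for the two sources.
class MissingRecords {

    // Returns the IDs from s3Ids that do not appear in postgresIds,
    // preserving the order in which they appear in s3Ids.
    static List<UUID> missingFromPostgres(List<UUID> s3Ids, Collection<UUID> postgresIds) {
        Set<UUID> known = new HashSet<>(postgresIds); // O(1) membership checks
        return s3Ids.stream()
                .filter(id -> !known.contains(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID b = UUID.fromString("00000000-0000-0000-0000-000000000002");
        UUID c = UUID.fromString("00000000-0000-0000-0000-000000000003");
        List<UUID> s3 = List.of(a, b, c);
        List<UUID> postgres = List.of(a, c);
        // prints [00000000-0000-0000-0000-000000000002]
        System.out.println(missingFromPostgres(s3, postgres));
    }
}
```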
0 votes · 0 answers

How to use two TUMBLE functions in a UNION ALL query in Flink

I have two different aggregation queries running in BATCH mode in Flink. Query 1: SELECT TUMBLE_END(trunc_time, INTERVAL '10' MINUTE) trunc_time, organization_id, cluster_safe_name, max(peak_cpu) AS peak_cpu, avg(average_cpu) AS average_cpu FROM…
Kush Rohra
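As a reference point for the TUMBLE query in this question, the window a row falls into can be computed by hand. Flink documents TUMBLE_END as returning the exclusive upper bound of the tumbling window. The following is a minimal sketch, assuming epoch-millisecond timestamps in UTC and no window offset; the class and method names are hypothetical and this is not Flink's own implementation.

```java
import java.time.Instant;

// Hypothetical helper: tumbling-window boundaries for epoch-millisecond
// timestamps, aligned to the epoch (no offset), assuming UTC.
class TumbleWindows {
    static final long TEN_MINUTES_MS = 10 * 60 * 1000L;

    // Inclusive start of the window containing tsMillis. floorMod keeps the
    // result correct for timestamps before 1970 as well.
    static long windowStart(long tsMillis, long sizeMillis) {
        return tsMillis - Math.floorMod(tsMillis, sizeMillis);
    }

    // Exclusive end of the window, i.e. the start of the next window.
    static long windowEnd(long tsMillis, long sizeMillis) {
        return windowStart(tsMillis, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2023-01-01T10:07:30Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(windowStart(ts, TEN_MINUTES_MS))); // 2023-01-01T10:00:00Z
        System.out.println(Instant.ofEpochMilli(windowEnd(ts, TEN_MINUTES_MS)));   // 2023-01-01T10:10:00Z
    }
}
```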
0 votes · 1 answer

Apache Flink trigger that fires when state size threshold is reached

I would like to implement an Apache Flink trigger that fires when the state accumulates 256 MB. I want this because my sink writes Parquet files to HDFS and I would like to run ETL on them later, which means I don't want too small…
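The 256 MB requirement boils down to a running byte count that triggers a flush when it crosses a threshold. The sketch below shows only that counting logic in plain Java. It is not Flink's Trigger API: the class and the per-record size estimator are hypothetical, and a real trigger would have to keep the running total in Flink state per key and window. The in-memory size of records will also generally differ from the size of the resulting compressed Parquet file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Hypothetical size-based batcher (not Flink's Trigger API): buffers records
// and passes the whole batch to `flusher` once the estimated size of the
// buffered records reaches `thresholdBytes`.
class SizeThresholdBuffer<T> {
    private final long thresholdBytes;
    private final ToLongFunction<T> sizeOf;   // assumed per-record size estimator
    private final Consumer<List<T>> flusher;  // stand-in for the sink
    private final List<T> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    SizeThresholdBuffer(long thresholdBytes, ToLongFunction<T> sizeOf, Consumer<List<T>> flusher) {
        this.thresholdBytes = thresholdBytes;
        this.sizeOf = sizeOf;
        this.flusher = flusher;
    }

    // Adds one record and flushes as soon as the running total reaches the threshold.
    void add(T record) {
        buffer.add(record);
        bufferedBytes += sizeOf.applyAsLong(record);
        if (bufferedBytes >= thresholdBytes) {
            flush();
        }
    }

    // Emits whatever is buffered; call it at end of input so the last,
    // smaller batch is not lost.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        // 10-"byte" threshold, using string length as the size estimate.
        SizeThresholdBuffer<String> buf =
                new SizeThresholdBuffer<>(10, String::length, batches::add);
        for (String s : List.of("aaaa", "bbbb", "cccc", "dd")) {
            buf.add(s);
        }
        buf.flush();
        System.out.println(batches); // [[aaaa, bbbb, cccc], [dd]]
    }
}
```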
0 votes · 0 answers

How does batch processing over multiple loops work in Apache Flink?

I want to clarify my understanding of the following. Use case: I am running a Flink batch job. My requirement is as follows: I have 10 tables containing raw data in PostgreSQL, and I want to aggregate that data by creating a tumble window of 10…
0 votes · 2 answers

Can the Flink operator support a batch job with an ApplicationCluster?

When the batch job finishes, what is the ApplicationCluster state supposed to be? Is increasing restartNonce the intended way to re-run the job? I am trying to use the Flink operator to deploy a Flink batch job and trigger it with a Kubernetes CronJob…
gix
0 votes · 2 answers

Flink Window Aggregation using TUMBLE failing on TIMESTAMP

We have one table, A, in a database. We are loading that table into Flink using the Flink SQL JdbcCatalog. Here is how we load the data: val catalog = new JdbcCatalog("my_catalog", "database_name", username, password,…
Kush Rohra
0 votes · 1 answer

Flink: How to write a DataSet to an ORC file?

Is there any way to write a DataSet object to an ORC file? I know a DataSet object can be written as an Avro file using AvroOutputFormat, but it looks like there is no equivalent class for ORC. If that cannot be achieved, is there any way to convert a DataSet…
tottistar
0 votes · 1 answer

Can window operator be used in flink batch mode?

I have a program that contains a window operator. It works perfectly in streaming mode. However, when I switch to batch mode, the window is not emitted. My question is: is it because the watermark does not advance in batch mode? How can I use a window operator in…
0 votes · 1 answer

Exception when running PyFlink batch processing with two sinks

I got the following Flink exception when running a PyFlink processing job: Exception in thread read_grpc_client_inputs: Traceback (most recent call last): File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner self.run() File…
0 votes · 1 answer

Using Async I/O in Flink to call paginated HTTP API

My use case is that I have a paginated API, like http://someurl.com/next=abc, where next is a pointer to the next set of records. The API returns a pointer to the next set of records in the response; I then need to use that and pass in the next…
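The cursor-following loop described in this question can be sketched independently of Flink. Because each request needs the pointer returned by the previous response, the pages of one listing have to be fetched one after another rather than concurrently. The PageFetcher interface, the Page class and the in-memory stub are hypothetical stand-ins for the real HTTP call; this is not Flink's Async I/O API.

```java
import java.util.*;

// Hypothetical sketch of following "next" cursors until the API reports no more pages.
class PaginatedFetch {

    // One page of results plus the cursor for the next page (null when done).
    static final class Page {
        final List<String> records;
        final String next;

        Page(List<String> records, String next) {
            this.records = records;
            this.next = next;
        }
    }

    // Stand-in for the HTTP call; a null cursor requests the first page.
    interface PageFetcher {
        Page fetch(String cursor);
    }

    // Fetches pages sequentially, passing each returned cursor into the next request.
    static List<String> fetchAll(PageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            Page page = fetcher.fetch(cursor);
            all.addAll(page.records);
            cursor = page.next;
        } while (cursor != null);
        return all;
    }

    public static void main(String[] args) {
        // In-memory stand-in for http://someurl.com/next=... (cursor -> page).
        Map<String, Page> pages = new HashMap<>();
        pages.put("start", new Page(List.of("r1", "r2"), "abc"));
        pages.put("abc", new Page(List.of("r3"), "def"));
        pages.put("def", new Page(List.of("r4"), null));
        PageFetcher stub = cursor -> pages.get(cursor == null ? "start" : cursor);
        System.out.println(fetchAll(stub)); // [r1, r2, r3, r4]
    }
}
```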
0 votes · 1 answer

Flink batch job not reading from kafka with timestamp

I have a Flink batch job that reads from Kafka and writes to S3. The current strategy of this job is to read From: timestamp To: timestamp. So my Kafka consumer basically looks like this: KafkaSource.builder() …
Vinod Mohanan