Questions tagged [flink-batch]
158 questions
1
vote
0 answers
Please suggest hardware configuration for network-intensive Flink job (Async I/O)
TL;DR: I am running a Flink streaming job in mode=Batch on EMR. I have tried several EMR cluster configurations, but none of them works as required, and some do not work at all. The workflow is very network-intensive, which causes the main problems.
Question: What…

Valeria Vasylieva
- 21
- 4
1
vote
0 answers
Is Flink suitable for calculating the Cartesian product of multiple resource lists?
I have a Cartesian product of a list of resources A and a list of resources B. A score is calculated for each combination, and the results are finally sorted. Is Flink applicable to this scenario?
My intention is to have these computations distributed across machines…

刘晓洋
- 11
- 1
1
vote
1 answer
How to write to an S3 table sink in Flink without the update and delete changes error?
Consider the following code:
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
class Scratch {
public static void main(String[] args) {
StreamTableEnvironment tableEnv = /*some init code…

Cherry
- 31,309
- 66
- 224
- 364
1
vote
0 answers
Flink batch job fails with bigger files
I am trying to run a batch Apache Beam job (through the TensorFlow Extended - TFX library). This is a batch job, which should just read some CSV files from S3, convert them to TFRecords format (write back to S3) and gather stats about the…

Gorjan Todorovski
- 61
- 4
1
vote
2 answers
Move already processed file from one folder to another folder in Flink
I am a newbie to Flink and am facing some challenges in solving the use case below.
Use Case description:
I will receive a CSV file with a timestamp every day in some folder, say input. The file format would be…

MiniSu
- 566
- 1
- 6
- 22
1
vote
2 answers
Apache Flink FileSink in BATCH execution mode: in-progress files are not transitioned to finished state
What we are trying to do: we are evaluating Flink to perform batch processing using the DataStream API in BATCH mode.
Minimal application to reproduce the issue:
public class FlinkS3ProcessingDemoApplication {
public static void main(String[] args)…

artvolk
- 9,448
- 11
- 56
- 85
1
vote
1 answer
Apache Flink - Mount Volume to Job Pod
I am using the WordCountProg from the tutorial on https://www.tutorialspoint.com/apache_flink/apache_flink_creating_application.htm . The code is as follows:
WordCountProg.java
package main.java.spendreport;
import…

p192
- 518
- 1
- 6
- 19
1
vote
1 answer
Flink disk usage in JobManager increases after every job submission over REST
I have deployed my own Flink setup in AWS ECS, with one service for the JobManager and one service for the TaskManagers. I am running one ECS task for the JobManager and 3 ECS tasks for the TaskManagers.
I have a kind of batch job which I upload using Flink REST…

scoder
- 2,451
- 4
- 30
- 70
1
vote
0 answers
How can I filter Parquet files with a common field (but different schemas) using Flink
I have a folder of Parquet files with different schemas; all have a common field that is guaranteed to exist. I want to filter the rows by that field and write them to another Parquet file.
A similar action in Spark would be fairly…

igx
- 4,101
- 11
- 43
- 88
1
vote
0 answers
Flink short jobs do not export Prometheus job_name field
[DESCRIPTION]
I am running Flink 1.11.1 on Kubernetes, and have set up a monitoring stack using Prometheus and Grafana.
I have observed that running the WordCount example on the Flink cluster (submitted via the UI) does not return $(job_name) in Prometheus.
To…

Çağrı
- 11
- 2
1
vote
1 answer
Apache Flink updating SQL dynamically without restarting
I have a query regarding the behaviour of Flink. Below is my code snippet. As you can see, a service supplies a list of SQL criteria (say, about 10k SQL statements) that Flink is going to execute one by one.
My issue is: whenever the SQL gets updated, how do I…

ParagM
- 63
- 1
- 7
1
vote
1 answer
Understanding data transferring between Operators in Flink (Batch)
I'm still struggling to understand how Flink "exchanges/transfers" data between different operators and what happens to the actual data between the operators.
Take the example DAG above:
DAG of execution
The DataSet gets forwarded/transferred to all…

tooobsias
- 39
- 3
1
vote
0 answers
Flink Elasticsearch Source Connector
I am new to Flink and Elasticsearch integration. I have a scenario where I have to load historical data (approx. 1 TB) from an old Elasticsearch cluster (5.6) to a new cluster (6.8). I have to do some data filtering and modification during the migration.…

Abhi
- 69
- 6
1
vote
0 answers
Why is the Apache Flink SQL validator giving an NPE for this CEP SQL?
Here is my Flink CEP MATCH_RECOGNIZE SQL.
SELECT E.*
FROM MyEvents
MATCH_RECOGNIZE (
ORDER BY procTime
MEASURES
A.id as id,
A.name as name
AFTER MATCH SKIP TO NEXT ROW
PATTERN (A)
DEFINE
A AS source='XYZ' and name IN ('EVENT_SRC1',…

ParagM
- 63
- 1
- 7
1
vote
1 answer
Hash Join and Sort merger exception in Apache Flink
Cluster Infra:
We have a Flink standalone cluster with 4 nodes, each with 16 CPU cores and 32 GB of physical memory, of which 16 GB is allocated to Flink managed memory and the rest to UDFs and the Java heap.
Hence, per slot, we have assigned 1 core…

Murtaza Zaveri
- 49
- 7