Questions tagged [flink-batch]

158 questions
0 votes, 1 answer

Flink forward files from List filePaths

We have a list of file paths from a DB table with a timestamp for when each was created. I'm trying to figure out how we can use the file-path list from the DB to forward only those files from NFS to a Kafka sink. Right now I am using a customized version of…
VSK • 359 • 2 • 5 • 20
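
A minimal sketch of the setup this question describes, under assumptions not taken from the post: the file-path list has already been loaded from the DB (here hard-coded), the files are small enough to read whole, and the topic name and broker address are placeholders.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class ForwardFilesToKafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In the real job this list would come from the DB query filtered by timestamp.
        List<String> filePaths = Arrays.asList("/mnt/nfs/data/a.json", "/mnt/nfs/data/b.json");

        DataStream<String> fileContents = env
                .fromCollection(filePaths)
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String path) throws Exception {
                        // Read the whole file; fine for small files, not for large ones.
                        return new String(Files.readAllBytes(Paths.get(path)));
                    }
                });

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");

        fileContents.addSink(
                new FlinkKafkaProducer<>("file-contents", new SimpleStringSchema(), props));

        env.execute("forward-files-to-kafka");
    }
}
```
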
0 votes, 1 answer

Flink: how to process logic on SQL query result

My requirement is to process or build some logic around the result of a SQL query in Flink. For simplicity, let's say I have two SQL queries running on different window sizes over one event stream. My question is: a) how will I know for which query…
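
One hedged way to answer the "which query produced this result" part is to tag each SQL query with a literal column. The sketch below uses the Flink 1.9/1.10-era StreamTableEnvironment bridge API (the import path differs in later versions); the table name, field names, and query ids are invented.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TaggedSqlQueries {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        DataStream<Tuple2<String, Integer>> events =
                env.fromElements(Tuple2.of("alice", 10), Tuple2.of("bob", 20), Tuple2.of("alice", 5));

        // Older registration API, matching the 1.9/1.10 era of these questions.
        tableEnv.registerDataStream("events", events, "user_name, amount");

        // Each query carries its own identifier as a constant column.
        Table q1 = tableEnv.sqlQuery(
                "SELECT 'q1' AS query_id, user_name, SUM(amount) AS total FROM events GROUP BY user_name");
        Table q2 = tableEnv.sqlQuery(
                "SELECT 'q2' AS query_id, user_name, COUNT(*) AS cnt FROM events GROUP BY user_name");

        // Downstream logic can branch on the query_id field of each Row.
        tableEnv.toRetractStream(q1, Row.class).print();
        tableEnv.toRetractStream(q2, Row.class).print();

        env.execute("tagged-sql-queries");
    }
}
```
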
0 votes, 1 answer

How does TM recovery handle past broadcasted data

In the context of HA for TaskManagers (TM), when a TM goes down, a new one will be restored from the latest checkpoint of the faulted one by the JobManager (JM). Say we have 3 TMs (tm1, tm2, & tm3) at a given time t where everyone's checkpoint (cp) is at cp1. All TMs…
0 votes, 1 answer

Flink: load historical data and maintain a window of 30 days

My requirement is to hold 30 days of data in the stream so that any given day is available for processing. So on the first day, when the Flink application starts, it will fetch 30 days of data from the database and merge it with the current stream data. My challenge is managing the 30 days of data…
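
A rough sketch of the "fetch 30 days from the database, then merge with the live stream" part, with stand-ins that are not from the question: the history is a hard-coded collection in place of a JDBC query, and a socket source stands in for the real-time stream.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Arrays;
import java.util.List;

public class HistoryPlusLiveStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for "SELECT ... WHERE event_time > now() - INTERVAL 30 DAY".
        List<String> last30Days = Arrays.asList("hist-event-1", "hist-event-2");
        DataStream<String> history = env.fromCollection(last30Days);

        // Placeholder for the real-time source (e.g. a Kafka consumer).
        DataStream<String> live = env.socketTextStream("localhost", 9999);

        // Both inputs flow through the same downstream logic, so keyed state
        // and windows see historical and current events alike.
        DataStream<String> merged = history.union(live);
        merged.print();

        env.execute("history-plus-live");
    }
}
```
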
0 votes, 1 answer

Not able to sleep in a custom source function in Apache Flink which is unioned with another source

I have two sources, one a Kafka source and one a custom source. I need to make the custom source sleep for one hour, but I am getting the interruption below. java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native…
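
A minimal sketch of a custom source that sleeps between emissions without failing the job: Flink interrupts the source thread on cancellation, so the InterruptedException is treated as a stop signal rather than rethrown. The one-hour interval and the emitted value are placeholders.

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class HourlySource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) {
        while (running) {
            ctx.collect("tick");
            try {
                Thread.sleep(60 * 60 * 1000L); // one hour
            } catch (InterruptedException e) {
                // Flink interrupts the source thread on cancel/failover;
                // restore the flag and leave the loop instead of failing.
                Thread.currentThread().interrupt();
                running = false;
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```
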
0 votes, 1 answer

Flink combination of windowByTime and triggerByCount

source.keyBy(0) .window(TumblingEventTimeWindows.of(Time.seconds(5))) .trigger(PurgingTrigger.of(CountTrigger.of[TimeWindow](2))) .process(new TestFun()) Explanation: Let's say I have 3 events [E1, E2, E3], which should be triggered by…
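
A runnable Java sketch of the combination shown in the snippet (the original appears to be Scala): a tumbling event-time window whose firing is driven by a purging count trigger. The toy elements, timestamps, and window function are invented, and the event-time setup is the pre-1.12 style.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class TimeWindowWithCountTrigger {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // (key, value, event timestamp in ms) -- toy data standing in for E1..E3.
        DataStream<Tuple3<String, Integer, Long>> source = env.fromElements(
                Tuple3.of("k", 1, 1000L), Tuple3.of("k", 2, 2000L), Tuple3.of("k", 3, 3000L));

        source.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<String, Integer, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple3<String, Integer, Long> e) {
                        return e.f2;
                    }
                })
                .keyBy(new KeySelector<Tuple3<String, Integer, Long>, String>() {
                    @Override
                    public String getKey(Tuple3<String, Integer, Long> e) {
                        return e.f0;
                    }
                })
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                // Replacing the default trigger: fire (and purge) every 2 elements,
                // so the element count, not the watermark, runs the window function.
                .trigger(PurgingTrigger.of(CountTrigger.of(2)))
                .process(new ProcessWindowFunction<Tuple3<String, Integer, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context ctx,
                                        Iterable<Tuple3<String, Integer, Long>> elements,
                                        Collector<String> out) {
                        int count = 0;
                        for (Tuple3<String, Integer, Long> ignored : elements) {
                            count++;
                        }
                        out.collect(key + " fired with " + count + " element(s)");
                    }
                })
                .print();

        env.execute("time-window-count-trigger");
    }
}
```
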
0 votes, 1 answer

Flink requires a local path for the Hive conf directory, but how do we give that path if we are submitting the Flink job on YARN?

https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/hive/#connecting-to-hive According to this link, Flink requires a local Hive conf folder path, but I need to submit the Flink job on YARN, so Flink tries to find the path in the YARN container, e.g.…
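
For context, a sketch of the HiveCatalog registration the linked docs describe, in the Flink 1.10-era blink-planner style; the catalog name, conf path, Hive version, and table name are placeholders, and on YARN the conf directory still has to be readable wherever this code actually runs.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveCatalogExample {
    public static void main(String[] args) {
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        // hive-conf-dir must contain hive-site.xml and be readable by the process
        // that creates the catalog (the pain point on YARN).
        HiveCatalog hive = new HiveCatalog(
                "myhive",           // catalog name (placeholder)
                "default",          // default database
                "/etc/hive/conf",   // local Hive conf directory (placeholder)
                "2.3.4");           // Hive version (placeholder)

        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");

        // Queries can now reference Hive tables through the catalog.
        Table result = tableEnv.sqlQuery("SELECT * FROM some_hive_table");
    }
}
```
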
0 votes, 1 answer

How to read an Excel file in Apache Flink?

Could someone explain how to load Excel data into Apache Flink? I have seen other kinds of formats in the API docs, such as txt and csv, but not Excel. Thanks in advance.
toni_92 • 1 • 2
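
Flink has no built-in Excel input format as far as I know, so one workaround is to parse the workbook with Apache POI (an extra dependency, not part of Flink) and hand the rows to Flink as a collection. The file name and sheet layout below are made up.

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ExcelToFlink {
    public static void main(String[] args) throws Exception {
        List<Tuple2<String, Double>> rows = new ArrayList<>();

        // Parse the spreadsheet eagerly on the client; fine for small files.
        try (Workbook workbook = WorkbookFactory.create(new File("data.xlsx"))) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                rows.add(Tuple2.of(
                        row.getCell(0).getStringCellValue(),
                        row.getCell(1).getNumericCellValue()));
            }
        }

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.fromCollection(rows).print();
    }
}
```
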
0 votes, 1 answer

Flink Hadoop implementation problem - Could not find a file system implementation for scheme 'hdfs'

I'm struggling with integrating HDFS with Flink. Scala binary version: 2.12, Flink (cluster) version: 1.10.1. Here is HADOOP_CONF_DIR; and the configuration of HDFS is here. This configuration and HADOOP_CONF_DIR are also the same in the taskmanager as…
0 votes, 1 answer

filter by max in tuple field in Apache Flink

I'm using the Apache Flink Streaming API to process a data file and I'm interested in getting only the results from the last of the windows. Is there a way to do this? If it is not possible, I thought I could filter through the maximum of…
ekth0r • 65 • 5
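
A small sketch of the "keep only the element with the maximum value in a tuple field" idea using maxBy on a keyed stream; the data and field positions are invented.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MaxByField {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> values = env.fromElements(
                Tuple2.of("a", 3), Tuple2.of("a", 7), Tuple2.of("b", 1));

        // For each key (field 0), emits the element with the largest value seen
        // so far in field 1; the final emission per key is the overall maximum.
        values.keyBy(0)
              .maxBy(1)
              .print();

        env.execute("max-by-field");
    }
}
```
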
0 votes, 1 answer

How to add new rows to an Apache Flink Table

Is it possible to add a new record/row to a Flink table? For example, I have the following table configuration: ExecutionEnvironment env = TableEnvironmentLoader.getExecutionEnvironment(); BatchTableEnvironment tableEnv =…
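
Tables are not mutated in place, so one hedged approach is to build a second table containing the new rows and union it with the original. The sketch uses the legacy DataSet/BatchTableEnvironment API that the question's snippet suggests; the field names and values are invented, and the question's TableEnvironmentLoader class is not reproduced.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class AppendRowsToTable {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);

        DataSet<Tuple2<String, Integer>> existing =
                env.fromElements(Tuple2.of("alice", 30), Tuple2.of("bob", 25));
        Table people = tableEnv.fromDataSet(existing, "name, age");

        // The "new row" also starts life as a (one-element) DataSet.
        Table newRows = tableEnv.fromDataSet(
                env.fromElements(Tuple2.of("carol", 41)), "name, age");

        // Union the original table with the new rows and materialize the result.
        Table combined = people.unionAll(newRows);
        tableEnv.toDataSet(combined, Row.class).print();
    }
}
```
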
0 votes, 2 answers

Does the Flink 1.7.2 DataSet not support a Kafka sink?

Does the Flink 1.7.2 DataSet not support a Kafka sink? After doing the batch operation I need to publish the message to Kafka, meaning the source is my Postgres and the sink is my Kafka. Is it possible?
MadProgrammer • 513 • 5 • 18
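
The bundled Kafka connectors target the DataStream API, so a common workaround for a DataSet job is a small custom OutputFormat wrapping a plain kafka-clients producer. A sketch under those assumptions; the topic, broker address, and input data are placeholders.

```java
import org.apache.flink.api.common.io.RichOutputFormat;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class DataSetToKafka {

    /** Writes each record of the DataSet as a String message to one Kafka topic. */
    public static class KafkaOutputFormat extends RichOutputFormat<String> {

        private transient KafkaProducer<String, String> producer;

        @Override
        public void configure(Configuration parameters) {
        }

        @Override
        public void open(int taskNumber, int numTasks) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
        }

        @Override
        public void writeRecord(String record) {
            producer.send(new ProducerRecord<>("batch-results", record));
        }

        @Override
        public void close() {
            if (producer != null) {
                producer.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Stand-in for the real JDBC/Postgres input.
        env.fromElements("row-1", "row-2", "row-3")
           .output(new KafkaOutputFormat());
        env.execute("dataset-to-kafka");
    }
}
```
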
0 votes, 1 answer

Flink Java API - Pojo Type to Tuple Datatype

I am creating a small utility with the Java Flink API to learn the functionality. I am trying to read a CSV file and just print it, and I have developed a POJO class for the structure of the data. When I executed the code, I don't see the right…
Karthi • 708 • 1 • 19 • 38
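
A minimal sketch of reading a CSV straight into a POJO with the DataSet API's pojoType; the file name, field names, and Person class are placeholders rather than the asker's code.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CsvToPojo {

    /** Flink POJO: public no-arg constructor and public (or getter/setter) fields. */
    public static class Person {
        public String name;
        public int age;

        public Person() {
        }

        @Override
        public String toString() {
            return name + " (" + age + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Person> people = env
                .readCsvFile("people.csv")
                .pojoType(Person.class, "name", "age"); // CSV column order -> POJO fields

        people.print();
    }
}
```
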
0 votes, 1 answer

Why is it bad to execute a Flink job with parallelism = 1?

I'm trying to understand the important factors I need to take into consideration before submitting a Flink job. My questions are: what is the parallelism, and is there an upper bound (physically)? And how can the parallelism impact the…
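
For reference, a tiny sketch of the knob being asked about: parallelism can be set per job and overridden per operator, and in practice it is bounded by the task slots the cluster provides (and by maxParallelism for keyed state). The values below are arbitrary.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // default parallelism for every operator of this job

        env.fromElements("a", "b", "c", "d")
           .map(value -> value.toUpperCase()).setParallelism(8) // per-operator override
           .print().setParallelism(1);                          // e.g. a single printing sink

        env.execute("parallelism-example");
    }
}
```
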
0 votes, 2 answers

Performance difference when serializing: type of object vs. statically typed

Do we need to statically type/declare the data type of a variable when it is going to be serialized? Does it improve performance while serializing? I'm creating a Flink project for batch processing. I wrote a custom input reader, which is going to…
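
A sketch of the usual way to give Flink explicit type information when generics are erased: supply it via returns(...). Whether this changes performance depends on whether the type would otherwise fall back to a generic serializer; the parsing logic below is invented.

```java
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class ExplicitTypeInfo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> lines = env.fromElements("a,1", "b,2");

        DataSet<Tuple2<String, Integer>> parsed = lines
                .map(line -> {
                    String[] parts = line.split(",");
                    return Tuple2.of(parts[0], Integer.parseInt(parts[1]));
                })
                // Without this hint the lambda's Tuple2 type parameters are erased
                // and Flink cannot pick the efficient tuple serializer on its own.
                .returns(TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

        parsed.print();
    }
}
```
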