Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

See "What is PyFlink?" on apache.org

258 questions
0 votes · 0 answers

Flink SQL using python udf error: org.apache.beam.sdk.options.PipelineOptionsFactory

# start flink sql client bin/sql-client.sh \ -pyarch file:///opt/flink_data/requirements/py_env/pyflink_jm_1.16.0_env.zip \ -pyexec pyflink_jm_1.16.0_env.zip/bin/python3.7 \ -pyclientexec pyflink_jm_1.16.0_env.zip/bin/python3.7 \ -pyfs…
KarmA • 1 • 1
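For context on the UDF question above, registering a Python scalar UDF with PyFlink's Table API generally follows the sketch below; this is a minimal illustration, not the asker's setup, and the table and function names are placeholders.

    # Minimal sketch: register a Python scalar UDF and call it from Flink SQL.
    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.udf import udf

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Python UDFs run in a Beam-managed Python process, which is why the cluster
    # needs a matching Python environment (what -pyexec/-pyclientexec point at).
    @udf(result_type=DataTypes.BIGINT())
    def add_one(x):
        return x + 1

    t_env.create_temporary_function("add_one", add_one)

    # Placeholder source purely for illustration.
    t_env.execute_sql(
        "CREATE TABLE src (x BIGINT) WITH ('connector' = 'datagen', 'number-of-rows' = '5')"
    )
    t_env.execute_sql("SELECT add_one(x) FROM src").print()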
0 votes · 1 answer

PyFlink on Kinesis Analytics Studio - Cannot convert DataStream to Amazon Kinesis Data Stream

I have a DataStream coming from a CoFlatMapFunction (simplified here): %flink.pyflink # join two streams and update the rule-set class MyCoFlatMapFunction(CoFlatMapFunction): def open(self,…
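For reference on the question above, connecting two streams through a CoFlatMapFunction in PyFlink usually has the shape below; the element types and class name are illustrative, and the Kinesis sink itself is out of scope here.

    # Minimal sketch: connect two DataStreams and process them with a CoFlatMapFunction.
    from pyflink.common import Types
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.functions import CoFlatMapFunction

    class MyCoFlatMapFunction(CoFlatMapFunction):
        def flat_map1(self, value):
            # Elements from the first (data) stream.
            yield value.upper()

        def flat_map2(self, value):
            # Elements from the second (rule-set) stream.
            yield value.lower()

    env = StreamExecutionEnvironment.get_execution_environment()
    data = env.from_collection(["a", "b"], type_info=Types.STRING())
    rules = env.from_collection(["R1"], type_info=Types.STRING())

    data.connect(rules).flat_map(MyCoFlatMapFunction(), output_type=Types.STRING()).print()
    env.execute("co_flat_map_sketch")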
0 votes · 1 answer

Flink failed to deserialize JSON produced by Debezium

I'm trying to use Flink to consume the change event log produced by Debezium. The JSON was this: { "schema":{ }, "payload":{ "before":null, "after":{ "team_config_id":3800, …
Rinze • 706 • 1 • 5 • 21
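As background for the Debezium question above, PyFlink can let the debezium-json format unwrap the before/after payload instead of parsing it by hand; a hedged sketch with placeholder topic, servers and columns. Since the sample JSON contains a top-level "schema" field, 'debezium-json.schema-include' would need to be 'true'.

    # Minimal sketch: read Debezium change events from Kafka as a changelog table.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE team_config (
            team_config_id BIGINT
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'my_debezium_topic',
            'properties.bootstrap.servers' = 'localhost:9092',
            'scan.startup.mode' = 'earliest-offset',
            'format' = 'debezium-json',
            'debezium-json.schema-include' = 'true'
        )
    """)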
0 votes · 1 answer

pyflink Unsupported Python SqlFunction CAST when working with amazon-kinesis-sql-connector and udtf function

I am currently trying to get PyFlink running with the AWS Kinesis SQL connector. I use the Table API and can read from Kinesis and also write back to another Kinesis stream. As soon as I use a udtf-decorated function I get the following exception: …
Michael Boesl • 236 • 1 • 9
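For reference on the udtf question above, a Python table function is declared with the udtf decorator and expanded with LATERAL TABLE in SQL; the sketch below uses made-up names and a datagen source rather than the Kinesis tables from the question.

    # Minimal sketch: a Python table function (udtf) used via LATERAL TABLE.
    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.udf import udtf

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    @udtf(result_types=[DataTypes.STRING()])
    def split_words(line):
        for word in line.split(" "):
            yield word

    t_env.create_temporary_function("split_words", split_words)

    t_env.execute_sql(
        "CREATE TABLE lines (line STRING) WITH ('connector' = 'datagen', 'number-of-rows' = '5')"
    )
    # Each input row is expanded into one output row per word.
    t_env.execute_sql("""
        SELECT word FROM lines, LATERAL TABLE(split_words(line)) AS T(word)
    """).print()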
0 votes · 0 answers

Flink could not connect to Jobmanager in Docker

I've started a Flink Jobmanager in docker using docker run --rm --name=jobmanager --network flink-network --publish 8081:8081 --env FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" apache/flink:1.16.0-java11 jobmanager, and I could visit Flink…
Rinze • 706 • 1 • 5 • 21
0 votes · 1 answer

Pyflink DataStream API get Rowtime(Kafka Message Timestamp)

Is it possible to get the rowtime of a Kafka message within the DataStream API of Flink/PyFlink? I'm subscribing with PyFlink to a Kafka topic and need to access the metadata (rowtime) of the messages I get: types = Types.ROW_NAMED(['name',…
Hansanho • 295 • 1 • 3 • 13
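For context on the rowtime question above, in the DataStream API the timestamp a source attaches to a record (for a Kafka source, the message timestamp) can be read from a ProcessFunction's context; a minimal sketch with an in-memory stand-in for the Kafka source.

    # Minimal sketch: read the record timestamp inside a ProcessFunction.
    from pyflink.common import Types
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.functions import ProcessFunction

    class AttachTimestamp(ProcessFunction):
        def process_element(self, value, ctx):
            # ctx.timestamp() is the timestamp assigned to this record by the
            # source (None if no timestamp was assigned).
            yield value, ctx.timestamp()

    env = StreamExecutionEnvironment.get_execution_environment()
    # Illustrative source; a Kafka source would be used in practice.
    ds = env.from_collection(["a", "b"], type_info=Types.STRING())
    ds.process(AttachTimestamp()).print()
    env.execute("timestamp_sketch")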
0 votes · 1 answer

merge() method on AggregateFunction in Flink

I want to know when the merge() method on AggregateFunction gets called. From what I've understood from the answers here and here, it is applicable to session windows only and occurs on every event that can be merged with the previous window…
sunny • 27 • 5
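For background on the merge() question above, merge() is only invoked for merging windows such as session windows; a minimal sketch of a DataStream-API AggregateFunction with all four methods (names are illustrative).

    # Minimal sketch: an AggregateFunction; merge() combines two accumulators
    # when a merging window (e.g. a session window) merges its state.
    from pyflink.datastream.functions import AggregateFunction

    class CountAggregate(AggregateFunction):
        def create_accumulator(self):
            return 0

        def add(self, value, accumulator):
            return accumulator + 1

        def get_result(self, accumulator):
            return accumulator

        def merge(self, acc_a, acc_b):
            return acc_a + acc_b

    # Typically applied as, e.g.:
    #   keyed.window(EventTimeSessionWindows.with_gap(Time.seconds(30))).aggregate(CountAggregate())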
0 votes · 1 answer

Flink custom metrics are not shown in Datadog

In Flink, I am generating custom metrics in a FlatMapFunction using Python. class OccupancyEventFlatMap(FlatMapFunction): def open(self, runtime_context: RuntimeContext): mg = runtime_context.get_metrics_group() self.counter_sum…
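For context on the metrics question above, a counter is usually registered in open() from the runtime context and incremented per element; a hedged sketch completing the snippet with an illustrative metric name (whether it reaches Datadog then depends on the cluster's metric reporter configuration).

    # Minimal sketch: register and increment a custom counter in a FlatMapFunction.
    from pyflink.datastream.functions import FlatMapFunction, RuntimeContext

    class OccupancyEventFlatMap(FlatMapFunction):
        def open(self, runtime_context: RuntimeContext):
            mg = runtime_context.get_metrics_group()
            self.counter_sum = mg.counter("occupancy_sum")  # illustrative name

        def flat_map(self, value):
            self.counter_sum.inc()
            yield value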
0 votes · 0 answers

Flink PyFlink throws incompatible type for named row

With the following PyFlink code: ds2 = ds.map(my_map_func, output_type=Types.ROW_NAMED( ['x', 'y'], [Types.STRING(), Types.INT()] )) t_env.execute_sql(""" CREATE TABLE my_sink ( y INT, x STRING ) WITH ( 'connector' =…
Matty F • 3,763 • 4 • 30 • 48
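On the named-row question above, one common cause of such type errors is a mismatch between the declared Row fields (names, order, types) and the sink schema; a hedged sketch of declarations kept in sync, with a print connector standing in for the real sink.

    # Minimal sketch: keep the ROW_NAMED declaration consistent with the sink schema.
    from pyflink.common import Types

    # Field names, order and types mirror the sink table below.
    output_type = Types.ROW_NAMED(['y', 'x'], [Types.INT(), Types.STRING()])

    sink_ddl = """
        CREATE TABLE my_sink (
            y INT,
            x STRING
        ) WITH (
            'connector' = 'print'
        )
    """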
0 votes · 1 answer

Can a source ignore unknown fields in Apache Flink?

Suppose I have a Kafka topic that will be pushed with events by many services, and I want to use Flink to handle these events. In addition, those events are heterogeneous but have several fields that are the same. For example, there are three common…
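On the question above, Flink's JSON format only reads the columns declared in the table schema, so a source table listing just the shared fields effectively ignores everything else; a hedged sketch with placeholder topic and column names.

    # Minimal sketch: declare only the common fields; undeclared JSON fields are not read.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE events (
            event_id STRING,
            event_type STRING,
            event_time TIMESTAMP(3)
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'mixed_events',
            'properties.bootstrap.servers' = 'localhost:9092',
            'format' = 'json',
            'json.ignore-parse-errors' = 'true'
        )
    """)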
0 votes · 2 answers

org.apache.flink.table.api.ValidationException: Unable to create a sink for writing table 'default_catalog.default_database.hTable'

I am trying to connect Flink 1.14.4 with HBase version 2.2.14; I added the HBase SQL connector jar flink-sql-connector-hbase-2.2-1.15.2.jar, but for version 2.2.x because it is the latest version of the jar. But I got the following…
Zak_Stack • 103 • 8
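For reference on the HBase question above, an hbase-2.2 sink table is typically declared as below (table name, column family and ZooKeeper quorum are placeholders); note that a connector jar built for a different Flink minor version (1.15.2 here versus a 1.14.4 runtime) is a frequent source of such ValidationExceptions.

    # Minimal sketch: declare an HBase sink with the hbase-2.2 SQL connector.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE hTable (
            rowkey STRING,
            cf ROW<col1 STRING, col2 INT>,
            PRIMARY KEY (rowkey) NOT ENFORCED
        ) WITH (
            'connector' = 'hbase-2.2',
            'table-name' = 'my_namespace:my_table',
            'zookeeper.quorum' = 'zk-host:2181'
        )
    """)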
0 votes · 0 answers

How can we fetch Flink elected Job Manager IP programmatically?

We are using High Availability mode of Flink where Job Manager IP can change at any time. We are using Flink Job Manager's IP to access Flink Rest endpoint. I wanted to know if there is any programmatic way of knowing the updated Job manager's IP?
priyadhingra19 • 333 • 4 • 15
0 votes · 0 answers

Flink read binary ION records from Kinesis

I had a Kinesis stream containing binary ION records, and I needed to read that stream in Flink. My solution, 2 years ago, was to write a Base64 SerDe (just about 20 lines of Java code) and use that for the KinesisConsumer. Now I have the same…
Averell • 793 • 2 • 10 • 21
0 votes · 1 answer

Query on automating Flink Job submission

I am trying to use the Flink REST APIs to automate the Flink job submission process via a pipeline. To call any Flink REST endpoint we need to know the Job Manager web interface IP. For my POC, I got the IP after running the flink-yarn-session command on…
priyadhingra19 • 333 • 4 • 15
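For context on the two REST-related questions above, once the Job Manager's web interface address is known its REST endpoints can be called with plain HTTP; a minimal sketch using only the standard library, with a placeholder host and port.

    # Minimal sketch: query the Flink REST API for cluster and job information.
    import json
    import urllib.request

    BASE = "http://jobmanager-host:8081"  # placeholder REST endpoint address

    def get(path):
        with urllib.request.urlopen(BASE + path) as resp:
            return json.loads(resp.read())

    print(get("/config"))          # cluster configuration
    print(get("/jobs/overview"))   # overview of running and finished jobs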
0 votes · 1 answer

exception when running pyFlink batch processing with 2 sinks

I got the following Flink exception when I ran a PyFlink processing job: Exception in thread read_grpc_client_inputs: Traceback (most recent call last): File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner self.run() File…
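On the last question, writing to two sinks from one PyFlink batch job is commonly done through a StatementSet, so both inserts are planned and executed as a single pipeline; a hedged sketch assuming the source and sink tables (placeholder names) have already been declared.

    # Minimal sketch: submit two INSERTs as one job via a StatementSet.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
    # `src`, `sink_a` and `sink_b` are assumed to exist (CREATE TABLE ... elsewhere).
    stmt_set = t_env.create_statement_set()
    stmt_set.add_insert_sql("INSERT INTO sink_a SELECT * FROM src")
    stmt_set.add_insert_sql("INSERT INTO sink_b SELECT * FROM src")
    stmt_set.execute().wait()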