Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes all of Apache Flink available to Python, and at the same time lets Flink benefit from Python's rich scientific computing libraries.

See also: What is PyFlink? (apache.org)

258 questions

PyFlink: What's the best way to provide reference data and keep it updated?

I have a use case where I want to compare incoming data against reference data provided by another service. What's the best way in PyFlink to fetch that data and refresh it regularly (every 1-2 hours)? Other…
Amir Afianian
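A minimal sketch of one possible pattern for the reference-data question above, assuming the reference data fits in memory on each worker: keep a cached copy and re-fetch it lazily once it is older than a configured interval. `RefreshingReference` and the `fetch` callable are hypothetical names. In a PyFlink job such an object would typically be created in a `MapFunction`'s `open()` and queried from `map()`, so each parallel subtask holds and refreshes its own copy.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingReference:
    """Hypothetical helper: caches reference data, re-fetching it when stale."""

    def __init__(
        self,
        fetch: Callable[[], Dict],
        refresh_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch                      # call to the other service (assumed)
        self._refresh_seconds = refresh_seconds  # e.g. 3600-7200 for 1-2 hours
        self._clock = clock                      # injectable so tests can fake time
        self._data: Optional[Dict] = None
        self._loaded_at = 0.0

    def get(self) -> Dict:
        now = self._clock()
        # Fetch on first use, or once the cached copy is at least refresh_seconds old
        if self._data is None or now - self._loaded_at >= self._refresh_seconds:
            self._data = self._fetch()
            self._loaded_at = now
        return self._data
```

Because the refresh is triggered by incoming records, a quiet stream keeps serving the old copy until the next record arrives; the `clock` parameter only exists to make that behaviour easy to test.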

Failed to deserialize Avro record: getting ArrayIndexOutOfBoundsException

I am trying to read from Kafka with Avro format using PyFlink. My program is this: from pyflink.datastream import StreamExecutionEnvironment from pyflink.datastream.connectors.kafka import FlinkKafkaConsumer from pyflink.datastream.formats.avro…
mjennet
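One frequently reported cause of `ArrayIndexOutOfBoundsException` when decoding Avro from Kafka, offered here only as an assumption about this question, is that the producer used the Confluent Schema Registry serializer. That serializer prefixes every value with a magic byte `0` and a 4-byte big-endian schema ID, which a plain Avro decoder then misreads as record data. The hypothetical helper below only checks for and strips that 5-byte header; Flink's Table/SQL API also ships an `avro-confluent` format intended for such topics.

```python
import struct
from typing import Tuple

CONFLUENT_MAGIC_BYTE = 0  # first byte of every Confluent-framed message


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: return (schema_id, avro_payload) for a framed value."""
    if len(message) < 5 or message[0] != CONFLUENT_MAGIC_BYTE:
        raise ValueError("value is not in Confluent wire format")
    # Bytes 1-4 hold the schema ID as a big-endian 32-bit integer
    (schema_id,) = struct.unpack(">I", message[1:5])
    # Everything after the header is the plain Avro binary encoding
    return schema_id, message[5:]
```

If the header is present, the remaining bytes are what a plain Avro deserializer expects to see.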

How to use a field in ROW type column in Flink SQL?

I'm executing SQL in Flink that looks like this: create table team_config_source ( `payload` ROW( `before` ROW( team_config_id int, ... ), `after` ROW( team_config_id int, ... …
Rinze

Not able to run a simple PyFlink word_count.py on AWS EMR

I have created an EMR cluster (v5.35.0) and am trying to run a sample word_count.py to verify whether I can execute a Flink job. I am able to use python3 as mentioned in this question: How do you run pyflink scripts on AWS EMR? Using the below…

How to read data from HDFS with Flink in python

I want to read data from HDFS with Flink in Python. I found it is possible with Java or Scala: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/dataset/formats/hadoop/ Indeed, the Flink HDFS connector provides a Sink that writes…
Zak_Stack

PyFlink - RabbitMQ sink : A serializer has already been registered for the state; re-registration is not allowed

I am writing PyFlink code in Python with Flink 1.15.2, and I source messages from RabbitMQ with the following connector: flink-sql-connector-rabbitmq-1.15.2.jar. However, when I try to sink to RabbitMQ with this code, following this link:…
Ali Ait-Bachir

Using Python Functions in a Java Flink Job - 1.15

Is there any way to use a Python function (Aggregate, Map, etc.) within a Java Flink job? I do not want to use the SQL API; I wonder whether the DataStream API alone can handle such functionality. Without this syntax: tableEnv.executeSql("CREATE TEMPORARY…

PyFlink: submitting a job with multiple external connector jars on Amazon Kinesis

I am following this guide to create an Amazon Kinesis Analytics application with PyFlink, and my application requires more than one external connector JAR file. When it comes to the JAR file upload section, it seems I can only upload one JAR file, how…
chris

Flink Python Datastream API Kafka Consumer - NoClassDefFoundError ByteArrayDeserializer Error

I have an error on the Py4J side of PyFlink. The code is below: env = StreamExecutionEnvironment.get_execution_environment() env.add_jars("file:/" + os.getcwd() + "/jar_files/" + "flink-sql-connector-kafka-1.15.0.jar") type_info = Types.ROW_NAMED(['id',…
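In the snippet above, `"file:/" + os.getcwd()` produces `file://home/...` for a POSIX working directory such as `/home/...`, which puts `home` where a URL host would be. Whether or not that is related to the `NoClassDefFoundError`, the sketch below shows a less fragile way to build the URL passed to `env.add_jars`, using only `pathlib`; the helper name `jar_url` is hypothetical.

```python
from pathlib import Path


def jar_url(jar_dir: str, jar_name: str) -> str:
    """Hypothetical helper: build a file:// URL for a connector jar."""
    # resolve() makes the path absolute (relative dirs resolve against the CWD);
    # as_uri() then emits a well-formed URL such as file:///home/me/jar_files/x.jar
    return (Path(jar_dir) / jar_name).resolve().as_uri()
```

For example, `env.add_jars(jar_url("jar_files", "flink-sql-connector-kafka-1.15.0.jar"))`. `add_jars` accepts several URLs, so further connector jars can be passed in the same call.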

How to convert Table containing TIMESTAMP_LTZ into DataStream in PyFlink 1.15.0?

I have a source table using a Kinesis connector reading events from AWS EventBridge using PyFlink 1.15.0. An example of the sorts of data that are in this stream is here. Note that the stream of data contains many different types of events, where…
John

How to install Python 3.7 or 3.8 into OpenJDK docker container without building from source?

I want to use the official Flink image on Docker Hub (https://hub.docker.com/_/flink) for use with PyFlink (specifically the 1.14.4 version). That image is based on the openjdk:11-jre base image which is Debian 11.3. Debian 11 does not seem to have…
John

How to sink message to InfluxDB using PyFlink?

I am trying to run the PyFlink walkthrough, but instead of sinking data to Elasticsearch, I want to use InfluxDB. Note: the code in the walkthrough (link above) works as expected. In order for this to work, we need to put the InfluxDB connector inside…
Ermolai

PyFlink 1.15.0 alternative for OutputTag?

I am using a ProcessFunction in a PyFlink (1.15.0) job. One of the use cases is to route invalid input to a different Kafka topic. In Java, we use OutputTag to redirect those inputs to another stream and then to a different sink. In PyFlink 1.15.0 I see…
Lakshya Garg
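Side outputs via `OutputTag` appear to have reached the Python DataStream API only in a later release (1.16). A common workaround on 1.15, sketched here in plain Python under that assumption, is to tag each record as valid or invalid in one `map` and then derive two streams with two `filter` calls, each of which can feed its own sink. `looks_like_json`, `tag_records`, and `split_tagged` are hypothetical names, and the JSON check merely stands in for whatever "wrong input" means in the real job.

```python
import json
from typing import Any, Callable, Iterable, List, Tuple

Tagged = Tuple[bool, Any]  # (is_valid, original_record)


def looks_like_json(raw: str) -> bool:
    """Hypothetical validity check: accept only strings that parse as JSON."""
    try:
        json.loads(raw)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


def tag_records(records: Iterable[Any], is_valid: Callable[[Any], bool]) -> List[Tagged]:
    # Roughly what ds.map(lambda r: (is_valid(r), r)) would emit per record
    return [(is_valid(r), r) for r in records]


def split_tagged(tagged: List[Tagged]) -> Tuple[List[Any], List[Any]]:
    # Roughly ds.filter(lambda t: t[0]) and ds.filter(lambda t: not t[0]),
    # each followed by a map that drops the flag before its sink
    valid = [r for ok, r in tagged if ok]
    invalid = [r for ok, r in tagged if not ok]
    return valid, invalid
```

Validity is computed once per record in the `map`, and the two filters only read the flag, at the cost of one extra tuple per record.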

I'm getting this error when running the PyFlink code below

This is the code for calculating the average of each ch[x] from a Kafka source using Apache Flink (PyFlink). I think I have imported all of the necessary libraries, and I'm getting this error when running the code: from numpy import average from…
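For a per-channel average over a stream, a job usually keeps a running `(sum, count)` per key rather than averaging a collected list, since an unbounded stream never ends. The plain-Python sketch below shows that accumulator logic; in PyFlink it would roughly correspond to a `key_by` on the channel followed by a `reduce` that adds sums and counts, with the division done in a final `map`. The function name and the `(channel, value)` input shape are assumptions, not taken from the question.

```python
from typing import Dict, Iterable, Tuple


def running_averages(readings: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average value per channel, computed from (sum, count) accumulators."""
    acc: Dict[str, Tuple[float, int]] = {}
    for channel, value in readings:
        # Same merge step a keyed reduce would apply to each new record
        total, count = acc.get(channel, (0.0, 0))
        acc[channel] = (total + value, count + 1)
    # Divide only at the end, so the accumulator stays cheap to merge
    return {channel: total / count for channel, (total, count) in acc.items()}
```

Keeping sum and count separate means partial results can be merged in any order, which is why this shape suits a keyed `reduce`.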

pyflink with kafka java.lang.RuntimeException: Failed to create stage bundle factory

Environment: Python 3.8, JDK 11. I've started learning PyFlink and wrote code following the official documentation at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/intro_to_datastream_api/ and here is my code: from…