Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
1
vote
0 answers

Apache Flink 1.11 Streaming Sink to S3

I'm using the Flink FileSystem SQL Connector to read events from Kafka and write to S3 (using MinIO). Here is my code: exec_env = StreamExecutionEnvironment.get_execution_environment() exec_env.set_parallelism(1) # start a checkpoint every 10…
Vidura Mudalige
  • 810
  • 2
  • 18
  • 31
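A minimal sketch of the pattern this question describes, assuming Flink 1.11, an already-declared Kafka source table (here called kafka_source, hypothetical), and S3/MinIO credentials configured in flink-conf.yaml; column names and the bucket path are placeholders:

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

exec_env = StreamExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
# The filesystem sink only commits files on checkpoints, so checkpointing must be enabled
exec_env.enable_checkpointing(10000)
t_env = StreamTableEnvironment.create(exec_env)

# Hypothetical sink table using the FileSystem SQL connector, pointed at an S3/MinIO bucket
t_env.execute_sql("""
    CREATE TABLE s3_sink (
        id STRING,
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3://my-bucket/events/',
        'format' = 'json'
    )
""")

# Stream from the (assumed) Kafka source table into the S3 sink
t_env.execute_sql("INSERT INTO s3_sink SELECT id, ts FROM kafka_source")
```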
1
vote
2 answers

"Required context properties mismatch" when connecting Flink to a MySQL database

I am using the latest Flink (1.11.2) to work with a sample MySQL database, and the database itself is working fine. Additionally, I have added flink-connector-jdbc_2.11-1.11.2, mysql-connector-java-8.0.21.jar, and postgresql-42.2.17.jar to the…
user824624
  • 7,077
  • 27
  • 106
  • 183
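This error typically means the table's WITH properties do not match any connector factory found on the classpath. A rough sketch of the Flink 1.11-style JDBC DDL for comparison; the URL, table name, schema and credentials are placeholders:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Flink 1.11 with the Blink planner
settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
t_env = TableEnvironment.create(settings)

# Hypothetical MySQL sink; the flink-connector-jdbc and mysql-connector-java jars
# must be on the classpath for this factory to be found
t_env.execute_sql("""
    CREATE TABLE mysql_sink (
        id INT,
        name STRING
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://localhost:3306/mydb',
        'table-name' = 'users',
        'username' = 'user',
        'password' = 'secret'
    )
""")
```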
1
vote
1 answer

Py4JJavaError in PyFlink Table API

This code converts a pandas DataFrame to a Flink table, does the transformation, and then converts back to pandas. It works fine when I use filter and then select, but gives me an error when I add group_by and order_by. import pandas as pd import…
Ajay Chinni
  • 780
  • 1
  • 6
  • 24
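One common cause of a Py4JJavaError here is that order_by needs a bounded input, so group_by/order_by followed by to_pandas usually has to run in batch mode. A small sketch, assuming a recent PyFlink (roughly 1.13+) where EnvironmentSettings.in_batch_mode() is available; the DataFrame and column names are made up:

```python
import pandas as pd
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

# order_by requires a bounded input, so run the Table API in batch mode for this sketch
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

pdf = pd.DataFrame({"name": ["a", "b", "a"], "value": [1, 2, 3]})
table = t_env.from_pandas(pdf)

result = (table.group_by(col("name"))
               .select(col("name"), col("value").sum.alias("total"))
               .order_by(col("total")))

# Convert the bounded result back to pandas
print(result.to_pandas())
```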
1
vote
2 answers

Flink run of a Python file fails with the error "Python versions prior to 3.5 are not supported for PyFlink"

Added on May 1st: I saw an issue about this error on the Apache Flink JIRA; maybe it helps? My system is CentOS 7, Python version 3.6.8, PyFlink version 1.10.0. I'm following this tutorial and trying to run a PyFlink file, but I constantly get…
Holly Wang
  • 11
  • 3
1
vote
1 answer

Using PyFlink with LightGBM

Is it possible to use PyFlink with python machine learning libraries such as LightGBM for a streaming application? Is there any good example for this?
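There is no built-in LightGBM integration, but one common pattern is to score events inside a Python scalar UDF. A rough sketch, assuming a pre-trained LightGBM model file and that lightgbm is installed on the workers (for example via set_python_requirements); the model path, column names and features are hypothetical:

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Ship third-party dependencies such as lightgbm if they are not pre-installed on the cluster:
# t_env.set_python_requirements("requirements.txt")

_model = None

def _get_model():
    # Load the (hypothetical) pre-trained model once per Python worker
    global _model
    if _model is None:
        import lightgbm as lgb
        _model = lgb.Booster(model_file="/path/to/model.txt")
    return _model

@udf(result_type=DataTypes.DOUBLE())
def score(f1: float, f2: float):
    # Score a single row with the cached model
    return float(_get_model().predict([[f1, f2]])[0])

t_env.create_temporary_function("score", score)
# ... then e.g.: t_env.sql_query("SELECT score(f1, f2) AS prediction FROM events")
```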
1
vote
1 answer

flink with python, execution of job failed

For a first try I want to read JSON data from a file and pass it on to Flink. I defined a source (which reads JSON strings line by line) and a placeholder filter. See Code: from org.apache.flink.streaming.api.functions.source import…
mudvayne
  • 57
  • 8
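The import shown in the excerpt (from org.apache.flink.streaming.api.functions.source import …) is the old Jython-style API, which current PyFlink no longer supports; the modern DataStream API uses plain Python functions instead. A small sketch with a hypothetical input path and a placeholder filter:

```python
import json
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Read JSON strings line by line from a (hypothetical) file
lines = env.read_text_file("/path/to/input.jsonl")

# Parse each line; without an explicit output_type the records travel as pickled Python objects
parsed = lines.map(json.loads)

# Placeholder filter, mirroring the question's setup
filtered = parsed.filter(lambda record: record.get("type") == "event")

filtered.print()
env.execute("read_json_file")
```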
1
vote
3 answers

Is it possible to use PyFlink on Windows?

Has anyone ever had success running Flink with Python on Windows? I'm trying the following command: .\bin\pyflink.bat examples\python\WordCount.py and getting the following error: Starting execution of program Usage:…
pokegoer
  • 11
  • 3
0
votes
1 answer

PyFlink job encountering "No module named 'google'" error when using FlinkKafkaConsumer

I'm working on a PyFlink job that reads data from a Kafka topic using the FlinkKafkaConsumer connector. However, I'm encountering a persistent issue related to the google module when trying to run the job on my Flink cluster. The job works fine…
0
votes
0 answers

Unable to consume data using the latest PyFlink Kafka connector

I am trying to read data from a Kafka topic. Kafka is set up fine. I wrote the code using PyFlink, and whether or not I add the jars, the error remains the same. from pyflink.datastream.connectors.kafka import KafkaSource,…
RushHour
  • 494
  • 6
  • 25
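The usual gotchas with KafkaSource are the connector jar not being registered (or not matching the PyFlink version) and the jar path not being a file:// URL. A sketch of the standard setup; the jar path, topic, group id and bootstrap servers are placeholders:

```python
from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer

env = StreamExecutionEnvironment.get_execution_environment()
# The connector jar must be registered (or already on the classpath) and match the PyFlink version
env.add_jars("file:///path/to/flink-sql-connector-kafka.jar")

source = (KafkaSource.builder()
          .set_bootstrap_servers("localhost:9092")
          .set_topics("my-topic")
          .set_group_id("my-group")
          .set_starting_offsets(KafkaOffsetsInitializer.earliest())
          .set_value_only_deserializer(SimpleStringSchema())
          .build())

ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "kafka_source")
ds.print()
env.execute("kafka_consumer_job")
```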
0
votes
0 answers

How do I write Flink TIMESTAMP_LTZ to Elasticsearch using ISO-8601?

I have an Elasticsearch SQL sink in a Flink (PyFlink) job, where the sink table looks like: CREATE TABLE mysink ( foo TIMESTAMP_LTZ(3) ) WITH ( ... my elasticsearch connection details ) In my Elasticsearch index, field foo has type date. When…
John
  • 10,837
  • 17
  • 78
  • 141
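One workaround sketch, not necessarily the only or recommended approach: declare the sink column as STRING and format the timestamp explicitly so the Elasticsearch date field receives an ISO-8601-style string. The table names, columns and connector options are placeholders (mysource is a hypothetical source table), and the exact output depends on the session time zone:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical Elasticsearch sink with the timestamp column declared as STRING
t_env.execute_sql("""
    CREATE TABLE mysink (
        foo STRING
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://localhost:9200',
        'index' = 'myindex'
    )
""")

# Format the TIMESTAMP_LTZ value as an ISO-8601-style string on the way out
t_env.execute_sql("""
    INSERT INTO mysink
    SELECT DATE_FORMAT(foo, 'yyyy-MM-dd''T''HH:mm:ss.SSS') FROM mysource
""")
```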
0
votes
0 answers

PyFlink Python app deployed to 2 pods produces duplicates in the output topic

I have a PyFlink app running as a pure Python app, executed as "python -m flink_app.py". Assume it is a simple DataStream app, consuming from an input Kafka topic and producing to an output Kafka topic. Due to the scale, I need to deploy this app on 2…
0
votes
1 answer

Flink Table API cross table reference

We have two applications. App 1 creates two tables, i.e. sourceTable and targetTable1, and does a select * from sourceTable into targetTable1. These are both created on Kafka topics. App 2 creates a sourceTable2 on the Kafka topic of the targetTable1 created above. It…
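Catalog objects are not shared between separate jobs by default, so the cross-reference usually happens through the shared Kafka topic rather than through the table name: App 2 re-declares its own table over the topic that targetTable1 writes to. A rough sketch with placeholder topic, schema and connection details:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# App 2's view of the data App 1 wrote: a new table declared over the same Kafka topic
t_env.execute_sql("""
    CREATE TABLE sourceTable2 (
        id STRING,
        payload STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'targetTable1-topic',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'app2',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")
```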
0
votes
1 answer

How to load initial reference data within Flink job when using broadcast state pattern?

I have some slow-changing reference data that I want to have available when processing events in Flink using PyFlink. For example, imagine there is information about employee IDs, teams and departments and how they relate to one another. The…
John
  • 10,837
  • 17
  • 78
  • 141
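A rough sketch of the broadcast state pattern in the PyFlink DataStream API, assuming a version where BroadcastProcessFunction is exposed to Python (roughly 1.16+); the two streams, the state name and the fields are all made up for illustration:

```python
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import BroadcastProcessFunction
from pyflink.datastream.state import MapStateDescriptor

env = StreamExecutionEnvironment.get_execution_environment()

# Hypothetical streams: events keyed by employee id, and slow-changing reference updates
events = env.from_collection([("e1", "login"), ("e2", "logout")])
ref_updates = env.from_collection([("e1", "team-a"), ("e2", "team-b")])

ref_descriptor = MapStateDescriptor("employee_to_team", Types.STRING(), Types.STRING())
broadcast_refs = ref_updates.broadcast(ref_descriptor)

class Enrich(BroadcastProcessFunction):
    def process_element(self, value, ctx):
        # Enrich each event with whatever reference data has been broadcast so far
        team = ctx.get_broadcast_state(ref_descriptor).get(value[0])
        yield (value[0], value[1], team)

    def process_broadcast_element(self, value, ctx):
        # Update the broadcast state whenever a reference update arrives
        ctx.get_broadcast_state(ref_descriptor).put(value[0], value[1])

events.connect(broadcast_refs).process(Enrich()).print()
env.execute("broadcast_enrich")
```

Note that this pattern alone does not guarantee the reference data arrives before the first events, which is the crux of the question; events that arrive early simply see an empty broadcast state.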
0
votes
1 answer

PyFlink-NiFi Datastream Integration

I need to process data coming from NiFi using PyFlink for a project, but the PyFlink documentation does not mention a NiFi connector, and we don't want to use Kafka in between. Is there an alternative way to achieve this? I have successfully…
0
votes
0 answers

Deploying PyFlink on Kubernetes with connectors (Kafka/Kinesis)

I am trying to find a way to deploy PyFlink on k8s using the k8s operator. I have already been able to upload a job with the k8s operator, but I can't find how to add connectors to it (like kafka-connector.jar or kinesis-connector.jar). I couldn't…
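One common approach is to bake the connector jars into the Flink container image and point the job at them. A sketch of the two usual ways of registering them, with placeholder jar paths:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Option 1: reference jars baked into the container image (path is a placeholder)
env.add_jars("file:///opt/flink/lib/flink-sql-connector-kafka.jar")

# Option 2 (equivalent): set pipeline.jars via configuration instead, e.g. in flink-conf.yaml
# or in the flinkConfiguration section of the operator's FlinkDeployment spec:
#   pipeline.jars: file:///opt/flink/lib/flink-sql-connector-kafka.jar
```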