Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
0
votes
1 answer

KafkaProducer is all right but KafkaConsumer fails when using the Kafka connector

I'm using pyflink-1.17.0, Java 11, and Flink 1.17.0 on Linux. I run a local cluster and try to run the example code below. import logging import sys from pyflink.common import Types from pyflink.datastream import StreamExecutionEnvironment from…
yangwenyu
  • 1
  • 1
0
votes
1 answer

Flink streaming Kinesis to Hudi not writing any data

I'm trying out PyFlink for streaming data from Kinesis into Hudi format, but can't figure out why it is not writing any data. I hope that maybe someone can provide any pointers. Versions: Flink 1.15.4, Python 3.7, Hudi 0.13.0 I use streaming table…
Timo
  • 5,188
  • 6
  • 35
  • 38
0
votes
0 answers

PyFlink KafkaSink throws AttributeError: 'NoneType' object has no attribute 'startswith'

I am trying to read a Kafka topic and write the same records to another Kafka topic using KafkaSource/KafkaSink in PyFlink (Flink version 1.16). Reading from the Kafka topic works and I am able to print the result, but when trying to send to Kafka using…
Monika X
  • 322
  • 4
  • 13
0
votes
1 answer

Pyflink->Elastic converts Varchar to Long?

I started working with PyFlink last week and found myself at a roadblock. Basically, I try to import data from source A and sink it to Elastic, which works great, but there is one special field that's not working properly. The field is a 10…
0
votes
1 answer

How to make fat jar in AWS KDA Flink Application?

I need the following jar dependencies for my pyflink application. flink-s3-fs-hadoop-1.15.2.jar flink-sql-parquet-1.15.2.jar flink-s3-fs-presto-1.15.2.jar I want to package and deploy it to AWS Kinesis Data Analytics. AWS KDA needs one single fat…
piby180
  • 388
  • 1
  • 6
  • 18
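One common approach (a sketch, not AWS's verified KDA packaging recipe): declare the three jars as Maven dependencies and bundle them into a single uber-jar with the maven-shade-plugin. The group IDs and plugin version below are assumptions to verify against Maven Central.

```xml
<!-- pom.xml sketch: shade the three connector jars into one fat jar.
     Coordinates are assumptions; check them against Maven Central. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>pyflink-kda-deps</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-s3-fs-hadoop</artifactId>
      <version>1.15.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-sql-parquet</artifactId>
      <version>1.15.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-s3-fs-presto</artifactId>
      <version>1.15.2</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.4.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

Running `mvn package` then produces a single shaded jar under `target/` that can be referenced from the KDA application configuration.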
0
votes
0 answers

Configuring Python Virtual Environment

I am trying to configure a Python virtual environment by following the steps here and using the script from here on my Fedora laptop, for later use in PyCharm. When executed, the script returns the following error: Collecting…
Monika X
  • 322
  • 4
  • 13
0
votes
1 answer

Writing rdbms data to s3 bucket using flink or pyflink

If this kind of error occurs while writing data to an S3 bucket using Flink or PyFlink: [ERROR] Could not execute SQL statement. Reason: org.apache.flink.util.SerializedThrowable: The AWS Access Key Id you provided does not exist in our records.…
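That particular error points at the credentials handed to the S3 filesystem plugin. One way to supply them is via `flink-conf.yaml` (a sketch; the key values are placeholders, and IAM roles are preferable to static keys where available):

```yaml
# flink-conf.yaml — credentials for flink-s3-fs-hadoop / flink-s3-fs-presto.
# Values below are placeholders.
s3.access-key: YOUR_ACCESS_KEY_ID
s3.secret-key: YOUR_SECRET_ACCESS_KEY
# Optional, for S3-compatible stores such as MinIO:
# s3.endpoint: http://localhost:9000
# s3.path.style.access: true
```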
0
votes
0 answers

writing postgres table records to s3 using flink

If this kind of error occurs while executing an INSERT statement on Flink (e.g. trying to ingest RDBMS data into S3; in my case I was trying to write from Postgres to an S3 bucket using Flink): [ERROR] Could not execute SQL statement.…
0
votes
0 answers

Preparing Python Virtual Environment for Flink

I cannot find the setup-pyflink-virtual-env.sh mentioned here: Preparing Python Virtual Environment. The link in the article does not work. The latest version I found is for Flink 1.12, here. Is there a newer version, and where can I find it?
Monika X
  • 322
  • 4
  • 13
0
votes
1 answer

PyFlink module java.base does not "opens java.lang" to unnamed module

I want to run a simple example from the Flink documentation. After starting it, I got this exception: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module…
padavan
  • 714
  • 8
  • 22
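This exception typically appears when running a Flink 1.x release on a newer JDK that enforces module encapsulation. A hedged sketch of one workaround: pass an `--add-opens` flag through the Flink configuration (or simply run under JDK 8/11, which these Flink releases support out of the box):

```yaml
# flink-conf.yaml — open java.lang to unnamed modules on newer JDKs.
# Assumption: you are on JDK 17+; on JDK 8/11 this flag is unnecessary.
# Newer Flink versions (1.17+) use the key env.java.opts.all instead.
env.java.opts: --add-opens=java.base/java.lang=ALL-UNNAMED
```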
0
votes
1 answer

pyflink 1.16.1 - access issue to secured Kafka cluster

I'm trying to produce to a secured Kafka cluster using PyFlink. I tried the default JSON producer example provided by the Flink project. My configuration looks like: USERNAME = 'username' PASSWORD = 'password' def write_to_kafka(env): …
Matar
  • 73
  • 1
  • 7
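Producing to a SASL-secured cluster generally requires passing the standard Kafka client security settings to the connector. A sketch of the relevant properties (property names are standard Kafka client configuration; the mechanism and credentials are placeholders that must match the cluster's setup):

```properties
# Standard Kafka client security properties; values are placeholders.
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="username" password="password";
```

In PyFlink these are typically handed to the source/sink builder as connector properties rather than set globally.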
0
votes
1 answer

Flink Processing Time Characteristic Context Returning timestamp

Summary When configuring flink to use processing time I would expect the context.timestamp() to return null in a keyed processing function. When doing testing it seems to return the ingestion timestamp from the source kafka topic. Details I am using…
Sholto
  • 1
0
votes
0 answers

PyFlink Table API: Tumble window on joined debezium streams not producing output

I have two streams that have kafka as source and format debezium-json. The schema was excluded from the message. This is the definition of both streams: CREATE TABLE transactions ( `account_id` BIGINT, `id` BIGINT PRIMARY KEY NOT…
Jorge Cespedes
  • 547
  • 1
  • 11
  • 21
0
votes
0 answers

Flink-ML shows "Failed to fetch next result"

I am totally new to Flink and was trying Flink ML by following the docs. When I entered $FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar after looking in the…
0
votes
1 answer

The configuration does not specify the checkpoint directory 'state.checkpoints.dir'

While submitting the Flink job on the Dataproc cluster I am getting the error below. Please find the code base and the error. I am using Flink version 1.9.3. The program finished with the following…
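The message means checkpointing was enabled without a durable checkpoint location being configured. A sketch of the `flink-conf.yaml` fix (the URI is a placeholder; any HDFS/S3/GCS path reachable by all TaskManagers works):

```yaml
# flink-conf.yaml — configure a durable checkpoint directory.
state.checkpoints.dir: hdfs:///flink/checkpoints
```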