Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
1
vote
1 answer

PyFlink performance compared to Scala

How does PyFlink performance compare to Flink + Scala? Big picture: the goal is to build a Lambda architecture with Cold and Hot tiers. The Cold (Batch) tier will be implemented with Apache Spark (PySpark), but for the Hot (Streaming) tier there are different…
Takito Isumoro
  • 174
  • 1
  • 11
1
vote
1 answer

Flink watermarks not advancing in Python, stuck at -9223372036854775808

I have encountered this issue with several pipelines and haven't been able to find an answer. When running a pipeline with a watermark strategy assigned for either monotonous or bounded out-of-order timestamps, with a timestamp assigner, the timestamp is…
kman
  • 95
  • 2
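As context for the value in the title: -9223372036854775808 is Java's Long.MIN_VALUE, which Flink uses as the sentinel watermark before any timestamped event has been processed (so a watermark stuck there means the strategy never saw an event it could extract a timestamp from). A quick check in plain Python, no PyFlink required:

```python
# -9223372036854775808 == -(2**63), i.e. Java's Long.MIN_VALUE.
# Flink reports this as the watermark until the first event timestamp arrives.
LONG_MIN = -(2 ** 63)

def is_uninitialized_watermark(wm: int) -> bool:
    """True if a reported watermark is still Flink's 'no watermark yet' sentinel."""
    return wm == LONG_MIN

print(is_uninitialized_watermark(-9223372036854775808))  # True
print(is_uninitialized_watermark(0))                     # False
```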
1
vote
1 answer

PyFlink unix epoch timestamp conversion issue

I have events coming in with unix epoch timestamps, and I am using a table with the Kinesis connector as the source table. I need to use the same timestamp field as the watermark. How do I do this in Python? I am using the Flink 1.11 release as that's the latest…
ARU
  • 137
  • 1
  • 9
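For reference, in Flink SQL (which PyFlink's Table API executes via `execute_sql`) an epoch column can be turned into a time attribute with a computed column, and the watermark declared on that. A minimal sketch; the table and field names (`events`, `epoch_ts`) and the 5-second bound are invented for illustration, and the epoch is assumed to be in seconds:

```python
# Hypothetical DDL: FROM_UNIXTIME and TO_TIMESTAMP are built-in Flink SQL
# functions; the computed column `ts` becomes the event-time attribute.
ddl = """
CREATE TABLE events (
    epoch_ts BIGINT,
    ts AS TO_TIMESTAMP(FROM_UNIXTIME(epoch_ts)),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kinesis'
    -- remaining connector options omitted
)
"""
# A TableEnvironment would run this with: t_env.execute_sql(ddl)
```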
1
vote
1 answer

PyFlink Error/Exception: "Hive Table doesn't support consuming update changes which is produced by node PythonGroupAggregate"

Using Flink 1.13.1, PyFlink, and a user-defined table aggregate function (UDTAGG) with Hive tables as source and sink, I've been encountering an error: pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException: Table…
1
vote
1 answer

Flink Source kafka Join with CDC source to kafka sink

We are trying to join from a DB CDC connector (upsert behavior) table, with a 'kafka' source of events, to enrich these events by key with the existing CDC data: kafka-source (id, B, C) + cdc (id, D, E, F) = result(id, B, C, D, E, F) into a kafka sink…
1
vote
0 answers

How to implement dynamic rules functionality in PyFlink?

My aim is to implement dynamic rule-based validation of a streaming dataset. My project is using PyFlink. I know that there is a Broadcast pattern in Flink, but I didn't find any credible info regarding the same in Python. Is this feature…
ASHISH M.G
  • 522
  • 2
  • 7
  • 23
1
vote
1 answer

PyFlink UDAF InternalRow vs. Row

I'm trying to call an outer function through a custom UDAF in PyFlink. The function I use requires the data to be in a dictionary object. I tried to use row(t.rowtime, t.b, t.c).cast(schema) to achieve that effect. Outside the UDAF, this expression…
tmrlvi
  • 2,235
  • 17
  • 35
1
vote
2 answers

PyFlink Table API Streaming Group Window

I am trying to do some aggregation over a window in PyFlink. However, I get the error "A group window expects a time attribute for grouping in a stream environment." when trying it. I have a time attribute both in the window definition and in the…
tmrlvi
  • 2,235
  • 17
  • 35
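For context, this error usually means the column used in the window is a plain TIMESTAMP rather than a declared time attribute, i.e. one carrying a WATERMARK declaration (event time) or defined via PROCTIME() (processing time). A minimal Flink SQL sketch; the table, columns, connector, and intervals here are invented for illustration:

```python
# Hypothetical: `rowtime` is a valid time attribute because of the WATERMARK
# declaration, so it can be used in a group window such as TUMBLE.
ddl = """
CREATE TABLE clicks (
    user_id STRING,
    rowtime TIMESTAMP(3),
    WATERMARK FOR rowtime AS rowtime - INTERVAL '10' SECOND
) WITH ('connector' = 'datagen')
"""

query = """
SELECT user_id, COUNT(*) AS cnt
FROM clicks
GROUP BY user_id, TUMBLE(rowtime, INTERVAL '1' MINUTE)
"""
# Without the WATERMARK line, the same query raises the
# "group window expects a time attribute" error.
```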
1
vote
2 answers

What's wrong with my Pyflink setup that Python UDFs throw py4j exceptions?

I'm playing with the Flink Python DataStream tutorial from the documentation: https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/python/datastream_tutorial/ Environment: my environment is Windows 10. java -version gives: openjdk…
Chr1s
  • 258
  • 3
  • 14
1
vote
2 answers

PyFlink Kafka connector deserializes received JSON data to null

I am creating a stream processor using PyFlink. When I connect Kafka to Flink, everything works fine. But when I send JSON data to Kafka, PyFlink receives it but the deserializer converts it to null. The PyFlink code is: from pyflink.common.serialization…
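A common cause of this symptom (not necessarily the asker's, since their schema isn't shown) is a mismatch between the JSON payload and the declared row type: when field names or types don't line up, the JSON deserializer yields null rows. The shape of the problem, modeled in plain Python:

```python
import json

# Toy model of strict schema-driven deserialization. The declared schema
# expects a field named "user_id" holding an int.
expected_fields = {"user_id": int}

def deserialize(raw: bytes):
    """Return the parsed record, or None on any name/type mismatch,
    mirroring how a schema mismatch makes a connector emit null rows."""
    record = json.loads(raw)
    for name, typ in expected_fields.items():
        if name not in record or not isinstance(record[name], typ):
            return None
    return record

print(deserialize(b'{"user_id": 42}'))   # {'user_id': 42}
print(deserialize(b'{"userId": 42}'))    # None  (field name mismatch)
print(deserialize(b'{"user_id": "42"}')) # None  (type mismatch)
```

Checking that the declared type info matches the producer's JSON exactly (names, nesting, and types) is usually the first thing to verify.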
1
vote
1 answer

PyFlink - How can I push data to MongoDB and Redis using PyFlink?

I'm new to PyFlink. Recently, I used PyFlink to complete a feature that reads stream data from Kafka and inserts it into another Kafka topic. Now, I want to push data into MongoDB and Redis, but I read the documents and searched for this question on search engines…
1
vote
0 answers

Does Flink Python API support gauge metric?

I'm using PyFlink for stream processing, and have added some metrics to monitor performance. Here's my code for registering the UDF with metrics. I've installed apache-flink 1.13.0. class Test(ScalarFunction): def __init__(self): …
8186lz
  • 11
  • 2
1
vote
1 answer

PyFlink datastream API support for windowing

Does Apache Flink's Python SDK (PyFlink) DataStream API support operators like windowing? Whatever examples I have seen so far for windowing with PyFlink all use the Table API. The DataStream API does support these operators, but it looks like these…
sumeetkm
  • 189
  • 1
  • 7
1
vote
1 answer

PyFlink java.io.EOFException at java.io.DataInputStream.readFully

I have a PyFlink job that reads from a file, filters based on a condition, and prints. This is a tree view of my working directory. This is the PyFlink script main.py: from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
1
vote
1 answer

Why does the Flink FileSystem sink split into multiple files

I want to use Flink to read from an input file, do some aggregation, and write the result to an output file. The job is in batch mode. See wordcount.py below: from pyflink.table import EnvironmentSettings, BatchTableEnvironment #…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
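A likely explanation (hedged, since the job's configuration isn't shown) is that a filesystem sink writes one part file per parallel subtask, so the number of output files tracks the job's parallelism; running the sink with parallelism 1 yields a single file. A toy model of that relationship, with illustrative (not exact) part-file naming:

```python
# Each parallel subtask of a filesystem sink writes its own part file.
# The "part-<subtask>-<counter>" naming here is illustrative only.
def part_files(parallelism: int) -> list:
    """Names of the part files a sink with the given parallelism would produce."""
    return [f"part-{i}-0" for i in range(parallelism)]

print(part_files(4))  # ['part-0-0', 'part-1-0', 'part-2-0', 'part-3-0']
print(part_files(1))  # ['part-0-0']  -> a single output file
```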