Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
1
vote
1 answer

PyFlink performance compared to Scala

How does PyFlink performance compare to Flink + Scala? Big picture: the goal is to build a Lambda architecture with Cold and Hot tiers. The Cold (Batch) tier will be implemented with Apache Spark (PySpark), but for the Hot (Streaming) tier there are different…
Takito Isumoro
  • 174
  • 1
  • 11
1
vote
1 answer

Flink watermarks not advancing in Python, stuck at -9223372036854775808

I have encountered this issue with several pipelines and haven't been able to find an answer. When running a pipeline with a watermark strategy assigned for either monotonous or bounded out-of-order timestamps, with a timestamp assigner, the timestamp is…
kman
  • 95
  • 2
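As context for the value in the title: -9223372036854775808 is Java's Long.MIN_VALUE, which Flink uses as the sentinel watermark before any timestamped event has been processed (so a watermark stuck there means the strategy never saw an event it could extract a timestamp from). A quick check in plain Python, no PyFlink required:

```python
# -9223372036854775808 == -(2**63), i.e. Java's Long.MIN_VALUE.
# Flink reports this as the watermark until the first event timestamp arrives.
LONG_MIN = -(2 ** 63)

def is_uninitialized_watermark(wm: int) -> bool:
    """True if a reported watermark is still Flink's 'no watermark yet' sentinel."""
    return wm == LONG_MIN

print(is_uninitialized_watermark(-9223372036854775808))  # True
print(is_uninitialized_watermark(0))                     # False
```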
1
vote
1 answer

PyFlink unix epoch timestamp conversion issue

I have events coming in with unix epoch timestamps, and I am using a table with the Kinesis connector as the source table. I need to use the same timestamp field as the watermark. How do I do this in Python? I am using the Flink 1.11 release as that's the latest…
ARU
  • 137
  • 1
  • 9
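For reference, in Flink SQL (which PyFlink's Table API executes via `execute_sql`) an epoch column can be turned into a time attribute with a computed column, and the watermark declared on that. A minimal sketch; the table and field names (`events`, `epoch_ts`) and the 5-second bound are invented for illustration, and the epoch is assumed to be in seconds:

```python
# Hypothetical DDL: FROM_UNIXTIME and TO_TIMESTAMP are built-in Flink SQL
# functions; the computed column `ts` becomes the event-time attribute.
ddl = """
CREATE TABLE events (
    epoch_ts BIGINT,
    ts AS TO_TIMESTAMP(FROM_UNIXTIME(epoch_ts)),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kinesis'
    -- remaining connector options omitted
)
"""
# A TableEnvironment would run this with: t_env.execute_sql(ddl)
```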
1
vote
1 answer

PyFlink Error/Exception: "Hive Table doesn't support consuming update changes which is produced by node PythonGroupAggregate"

Using Flink 1.13.1, PyFlink, and a user-defined table aggregate function (UDTAGG) with Hive tables as source and sink, I've been encountering an error: pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException: Table…
1
vote
1 answer

Flink Source kafka Join with CDC source to kafka sink

We are trying to join from a DB CDC connector (upsert behavior) table, with a 'kafka' source of events, to enrich these events by key with the existing CDC data: kafka-source (id, B, C) + cdc (id, D, E, F) = result(id, B, C, D, E, F) into a kafka sink…
1
vote
0 answers

How to implement dynamic rules functionality in PyFlink?

My aim is to implement dynamic rule-based validation of a streaming dataset. My project is using PyFlink. I know that there is a Broadcast pattern in Flink, but I didn't find any credible info regarding the same in Python. Is this feature…
ASHISH M.G
  • 522
  • 2
  • 7
  • 23
1
vote
1 answer

PyFlink UDAF InternalRow vs. Row

I'm trying to call an outer function through a custom UDAF in PyFlink. The function I use requires the data to be in a dictionary object. I tried to use row(t.rowtime, t.b, t.c).cast(schema) to achieve that effect. Outside the UDAF, this expression…
tmrlvi
  • 2,235
  • 17
  • 35
1
vote
2 answers

PyFlink Table API Streaming Group Window

I am trying to do some aggregation over a window in PyFlink. However, I get the error "A group window expects a time attribute for grouping in a stream environment." when trying it. I have a time attribute both in the window definition and in the…
tmrlvi
  • 2,235
  • 17
  • 35
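For context, this error usually means the column used in the window is a plain TIMESTAMP rather than a declared time attribute, i.e. one carrying a WATERMARK declaration (event time) or defined via PROCTIME() (processing time). A minimal Flink SQL sketch; the table, columns, connector, and intervals here are invented for illustration:

```python
# Hypothetical: `rowtime` is a valid time attribute because of the WATERMARK
# declaration, so it can be used in a group window such as TUMBLE.
ddl = """
CREATE TABLE clicks (
    user_id STRING,
    rowtime TIMESTAMP(3),
    WATERMARK FOR rowtime AS rowtime - INTERVAL '10' SECOND
) WITH ('connector' = 'datagen')
"""

query = """
SELECT user_id, COUNT(*) AS cnt
FROM clicks
GROUP BY user_id, TUMBLE(rowtime, INTERVAL '1' MINUTE)
"""
# Without the WATERMARK line, the same query raises the
# "group window expects a time attribute" error.
```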
1
vote
2 answers

What's wrong with my Pyflink setup that Python UDFs throw py4j exceptions?

I'm playing with the Flink Python DataStream tutorial from the documentation: https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/python/datastream_tutorial/ Environment: my environment is Windows 10. java -version gives: openjdk…
Chr1s
  • 258
  • 3
  • 14
1
vote
2 answers

PyFlink Kafka connector deserializes received JSON data to null

I am creating a stream processor using PyFlink. When I connect Kafka to Flink, everything works fine. But when I send JSON data to Kafka, PyFlink receives it but the deserializer converts it to null. The PyFlink code is: from pyflink.common.serialization…
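A common cause of this symptom (not necessarily the asker's, since their schema isn't shown) is a mismatch between the JSON payload and the declared row type: when field names or types don't line up, the JSON deserializer yields null rows. The shape of the problem, modeled in plain Python:

```python
import json

# Toy model of strict schema-driven deserialization. The declared schema
# expects a field named "user_id" holding an int.
expected_fields = {"user_id": int}

def deserialize(raw: bytes):
    """Return the parsed record, or None on any name/type mismatch,
    mirroring how a schema mismatch makes a connector emit null rows."""
    record = json.loads(raw)
    for name, typ in expected_fields.items():
        if name not in record or not isinstance(record[name], typ):
            return None
    return record

print(deserialize(b'{"user_id": 42}'))   # {'user_id': 42}
print(deserialize(b'{"userId": 42}'))    # None  (field name mismatch)
print(deserialize(b'{"user_id": "42"}')) # None  (type mismatch)
```

Checking that the declared type info matches the producer's JSON exactly (names, nesting, and types) is usually the first thing to verify.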
1
vote
1 answer

PyFlink - How can I push data to MongoDB and Redis using PyFlink?

I'm new to PyFlink. Recently, I used PyFlink to complete a feature that reads stream data from Kafka and inserts it into another Kafka topic. Now, I want to push data into MongoDB and Redis, but I read the documents and searched for this question on search engines…
1
vote
0 answers

Does Flink Python API support gauge metric?

I'm using PyFlink for stream processing, and have added some metrics to monitor performance. Here's my code for registering the UDF with metrics. I've installed apache-flink 1.13.0. class Test(ScalarFunction): def __init__(self): …
8186lz
  • 11
  • 2
1
vote
1 answer

PyFlink datastream API support for windowing

Does Apache Flink's Python SDK (PyFlink) DataStream API support operators like windowing? Whatever examples I have seen so far for windowing with PyFlink all use the Table API. The DataStream API does support these operators, but it looks like these…
sumeetkm
  • 189
  • 1
  • 7
1
vote
1 answer

PyFlink java.io.EOFException at java.io.DataInputStream.readFully

I have a PyFlink job that reads from a file, filters based on a condition, and prints. This is a tree view of my working directory. This is the PyFlink script main.py: from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
1
vote
1 answer

Why does the Flink FileSystem sink split into multiple files

I want to use Flink to read from an input file, do some aggregation, and write the result to an output file. The job is in batch mode. See wordcount.py below: from pyflink.table import EnvironmentSettings, BatchTableEnvironment #…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
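A likely explanation (hedged, since the job's configuration isn't shown) is that a filesystem sink writes one part file per parallel subtask, so the number of output files tracks the job's parallelism; running the sink with parallelism 1 yields a single file. A toy model of that relationship, with illustrative (not exact) part-file naming:

```python
# Each parallel subtask of a filesystem sink writes its own part file.
# The "part-<subtask>-<counter>" naming here is illustrative only.
def part_files(parallelism: int) -> list:
    """Names of the part files a sink with the given parallelism would produce."""
    return [f"part-{i}-0" for i in range(parallelism)]

print(part_files(4))  # ['part-0-0', 'part-1-0', 'part-2-0', 'part-3-0']
print(part_files(1))  # ['part-0-0']  -> a single output file
```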