Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python, and at the same time Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink? (apache.org)

258 questions
0
votes
1 answer

Are there any Kinesis Connectors for Python DataStream API Flink 1.13 version?

I am trying to build a streaming application using Kinesis Data Analytics with Flink 1.13 in Python. The source for the application is a Kinesis data stream, but I can see that the Kinesis connector FlinkKinesisConsumer is not available in…
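A minimal sketch of the usual workaround, assuming the Kinesis connector jar can be shipped with the job (the jar path and artifact name below are placeholders): in Flink 1.13 the Python DataStream API has no FlinkKinesisConsumer wrapper, so the stream is typically read through the Table API instead.

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Hypothetical jar path; point this at the Kinesis connector bundled with the job.
env.add_jars("file:///opt/flink/lib/flink-sql-connector-kinesis_2.12-1.13.6.jar")
t_env = StreamTableEnvironment.create(stream_execution_environment=env)

# A Kinesis source can now be declared via DDL, e.g.
# CREATE TABLE ... WITH ('connector' = 'kinesis', 'stream' = ..., 'aws.region' = ..., 'format' = ...)
```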
0
votes
1 answer

Data loading efficiency using Flink?

I am evaluating various approaches to load data in one go from one database to another. It may or may not be NoSQL, SQL, or an RDBMS. I am thinking about how efficiently Flink can execute the source and sink. Can Flink do the data loading in less than…
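A minimal sketch of a bulk copy between two relational databases using the Table API and the JDBC connector; the URLs, table names, and credentials are hypothetical, and other stores would use their own connectors.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Batch mode: the JDBC source is bounded, so the job finishes when the copy is done.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

t_env.execute_sql("""
    CREATE TABLE src (
        id BIGINT,
        payload STRING
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://source-host:5432/db',
        'table-name' = 'events',
        'username' = 'reader',
        'password' = 'secret'
    )
""")

t_env.execute_sql("""
    CREATE TABLE dst (
        id BIGINT,
        payload STRING
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://target-host:5432/db',
        'table-name' = 'events_copy',
        'username' = 'writer',
        'password' = 'secret'
    )
""")

# One job moves all rows from source to sink.
t_env.execute_sql("INSERT INTO dst SELECT id, payload FROM src").wait()
```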
0
votes
1 answer

How to integrate machine learning models in Flink to make predictions using Python

I am trying to make real-time predictions on a dataset. How can I include SVM, NB, and CNN models in Apache Flink to make predictions/classifications in Python?
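A minimal sketch of one common pattern: load a pre-trained Python model (for example a pickled scikit-learn SVM) once per parallel instance and score each record in a DataStream map function. The model path and feature layout are hypothetical.

```python
import pickle

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction


class Predict(MapFunction):
    def open(self, runtime_context):
        # Load the model once per parallel instance, not once per record.
        # Hypothetical path to a pickled model shipped with the job.
        with open("/models/svm.pkl", "rb") as f:
            self.model = pickle.load(f)

    def map(self, features):
        # Score a single record; 'features' is one feature vector.
        return self.model.predict([features])[0]


env = StreamExecutionEnvironment.get_execution_environment()
predictions = env.from_collection([[1.0, 2.0], [3.0, 4.0]]).map(Predict())
predictions.print()
env.execute("model-scoring")
```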
0
votes
1 answer

Flink DDL cannot parse 2 different JSON roots from a Kafka topic

I am sending messages from Kafka to Flink in Python. I have 2 different JSON roots in one Kafka topic. My JSON roots, with examples: 1- {'Message1': {'b': 'c'}} 2- {'Message2': {'e': 'f'}} Flink can consume these messages but cannot parse them for DDL…
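One hedged way to make the DDL tolerate both roots is to declare each root as a nullable ROW field, so whichever key is missing in a given message simply comes out as NULL. The topic name and bootstrap servers below are placeholders.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE mixed_messages (
        Message1 ROW<b STRING>,
        Message2 ROW<e STRING>
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'mixed-topic',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json',
        'json.ignore-parse-errors' = 'true'
    )
""")

-- Rows from the first root have Message2 = NULL, and vice versa.
```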
0
votes
1 answer

Apache Flink Table API Insert statement

I have a Flink application that processes data from 2 streams. I am using the Table API, where I want to consume data from stream1, query stream2, and get the record with the latest timestamp. I have this now - def…
Dan • 79 • 10
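A sketch of the Flink SQL deduplication pattern often used for "latest record per key"; the columns id, payload, and event_time are hypothetical, and the from_elements view below stands in for the asker's stream2.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical stand-in for stream2.
t_env.create_temporary_view(
    "stream2",
    t_env.from_elements(
        [(1, "a", "2022-01-01 00:00:01"), (1, "b", "2022-01-01 00:00:02")],
        ["id", "payload", "event_time"]))

# Keep only the latest row per id.
latest = t_env.sql_query("""
    SELECT id, payload, event_time
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY event_time DESC) AS rn
        FROM stream2
    )
    WHERE rn = 1
""")
latest.execute().print()
```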
0
votes
1 answer

Flink - Can't convert Table to DataStream

I've managed to use the Pyflink table API to connect to Kinesis and process a stream of data. I'm now trying to convert this table to a DataStream as I need more low level processing. I've tried following the example here…
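For reference, a minimal sketch of the Table-to-DataStream bridge as it works in Flink 1.14+, assuming the table environment is a StreamTableEnvironment built on the same StreamExecutionEnvironment (the table contents here are placeholders):

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(stream_execution_environment=env)

table = t_env.from_elements([(1, 'a'), (2, 'b')], ['id', 'name'])

# Works for append-only tables; for updating tables (e.g. aggregations),
# use t_env.to_changelog_stream(table) instead.
ds = t_env.to_data_stream(table)
ds.print()
env.execute("table-to-datastream")
```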
0
votes
1 answer

Is it possible to use DataStream API followed by Table API SQL in PyFlink?

In PyFlink, is it possible to use the DataStream API to create a DataStream by means of StreamExecutionEnvironment's addSource(...), then perform transformations on this data stream using the DataStream API and then convert that stream into a form…
John • 10,837 • 17 • 78 • 141
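A minimal sketch showing that this combination works: build a DataStream, transform it, register it as a view, and query it with SQL. The collection source and field names stand in for the asker's addSource(...) setup.

```python
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(stream_execution_environment=env)

# DataStream API: source plus a transformation.
ds = env.from_collection(
    [(1, 'click'), (2, 'view')],
    type_info=Types.TUPLE([Types.INT(), Types.STRING()]))
ds = ds.map(lambda e: (e[0] * 10, e[1]),
            output_type=Types.TUPLE([Types.INT(), Types.STRING()]))

# Table API / SQL on top of the transformed stream (columns default to f0, f1).
t_env.create_temporary_view("events", t_env.from_data_stream(ds))
t_env.sql_query("SELECT f1, COUNT(*) AS cnt FROM events GROUP BY f1") \
     .execute().print()
```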
0
votes
1 answer

What does the op column mean in PyFlink when printing a table result?

When I do a join query using PyFlink SQL and print the result, there are some duplicate rows where an op column is displayed, as in the attached screenshot. Any idea what that is and how I can produce a non-duplicate result? Thanks in advance.
chris • 11 • 2
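The op column is the changelog row kind of each printed row (+I insert, -U/+U retraction and replacement of an updated result, -D delete), which is why streaming joins and aggregations appear to print duplicates. A tiny self-contained sketch that reproduces it:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t = t_env.from_elements([('a', 1), ('a', 2), ('b', 3)], ['k', 'v'])

# In streaming mode each new row retracts (-U) and re-emits (+U) the running
# aggregate, so the printed output shows several op kinds per key.
t.group_by(t.k).select(t.k, t.v.sum.alias('total')).execute().print()
```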
0
votes
0 answers

How to set Flink TaskManager Total Flink Memory?

I want to run quite a number of PyFlink jobs on Kubernetes, where the amount of state and the number of events being processed are small, so I'd like to use as little memory in my clusters as possible and bin-pack most efficiently…
John • 10,837 • 17 • 78 • 141
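For reference, the relevant flink-conf.yaml keys; the values below are placeholders, not recommendations.

```yaml
# Caps Total Flink Memory (heap + managed + network + off-heap) per TaskManager.
taskmanager.memory.flink.size: 512m
# Or cap the whole container instead, including JVM metaspace and overhead:
# taskmanager.memory.process.size: 768m
```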
0
votes
1 answer

Querying nested row with Python in Flink

Based upon the pyflink walkthrough, I'm now trying to get a simple nested row query working using apache-flink==1.14.4. I've created my table structure based upon this solution: Get nested fields from Kafka message using Apache Flink SQL. A message…
duffn • 3,690 • 8 • 33 • 68
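A minimal sketch of declaring and querying a nested ROW column; the topic and field names are hypothetical stand-ins for the asker's message layout. Nested fields are addressed with dot notation in SQL.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE events (
        id STRING,
        payload ROW<user_id STRING, amount DOUBLE>
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'events',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Dot notation reaches into the nested row.
result = t_env.sql_query("SELECT id, payload.user_id, payload.amount FROM events")
```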
0
votes
1 answer

Format logs for PyFlink application with different pattern for the Python part

I have a Python application running in PyFlink. Is there a way of saying "Apply this pattern to the logs produced by the python code, and this other pattern to every other log?" I'd like to format the log messages that are coming from my Python code…
Savir • 17,568 • 15 • 82 • 136
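A hedged sketch of the Python half: inside the job code, the standard logging module can be given its own handler and format, independent of the JVM's log4j pattern. The logger name and format string are just examples.

```python
import logging

# A dedicated handler and format for records produced by the Python code.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("[PY] %(asctime)s %(levelname)s %(name)s - %(message)s"))

py_logger = logging.getLogger("my_pyflink_app")   # hypothetical logger name
py_logger.addHandler(handler)
py_logger.setLevel(logging.INFO)
py_logger.propagate = False   # keep these records out of the default/root format

py_logger.info("formatted only by the Python handler")
```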
0
votes
1 answer

Can't connect Pyflink source to AWS Kinesis

I'm using Pyflink and trying to use AWS Kinesis as a source for the Table API using the following instructions: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/table/python_table_api_connectors/ Using the connectors…
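A minimal sketch of the Table API Kinesis source described in those instructions, assuming the connector jar is made available through pipeline.jars (the jar path, stream name, and region are placeholders):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The connector jar must be on the pipeline classpath before the DDL runs.
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///opt/flink/lib/flink-sql-connector-kinesis-1.15.2.jar")

t_env.execute_sql("""
    CREATE TABLE kinesis_input (
        `data` STRING
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'my-stream',
        'aws.region' = 'eu-west-1',
        'scan.stream.initpos' = 'LATEST',
        'format' = 'json'
    )
""")
```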
0
votes
1 answer

udf map function returns a table with unnamed schema

code: func = udf(log_parser, result_type=DataTypes.ROW( [DataTypes.FIELD("ts", DataTypes.TIMESTAMP(precision=3)), DataTypes.FIELD("clientip", DataTypes.STRING()), …
Shayxu • 5 • 2
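One hedged fix for the unnamed f0, f1, ... columns that Table.map() produces is to alias them right after the map. A simplified, self-contained sketch (string fields and a one-column source table stand in for the asker's schema):

```python
from pyflink.common import Row
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical one-column source of raw log lines.
source = t_env.from_elements([("2023-01-01 00:00:00,1.2.3.4",)], ["line"])


@udf(result_type=DataTypes.ROW(
    [DataTypes.FIELD("ts", DataTypes.STRING()),
     DataTypes.FIELD("clientip", DataTypes.STRING())]))
def log_parser(line):
    ts, ip = line.split(",")
    return Row(ts, ip)


# map() yields positional columns; alias() gives them names back.
named = source.map(log_parser).alias("ts", "clientip")
named.execute().print()
```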
0
votes
2 answers

Why is there no RichMapFunction in pyflink?

There is pyflink.datastream.MapFunction in Flink Python API Docs. Meanwhile there is no RichMapFunction. Could somebody tell me why?
Shayxu • 5 • 2
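A likely reason is that the Python MapFunction already carries the "rich" lifecycle: it has open()/close(), and open() receives a RuntimeContext, so a separate RichMapFunction would add nothing. A minimal sketch:

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction, RuntimeContext


class RichStyleMap(MapFunction):
    def open(self, runtime_context: RuntimeContext):
        # One-time setup per parallel instance, as RichMapFunction.open does in Java.
        self.prefix = "subtask-%d" % runtime_context.get_index_of_this_subtask()

    def map(self, value):
        return "%s: %s" % (self.prefix, value)


env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection([1, 2, 3]).map(RichStyleMap()).print()
env.execute("rich-style-map")
```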
0
votes
1 answer

Internal architecture of pyflink

I referenced the article shown below, which explains how PyFlink works with the Python interpreter and the JVM. https://www.alibabacloud.com/blog/the-flink-ecosystem-a-quick-start-to-pyflink_596150 I couldn't figure out whether they execute a job…