Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python; at the same time, Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
0
votes
1 answer

Accessing kafka timestamps in pyflink

I'm trying to write a PyFlink application for measuring latency and throughput. My data comes as JSON objects from a Kafka topic and is loaded into a DataStream using the SimpleStringSchema class for deserialization. Following the answer to this…
Cipollino
  • 37
  • 1
  • 7
0
votes
1 answer

PyFlink: called already closed and NullPointerException

I ran into an issue where a PyFlink job may end up with 3 very different outcomes, given very slight differences in input, and luck :( The PyFlink job is simple. It first reads from a csv file, then processes the data a bit with a Python UDF that…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
1 answer

Adding func_type='pandas' to a PyFlink UDF throws ArrowTypeError('Did not pass numpy.dtype object'))

I have a PyFlink job that reads from a csv file (in path data.txt), sums up the first 2 integer columns, and prints the result. Here's the data.txt file. > cat data.txt 1 1 1 1 2 2 2 2 Here is the file (named batch-prediction.py) that…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
2 answers

Python Flink Connect to Remote Flink Environment

I have a Flink system running on a remote machine, say with IP 10.XX.XX.XX and port 6123. Now I would like to connect from another system using PyFlink with a remote execution environment. I saw the docs…
0
votes
1 answer

Can I use PyFlink together with PyTorch/Tensorflow/ScikitLearn/Xgboost/LightGBM?

I am exploring PyFlink and I wonder if it is possible to use PyFlink together with all these ML libs that ML engineers normally use: PyTorch, Tensorflow, Scikit Learn, Xgboost, LightGBM, etc. According to this SO thread, PySpark cannot use Scikit…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
1 answer

Reading a csv file in batch mode using pyflink from local system

I was trying to read an existing csv file while writing a PyFlink job. I was using the filesystem connector to get the data, but after executing execute_sql() on the DDL and later querying the table, I got an error explaining that it…
0
votes
2 answers

Pyflink : 'JavaPackage' object is not callable

When I run a Python file in the Flink CLI using the following code: python3 word_count.py I get an error like this: Traceback (most recent call last): File "word_count.py", line 79, in word_count() File "word_count.py", line 37, in…
YT Q
  • 31
  • 4
0
votes
1 answer

Converting Flink dynamic table into a Pandas dataframe

I'm using the pyflink table api to read data from Kafka. Now I want to convert the resultant table into a Pandas dataframe. Here is my code, exec_env = StreamExecutionEnvironment.get_execution_environment() exec_env.set_parallelism(1) t_config =…
0
votes
2 answers

Difference between flink run -py and python run

I have recently been learning pyflink, but I am a little bit confused. We know that the pyflink Table API converts a stream/batch into a table, does some work on it, and finally sinks it where you want. However, there are several ways to create a table env: For…
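The rough distinction, as a sketch: invoking the script with `python` runs the job in a local embedded mini-cluster inside the Python process, while `flink run -py` submits it through the Flink CLI to a running cluster.

```shell
# Run locally in an embedded mini-cluster (no cluster required):
python word_count.py

# Submit the same script to a running Flink cluster via the CLI:
flink run -py word_count.py

# Optionally target a specific JobManager:
flink run -m localhost:8081 -py word_count.py
```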
0
votes
1 answer

invalid syntax in running flink 1.12.0 wordcount example?

I am working on the first flink wordcount example from https://github.com/uncleguanghui/pyflink_learn. My environment is flink 1.12.0 and ubuntu, the flink is running in the background. The wordcount example is fairly simple. import os import…
user824624
  • 7,077
  • 27
  • 106
  • 183
0
votes
1 answer

How to use table.where() to filter for subfields in PyFlink?

I'm using pyflink and Flink 11.2 and I've defined my table like this: def _create_sink_table(st_env): # Create SINK table. st_env.execute_sql(f""" CREATE TABLE {"in"} ( `a` STRING, `b` STRING, `c`…
Denis Nutiu
  • 1,178
  • 15
  • 22
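For nested ROW fields, one route that avoids the Table API expression syntax entirely is Flink SQL's dotted field access. A hedged sketch (table, column, and field names are hypothetical):

```sql
-- Sketch: filtering on a nested field of a ROW-typed column `c`.
SELECT a, b
FROM my_source
WHERE c.sub = 'some_value'
```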
0
votes
1 answer

PyFlink - Scala UDF - How to convert Scala Map in Table API?

I'm trying to map the Map[String,String] object output of my Scala UDF (scala.collection.immutable.map) to some valid data type in the Table API, namely via Java type (java.util.Map) as recommended here: Flink Table API & SQL and map types (Scala).…
py-r
  • 419
  • 5
  • 15
0
votes
1 answer

PyFlink - DataStream API - Missing module

I'm trying to start with the DataStream API, but have a missing module. Any idea what's wrong? Version Python 3.7.9 python -m pip install apache-flink Code from pyflink.common.serialization import SimpleStringEncoder Error ModuleNotFoundError: No…
py-r
  • 419
  • 5
  • 15
0
votes
1 answer

PyFlink - JSON file sink?

Is it possible to use a JSON file sink in the Table API and/or DataStream API, the same way as for CSV? Thanks! Code my_sink_ddl = f""" create table mySink ( id STRING, dummy_item STRING ) with ( 'connector.type' =…
py-r
  • 419
  • 5
  • 15
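For reference, the filesystem connector does support a JSON format in the newer (non-legacy) connector options, so a JSON file sink in the Table API looks plausible. A hedged DDL sketch, mirroring the CSV-style table above (path is illustrative):

```sql
-- Sketch: filesystem sink with JSON row format (Flink 1.11+ connector
-- options; note this differs from the legacy 'connector.type' syntax).
CREATE TABLE mySink (
  id STRING,
  dummy_item STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/output',
  'format' = 'json'
)
```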
0
votes
1 answer

PyFlink - Issue with UNNEST: query uses an unsupported SQL feature?

I am trying to flatten an array using the UNNEST function in the Table API. Am I doing something wrong, or is it not a supported function? This page suggests it though:…
py-r
  • 419
  • 5
  • 15
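For reference, Flink SQL does support UNNEST for flattening array columns, typically written as a cross join. A sketch (table and column names are hypothetical):

```sql
-- Sketch: flattening an ARRAY column `tags`, one output row per element.
SELECT t.id, tag
FROM my_table AS t
CROSS JOIN UNNEST(t.tags) AS x (tag)
```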