Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python; at the same time, Flink benefits from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
0
votes
1 answer

Accessing kafka timestamps in pyflink

I'm trying to write a PyFlink application for measuring latency and throughput. My data comes as JSON objects from a Kafka topic and is loaded into a DataStream using the SimpleStringSchema class for deserialization. Following the answer to this…
Cipollino
  • 37
  • 1
  • 7
0
votes
1 answer

PyFlink: called already closed and NullPointerException

I ran into an issue where a PyFlink job may end up with 3 very different outcomes, given very slight differences in input, and luck :( The PyFlink job is simple. It first reads from a csv file, then processes the data a bit with a Python UDF that…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
1 answer

Adding func_type='pandas' to a PyFlink UDF throws ArrowTypeError('Did not pass numpy.dtype object'))

I have a PyFlink job that reads from a csv file (in path data.txt), sums up the first 2 integer columns, and prints the result. Here's the data.txt file. > cat data.txt 1 1 1 1 2 2 2 2 Here is the file (named batch-prediction.py) that…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
2 answers

Python Flink Connect to Remote Flink Environment

I have a Flink system running on a remote machine, say with IP 10.XX.XX.XX and port 6123. Now I would like to connect from another system using PyFlink with a remote execution environment. I saw the docs…
0
votes
1 answer

Can I use PyFlink together with PyTorch/Tensorflow/ScikitLearn/Xgboost/LightGBM?

I am exploring PyFlink and I wonder if it is possible to use PyFlink together with all these ML libs that ML engineers normally use: PyTorch, Tensorflow, Scikit Learn, Xgboost, LightGBM, etc. According to this SO thread, PySpark cannot use Scikit…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
0
votes
1 answer

Reading a csv file in batch mode using pyflink from local system

I was trying to read an existing csv file while writing a PyFlink job. I was using the filesystem connector to get the data, but after executing execute_sql() on the DDL and later querying the table, I got an error explaining that it…
0
votes
2 answers

Pyflink : 'JavaPackage' object is not callable

When I run a Python file in the Flink CLI using the following code: python3 word_count.py I get an error like this: Traceback (most recent call last): File "word_count.py", line 79, in word_count() File "word_count.py", line 37, in…
YT Q
  • 31
  • 4
0
votes
1 answer

Converting Flink dynamic table into a Pandas dataframe

I'm using the pyflink table api to read data from Kafka. Now I want to convert the resultant table into a Pandas dataframe. Here is my code, exec_env = StreamExecutionEnvironment.get_execution_environment() exec_env.set_parallelism(1) t_config =…
0
votes
2 answers

Difference between flink run -py and python run

I have recently been learning pyflink, but I am a little bit confused. We know that the pyflink Table API converts a stream/batch into a table, does some work on it, and finally sinks it where you want. However, there are several ways to create a table env: For…
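The rough distinction, as a sketch: invoking the script with `python` runs the job in a local embedded mini-cluster inside the Python process, while `flink run -py` submits it through the Flink CLI to a running cluster.

```shell
# Run locally in an embedded mini-cluster (no cluster required):
python word_count.py

# Submit the same script to a running Flink cluster via the CLI:
flink run -py word_count.py

# Optionally target a specific JobManager:
flink run -m localhost:8081 -py word_count.py
```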
0
votes
1 answer

invalid syntax in running flink 1.12.0 wordcount example?

I am working on the first flink wordcount example from https://github.com/uncleguanghui/pyflink_learn. My environment is flink 1.12.0 and ubuntu, the flink is running in the background. The wordcount example is fairly simple. import os import…
user824624
  • 7,077
  • 27
  • 106
  • 183
0
votes
1 answer

How to use table.where() to filter for subfields in PyFlink?

I'm using pyflink and Flink 11.2 and I've defined my table like this: def _create_sink_table(st_env): # Create SINK table. st_env.execute_sql(f""" CREATE TABLE {"in"} ( `a` STRING, `b` STRING, `c`…
Denis Nutiu
  • 1,178
  • 15
  • 22
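For nested ROW fields, one route that avoids the Table API expression syntax entirely is Flink SQL's dotted field access. A hedged sketch (table, column, and field names are hypothetical):

```sql
-- Sketch: filtering on a nested field of a ROW-typed column `c`.
SELECT a, b
FROM my_source
WHERE c.sub = 'some_value'
```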
0
votes
1 answer

PyFlink - Scala UDF - How to convert Scala Map in Table API?

I'm trying to map the Map[String,String] object output of my Scala UDF (scala.collection.immutable.map) to some valid data type in the Table API, namely via Java type (java.util.Map) as recommended here: Flink Table API & SQL and map types (Scala).…
py-r
  • 419
  • 5
  • 15
0
votes
1 answer

PyFlink - DataStream API - Missing module

I'm trying to start with the DataStream API, but have a missing module. Any idea what's wrong? Version Python 3.7.9 python -m pip install apache-flink Code from pyflink.common.serialization import SimpleStringEncoder Error ModuleNotFoundError: No…
py-r
  • 419
  • 5
  • 15
0
votes
1 answer

PyFlink - JSON file sink?

Is it possible to use a JSON file sink in the Table API and/or DataStream API, the same way as for CSV? Thanks! Code my_sink_ddl = f""" create table mySink ( id STRING, dummy_item STRING ) with ( 'connector.type' =…
py-r
  • 419
  • 5
  • 15
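For reference, the filesystem connector does support a JSON format in the newer (non-legacy) connector options, so a JSON file sink in the Table API looks plausible. A hedged DDL sketch, mirroring the CSV-style table above (path is illustrative):

```sql
-- Sketch: filesystem sink with JSON row format (Flink 1.11+ connector
-- options; note this differs from the legacy 'connector.type' syntax).
CREATE TABLE mySink (
  id STRING,
  dummy_item STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/output',
  'format' = 'json'
)
```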
0
votes
1 answer

PyFlink - Issue with UNNEST: query uses an unsupported SQL feature?

I am trying to flatten an array using the UNNEST function in the Table API. Am I doing something wrong, or is it not a supported function? This page suggests it though:…
py-r
  • 419
  • 5
  • 15
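For reference, Flink SQL does support UNNEST for flattening array columns, typically written as a cross join. A sketch (table and column names are hypothetical):

```sql
-- Sketch: flattening an ARRAY column `tags`, one output row per element.
SELECT t.id, tag
FROM my_table AS t
CROSS JOIN UNNEST(t.tags) AS x (tag)
```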