Questions tagged [pyflink]

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. PyFlink makes it available to Python.

PyFlink makes all of Apache Flink available to Python and, at the same time, lets Flink benefit from Python's rich ecosystem of scientific computing libraries.

What is PyFlink on apache.org

258 questions
1
vote
0 answers

How to specify CREATE TABLE in Flink SQL when receiving data stream of non-primitive types (using PyFlink)?

A Flink SQL application receives data from an AWS Kinesis Data Stream. The received messages are in JSON, the schema is expressed in JSON Schema, and it contains a property that is not a primitive object, for example: { "$id":…
John
  • 10,837
  • 17
  • 78
  • 141
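Non-primitive JSON Schema properties typically map to Flink SQL's structured types in the DDL. A minimal sketch, assuming the Kinesis SQL connector is on the classpath; the table, stream, and field names (payload, tags) are made up for illustration:

```sql
CREATE TABLE events (
  id STRING,
  -- a nested JSON object maps to a ROW type in Flink SQL
  payload ROW<a STRING, b INT>,
  -- a free-form object of string values can instead be declared as a MAP
  tags MAP<STRING, STRING>
) WITH (
  'connector' = 'kinesis',
  'stream' = 'my-stream',        -- hypothetical stream name
  'aws.region' = 'us-east-1',
  'format' = 'json'
);
```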
1
vote
1 answer

Flink SQL behavior

I want to execute Flink SQL on batch data (CSVs in S3). However, I explicitly want Flink to execute my query in a streaming fashion, because I think it will be faster than batch mode. For example, my query consists of filtering on two tables and…
bumpbump
  • 542
  • 4
  • 17
1
vote
1 answer

python-archives Not A Directory Exception While Running Flink Job - PyFlink

I'm getting the following exception when running a PyFlink application. I'm using start-cluster.sh to start the Flink cluster and a Python virtual environment (/root/Python3.6/venv.zip) to run the Flink job. I've set the archive path in the…
Denorm
  • 466
  • 4
  • 13
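One common cause of this exception is how the archive is passed to Flink: it must be a zip file referenced via the CLI's --pyArchives / --pyExecutable options. A hedged command-line sketch, reusing the path from the question; the internal venv/bin/python3 layout is an assumption about how the zip was built:

```
flink run \
  -py my_job.py \
  --pyArchives /root/Python3.6/venv.zip \
  --pyExecutable venv.zip/venv/bin/python3
```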
1
vote
0 answers

Looking for an example of Pyflink with Kinesis

I have a Kinesis stream to which I want to listen using PyFlink. I have installed the apache-flink 1.12.2 package for Python 3. I saw a few examples of using Kinesis in Python (such as this one: https://stackoverflow.com/a/22403036) and…
user1322801
  • 839
  • 1
  • 12
  • 27
1
vote
1 answer

PyFlink 14.2 - Table API DDL - Semantic Exactly Once

I have a scenario where I define a Kafka source, a UDF | UDTF for processing, and a Kafka sink. No matter what I do, when I run the job the output is flooded with the processed output of a single input record. For illustrative purposes,…
Paul
  • 756
  • 1
  • 8
  • 22
1
vote
0 answers

flinksql read custom format data with json

I am trying to read streaming data from Kafka, where the log lines have the format below: 03-06-2022 02:56:130 INFO [...] [...] {"a": "abc", "b": 123} I want to extract the JSON part of the log line, i.e. the {"a": "abc", "b": 123} part, and…
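One way to handle mixed text-plus-JSON lines is to ingest each line as a single string and extract the JSON with SQL functions. A sketch, assuming Flink 1.15+ (for JSON_VALUE) and a hypothetical Kafka topic and broker:

```sql
CREATE TABLE raw_logs (
  line STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'logs',                                  -- hypothetical topic
  'properties.bootstrap.servers' = 'localhost:9092', -- hypothetical broker
  'scan.startup.mode' = 'latest-offset',
  'format' = 'raw'                                   -- one STRING column per record
);

-- pull the {...} part out of each line, then read fields from it
SELECT
  JSON_VALUE(payload, '$.a') AS a,
  CAST(JSON_VALUE(payload, '$.b') AS INT) AS b
FROM (
  SELECT REGEXP_EXTRACT(line, '(\{.*\})', 1) AS payload
  FROM raw_logs
);
```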
1
vote
1 answer

PyFlink "pipeline.classpaths" vs $FLINK_HOME/lib

What is the difference between loading classes passed via PyFlink's pipeline.classpaths config and putting them into the $FLINK_HOME/lib directory? When I want to use flink-sql-connector-kafka-*.jar, it works fine just passing it using…
literg
  • 482
  • 5
  • 13
1
vote
1 answer

The meaning of wait in execute_sql statement

I was wondering what the difference and implications are of executing SQL statements in PyFlink with and without the wait() call: t_env.execute_sql(query) t_env.execute_sql(query).wait() I experimented with both and see no difference in execution.
Dark Templar
  • 1,175
  • 13
  • 27
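For context, execute_sql() on an INSERT statement submits the job asynchronously and returns a TableResult, while wait() blocks the client until the job terminates. A minimal sketch; the table names are hypothetical, and the import is kept inside main() so the snippet's constants load without PyFlink installed:

```python
INSERT_STMT = "INSERT INTO sink_table SELECT * FROM source_table"  # hypothetical tables


def main():
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    # ... CREATE TABLE statements for source_table / sink_table would go here ...

    # execute_sql() submits the INSERT job and returns a TableResult without
    # waiting for the job to finish
    result = t_env.execute_sql(INSERT_STMT)

    # wait() blocks until the job reaches a terminal state; without it a short
    # script may exit (and, in local mode, tear down the mini-cluster) before
    # the job is done. On an unbounded streaming job against a real cluster the
    # job never terminates, which is why the two variants can look identical.
    result.wait()


if __name__ == "__main__":
    main()
```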
1
vote
1 answer

Flink developer's role from watermarks perspective

I am a newbie to Flink and came across an article that said "A flink developer is responsible for moving event time forward by arranging the watermark in the stream". So I tried to work out the answer for myself. As far as I know, if I…
whatsinthename
  • 1,828
  • 20
  • 59
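In the SQL/Table API, "arranging the watermark" usually means declaring a watermark strategy on the event-time column. A DDL sketch with hypothetical names, using a 5-second bounded-out-of-orderness delay:

```sql
CREATE TABLE clicks (
  user_id STRING,
  ts TIMESTAMP(3),
  -- the developer chooses how event time advances: here watermarks trail the
  -- maximum seen timestamp by 5 seconds, bounding tolerated out-of-orderness
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'datagen'
);
```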
1
vote
0 answers

PyFlink reading Parquet files

I'm happily reading text files via env.read_text_file(file_path), but how can I read a Parquet file in PyFlink? I'm aware of https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/dataset/formats/parquet/ for Java/Scala ...but is…
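In the Table API, Parquet files can be declared as a filesystem table, which sidesteps the DataSet-era Java/Scala formats. A sketch, assuming the flink-parquet format dependency is available; the columns and path are made up:

```sql
CREATE TABLE parquet_source (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///path/to/data',   -- a directory or a single file
  'format' = 'parquet'
);
```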
1
vote
1 answer

pyflink tableAPI, multiple sources to single processing table sequence

I'm trying to implement a pyflink job (Via Table API) which does some basic processing from multiple sources, after the data from the sources gets converted into a standard format. I'm able to convert the data from each respective source into the…
Paul
  • 756
  • 1
  • 8
  • 22
1
vote
0 answers

Pyflink windowAll() by event-time to apply a clustering model

I'm a beginner with the PyFlink framework and would like to know if my use case is possible with it... I need to create tumbling windows and apply a Python UDF (a scikit-learn clustering model) on them. The use case is: every 30 seconds I want to apply…
1
vote
2 answers

Invalid SQL identifier - org.apache.flink.sql.parser.impl.ParseException: Encountered "TABLE" at line 2, column 16

I am trying to run a PyFlink job that takes data from a source Kafka topic and sinks it into HDFS. There is a weird SQL-related error that keeps arising. This is from the SQL statement in the Apache Flink (PyFlink) Table API sink: SQL: sql_statement_sink = """ …
3awny
  • 319
  • 1
  • 2
  • 10
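A ParseException of the form Encountered "TABLE" usually means a reserved word is being used as an identifier; in Flink SQL, reserved words must be escaped with backticks. A sketch with hypothetical columns and sink options:

```sql
CREATE TABLE hdfs_sink (
  -- reserved words used as column names must be backtick-quoted
  `table` STRING,
  `timestamp` TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///path/to/output',  -- hypothetical path
  'format' = 'json'
);
```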
1
vote
1 answer

PyFlink SQL local test

So I have a simple aggregation job written in the PyFlink SQL API. The job reads data from AWS Kinesis and outputs results to Kinesis. I am curious whether I can unit-test my pipeline with, say, pytest? I am guessing I need to mock the source and sink with…
Alfred
  • 1,709
  • 8
  • 23
  • 38
1
vote
0 answers

Does pyflink have a max function or max_by function?

I can't find the corresponding max or max_by function. This is my message (Java): public class TransformTest2_Rolling { public static void main(String[] args) throws Exception { StreamExecutionEnvironment env =…
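A version-independent way to get Java's maxBy(1) semantics in PyFlink is a keyed reduce that keeps the record with the larger value. A sketch with made-up sensor data; the PyFlink import is kept inside main() so the reduce function loads without PyFlink installed:

```python
def keep_max(a, b):
    """Reduce function that keeps the whole record with the larger reading
    (field 1), emulating Java's maxBy(1) rolling aggregation."""
    return a if a[1] >= b[1] else b


def main():
    from pyflink.common import Types
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    ds = env.from_collection(
        [('sensor_1', 35.8), ('sensor_1', 37.2), ('sensor_2', 15.4)],
        type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE()]))
    # rolling per-key maximum: key by sensor id, keep the record with the
    # highest reading seen so far for that key
    ds.key_by(lambda r: r[0]).reduce(keep_max).print()
    env.execute('rolling_max_by')


if __name__ == "__main__":
    main()
```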