I am trying to get started with PyFlink and Kafka, but I get the error below.

Thanks for your support!

Installation

python -m pip install apache-flink
pip install pyFlink 

Code

from pyFlink.datastream import StreamExecutionEnvironment

Error

ModuleNotFoundError: No module named 'pyFlink'
py-r

1 Answer


To install PyFlink, you only need to execute:

python -m pip install apache-flink

and make sure you have a compatible Python version (>= 3.5 at the time of writing; check the PyFlink documentation for the versions your release supports).
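To confirm the install worked, a quick check along these lines may help. It is only a sketch: `installed_version` is a helper name made up here, and `importlib.metadata` needs Python 3.8 or later. Note that the distribution on PyPI is named `apache-flink`, while the package you import is `pyflink`.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Show the interpreter version and whether the apache-flink distribution is present.
print("Python:", sys.version.split()[0])
print("apache-flink:", installed_version("apache-flink"))
```

If the second line prints `None`, the package was probably installed into a different interpreter than the one running the script.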

Imports are case-sensitive; the error is thrown because the package name is "pyflink", not "pyFlink". So, instead, you can try:

from pyflink.datastream import StreamExecutionEnvironment
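As a small illustration of the case sensitivity, the sketch below asks Python whether a module can be found under each spelling. `module_available` is a hypothetical helper, not part of PyFlink; the result for `pyflink` depends on whether `apache-flink` is installed in the current environment.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with exactly this name can be found."""
    return importlib.util.find_spec(name) is not None


# Module lookup is case-sensitive, so the two spellings are different names.
for name in ("pyflink", "pyFlink"):
    print(name, "->", module_available(name))
```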

If you're going to use Kafka, please remember to also add the required (JAR) dependencies, using:

# t_env is a TableEnvironment created earlier (for example, a StreamTableEnvironment)
config = t_env.get_config().get_configuration()
config.set_string("pipeline.jars",
                  "file:///path/to/jar/jarfile.jar")

You can read more about handling connectors and other dependencies in the PyFlink documentation.
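If you need more than one JAR (for example, a connector plus a format), `pipeline.jars` accepts a semicolon-separated list of URLs. The sketch below builds such a value from local paths; `pipeline_jars_value` and the JAR file names are made up for illustration, so substitute the files you actually downloaded.

```python
from pathlib import Path


def pipeline_jars_value(jar_paths):
    """Join local JAR paths into a semicolon-separated list of file:// URLs."""
    return ";".join(Path(p).absolute().as_uri() for p in jar_paths)


# Hypothetical JAR locations; replace with the connector JARs you downloaded.
jars = pipeline_jars_value([
    "/path/to/jar/connector.jar",
    "/path/to/jar/format.jar",
])
print(jars)

# With a TableEnvironment `t_env` as in the answer above:
# t_env.get_config().get_configuration().set_string("pipeline.jars", jars)
```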

morsapaes
  • Thanks Marta! This helps with getting started. – py-r Nov 05 '20 at 17:43
  • My comment about installing and starting Kafka/Zookeeper first might sound trivial, but is likely not straightforward for newbies. – py-r Nov 12 '20 at 17:32
  • It's for sure relevant for beginners, @py-r. I just don't think it should be part of the answer since it's not Flink-related. – morsapaes Nov 13 '20 at 09:40