
I'm working on creating a Python/PySpark library using VS Code. My goal is to debug in VS Code and create a .whl package to be installed in a Databricks cluster. I face the following situations:

  • if I use from checkenginelib.pysparkdq._constraints._Constraint import _Constraint I get a ModuleNotFoundError in VS Code and a module not found error in Databricks
  • if I use from pysparkdq._constraints._Constraint import _Constraint I get a ModuleNotFoundError in VS Code but all imports work well in Databricks
  • if I use from _constraints._Constraint import _Constraint I get no error in VS Code but I get a module not found error in Databricks

[screenshot of the project folder structure in VS Code]

Luiz Viola

3 Answers


From what I see, you are working in ./DATA-QUALITY-ENGINE/check-engine-lib/dqengine/validate_df. You have to import it the following way:

from check_engine_lib.dqengine.validate_df import *

Note that hyphens are not valid in Python package names, so the check-engine-lib folder would need to be renamed (e.g. to check_engine_lib) for this to work. You also need to create an __init__.py file in each folder so that its files can be imported as modules.
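To see the effect of __init__.py concretely, here is a minimal, self-contained sketch (folder and module names are illustrative, borrowed from the question): it builds a tiny package on disk, marks each folder as a package with an __init__.py, and then imports a module through the package path, the same way an installed wheel would expose it.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package tree: <tmp>/check_engine_lib/dqengine/
root = tempfile.mkdtemp()
pkg = os.path.join(root, "check_engine_lib", "dqengine")
os.makedirs(pkg)

# Mark both folders as packages so Python will traverse them.
for d in (os.path.dirname(pkg), pkg):
    open(os.path.join(d, "__init__.py"), "w").close()

# A stand-in for validate_df.py.
with open(os.path.join(pkg, "validate_df.py"), "w") as f:
    f.write("GREETING = 'hello'\n")

# Putting the parent folder on sys.path is what PYTHONPATH (locally)
# or installing the wheel (on the cluster) does for you.
sys.path.insert(0, root)
mod = importlib.import_module("check_engine_lib.dqengine.validate_df")
print(mod.GREETING)
```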

Rabinzel

Because your module dqengine is not in the top-level folder, it is probably not on your PYTHONPATH; VS Code has most likely added only the path to DATA QUALITY ENGINE.

Either:

  • move it to the top-level folder (Data quality engine),
  • add the path to check_engine_lib to PYTHONPATH, or
  • as @franjefriten says, add an __init__.py to the library folder (renamed to check_engine_lib, since hyphens are not valid in module names) and do

from check_engine_lib.dqengine.validate_df import *
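For the second option, one common way to set PYTHONPATH for both the editor and the debugger in VS Code is a .env file at the workspace root, which the Python extension reads by default. A sketch, assuming the folder layout from the question's screenshot:

```
# .env in the DATA-QUALITY-ENGINE workspace root
PYTHONPATH=./check-engine-lib
```

With this in place, `from dqengine.validate_df import *` resolves locally without moving any folders; on Databricks the wheel install serves the same purpose.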
Tom McLean

As the comment said, when the module you need to import is in the same directory as the current file, you can import it directly:

from validate_df import _Constraint
MingJie-MSFT
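A caveat with the direct form above: it works when the file's own directory is on the path (as in a local debug run), but breaks once the code is packaged into a wheel, where modules are addressed through the package. An explicit relative import covers both cases. A runnable sketch (package and class names are illustrative, mirroring the question):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: <tmp>/pysparkdq/
root = tempfile.mkdtemp()
pkg = os.path.join(root, "pysparkdq")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()

# The module that defines the class.
with open(os.path.join(pkg, "validate_df.py"), "w") as f:
    f.write("class _Constraint:\n    pass\n")

# A sibling module using an explicit relative import: it resolves
# within the package regardless of where the package is installed.
with open(os.path.join(pkg, "checks.py"), "w") as f:
    f.write("from .validate_df import _Constraint\n")

sys.path.insert(0, root)
checks = importlib.import_module("pysparkdq.checks")
print(checks._Constraint.__name__)
```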