
I have the following project structure:

work_directory:
    merge.py
    a_package

(i.e. a python file merge.py and a directory a_package under the directory "work_directory")

I wrote a MapReduce job using MRJob in merge.py, in which I need to import a_package, e.g. from a_package import something. But I am having difficulty uploading a_package to Hadoop.

I have tried this method (https://mrjob.readthedocs.io/en/latest/guides/writing-mrjobs.html#using-other-python-modules-and-packages). I wrote

class MRPackageUsingJob(MRJob):
    DIRS = ['a_package']

and imported the package from inside a mapper:

def mapper(self, key, value):
    from a_package import something

I also tried this one: https://mrjob.readthedocs.io/en/latest/guides/setup-cookbook.html#uploading-your-source-tree

But neither of them works; the job keeps failing with ImportError: No module named a_package.

What should I do?

luw
  • how do you run the script merge.py? It may be related – TopW3 Jul 08 '21 at 18:08
  • If `merge.py` is not run as __main__ (`python merge.py`), but imported as a module, then you should use relative import like: `from .a_package import something`. `__init__.py` actually should not matter, because since python 3.3 namespace packages are a thing. – Syler Jul 08 '21 at 18:16
  • Thank you for asking. I have another file `main.py` under `work_directory` which runs as `__main__`. In that file I wrote ```dirname = os.path.split(os.path.realpath(__file__))[0], cmd = "doas python dirname/merge.py", os.system(cmd)``` to run merge.py. @Syler , @TopW3 – luw Jul 08 '21 at 18:36
  • Is this the actual code? `dirname` is never used there. In `python dirname/merge.py` it's just a word, not a variable reference. Also, ultimately, you are still running `python dirname/merge.py`, which means that merge.py is running as the __main__ module. Can you `print` data in `merge.py`? Try printing out `sys.path` to check the module search path and `os.listdir()` to see if the a_package dir is actually uploaded. – Syler Jul 08 '21 at 18:56
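
The check suggested in the last comment could be sketched roughly as follows. This is only a minimal diagnostic using the standard library; the helper name `describe_import_environment` and the keys of the returned dictionary are made up for illustration, and the default package name `a_package` is taken from the question. It reports the working directory, the module search path, and whether a directory with the package's name is present where the script runs.

```python
import os
import sys


def describe_import_environment(package_name="a_package"):
    """Collect details that help explain an ImportError for package_name.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    cwd = os.getcwd()
    return {
        "cwd": cwd,
        # Copy of the module search path at the time of the call.
        "sys_path": list(sys.path),
        # Everything visible in the working directory.
        "cwd_entries": sorted(os.listdir(cwd)),
        # True only if a directory named package_name sits in the working directory.
        "package_dir_present": os.path.isdir(os.path.join(cwd, package_name)),
    }


if __name__ == "__main__":
    info = describe_import_environment()
    for key, value in info.items():
        # Write to stderr: in a streaming job, stdout carries the task's output records.
        print(f"{key}: {value}", file=sys.stderr)
```

Calling the helper from inside the mapper, rather than only at the top of merge.py, would show what the task itself sees, which may differ from the machine that launches the job.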

1 Answer


You just need to create an empty file named "__init__.py" in the folder that you want to use as a package. For example:

work_directory:
  __init__.py
  merge.py
  a_package
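
As a small local illustration of the idea, the sketch below builds a throwaway directory, puts an `__init__.py` inside a folder named `a_package`, and imports a name from it. It assumes the file is meant to live inside `a_package` itself (the folder being used as a package); the function name `build_and_import` and the value assigned to `something` are hypothetical. It only shows how Python resolves the import on one machine and says nothing about how the directory reaches Hadoop.

```python
import importlib
import os
import sys
import tempfile


def build_and_import(package_name="a_package"):
    """Create a throwaway package directory and import a name from it.

    Hypothetical helper for illustration; not part of mrjob.
    """
    work_directory = tempfile.mkdtemp()
    package_dir = os.path.join(work_directory, package_name)
    os.makedirs(package_dir)
    # The __init__.py file marks the directory as a regular package.
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write("something = 'imported from a_package'\n")
    # The parent directory has to be on sys.path for the import to resolve.
    sys.path.insert(0, work_directory)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(package_name)
        return module.something
    finally:
        # Restore the search path so the demo leaves no trace there.
        sys.path.remove(work_directory)


if __name__ == "__main__":
    print(build_and_import())
```

The import succeeds here because the parent of `a_package` is on `sys.path`; if the job still raises ImportError on the cluster, the directory is probably not present next to the script when the task runs.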
magicarm22