
Problem:

I have one folder (json_folder_large) that holds more than 200,000 JSON files, and another folder (json_folder_small) that holds 10,000 JSON files.

import os
lst_file = os.listdir("tmp/json_folder_large") # this returns an OSError
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'

I get an OSError when I call listdir with this directory path. I am sure there is no problem with the path itself, because the same call on the other folder works without this OSError.

lst_file = os.listdir("tmp/json_folder_small") # no error with this
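For context, errno 5 is EIO, a low-level I/O failure reported by the operating system rather than anything Python-specific. A minimal sketch (the path here is the hypothetical one from the question) that surfaces the symbolic errno name, which can help distinguish EIO from, say, ENOENT:

```python
import errno
import os

path = "tmp/json_folder_large"  # hypothetical path from the question
try:
    names = os.listdir(path)
    print(f"listed {len(names)} entries")
except OSError as e:
    # errno 5 maps to EIO: the underlying directory-read syscall failed.
    print(f"errno={e.errno} ({errno.errorcode.get(e.errno, '?')}): {e}")
```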

Env:

The problem above occurs with a Docker image as the PyCharm interpreter.

When the interpreter is a conda env, there are no errors.

The only difference I can see is that in Docker → Preferences → Resources → Advanced, I set 4 CPUs (max is 6) and 32 GB memory (max is 64).

I tried (under Docker):

1. With Pathlib

import pathlib
pathlib.Path('tmp/json_folder_large').iterdir() # this returns a generator <generator object Path.iterdir at 0x7fae4df499a8>
for x in pathlib.Path('tmp/json_folder_large').iterdir():
    print("hi")
    break

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.7/pathlib.py", line 1074, in iterdir
    for name in self._accessor.listdir(self):
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'

2. With os.scandir

os.scandir("tmp/json_folder_large") # this returns an iterator <posix.ScandirIterator object at 0x7fae4c48f510>
for x in os.scandir("tmp/json_folder_large"):
    print("hi")
    break
Traceback (most recent call last):
  File "<input>", line 1, in <module>
OSError: [Errno 5] Input/output error: 'tmp/json_folder_large'

3. Connect the PyCharm terminal to the Docker container, then run ls

docker exec -it 21aa095da3b0 bash
cd json_folder_large
ls

Then I got an error (when the terminal is not connected to the Docker container, the code above raises no error!):

ls: reading directory '.': Input/output error
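Since plain ls fails the same way, the problem sits below Python. One way to narrow it down from inside the container is to trace the directory-read syscall and inspect the mount; Docker Desktop's file sharing layer (gRPC FUSE / osxfs bind mounts) is a known suspect for I/O errors on very large shared directories. A rough diagnostic sketch (container ID taken from the question; strace must be installed in the image):

```shell
# Attach to the running container:
docker exec -it 21aa095da3b0 bash

# Trace the directory-read syscalls that ls makes. If getdents64 itself
# returns EIO, the failure is in the filesystem layer, not in Python:
strace -e trace=getdents64 ls json_folder_large

# Check the filesystem type of the mount; Docker Desktop bind mounts show
# up as fuse.grpcfuse or osxfs rather than overlay:
df -T .
```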

Questions:

  1. Is it really a memory issue?
  2. Is it possible to solve this error while keeping everything in the same directory? (I know we could split the files across different directories.)
  3. Why does my code raise an error under Docker but not under the conda env?

Thanks in advance.

Mapotofu
  • Hmm. Did you try `os.listdir("/tmp/json_folder_large")`? – mutantkeyboard Mar 22 '21 at 14:17
    Hi, @mutantkeyboard, I am sure the path should be 'tmp/json_folder_large', if I do `os.listdir("/tmp/json_folder_large")`, I will get a `FileNotFoundError: [Errno 2] No such file or directory: '/tmp/json_folder_large'`. By the way my current working directory is `'/opt/project'` – Mapotofu Mar 22 '21 at 14:19
  • Does this answer your question? [IOError: \[Errno 5\] Input/output error](https://stackoverflow.com/questions/26805025/ioerror-errno-5-input-output-error) – Maurice Meyer Mar 22 '21 at 14:44

1 Answer


You can use os.scandir or glob.iglob. They use iterators and avoid loading the entire list of names into memory.
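A minimal self-contained sketch of the lazy approach, using a throwaway temp directory in place of the question's tmp/json_folder_large (note that, per the comments below, lazy iteration did not make the EIO go away in this particular case):

```python
import glob
import json
import os
import tempfile

# Throwaway directory standing in for tmp/json_folder_large.
d = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(d, f"{i}.json"), "w") as f:
        json.dump({"i": i}, f)

# glob.iglob yields one path at a time instead of building the full
# 200,000-entry list in memory; os.scandir behaves the same way.
count = 0
for path in glob.iglob(os.path.join(d, "*.json")):
    with open(path) as f:
        json.load(f)
    count += 1
print(count)  # 3
```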

lllrnr101
  • I just tried with `os.scandir` and `Path` from pathlib; neither works in my case. I will update my question. – Mapotofu Mar 22 '21 at 16:08
  • You mean you get the i/o error when using os.scandir also? – lllrnr101 Mar 22 '21 at 16:11
  • Hi @lllrnr101, I updated the part with os.scandir; the moment I iterate through the generator it raises an error – Mapotofu Mar 22 '21 at 16:24
  • Then I think memory is not your problem. I would put a sleep(60) and then attach the process to strace and see if I get any clue. – lllrnr101 Mar 22 '21 at 16:41
  • Do the calls to scandir or listdir with a pattern also give you the error? Like trying to get only a subset of files from that dir? – lllrnr101 Mar 22 '21 at 16:44
  • I didn't add any pattern. Anyway, if it were caused by those reasons, I could not explain why it works under the **conda environment**; the only case where it raises this error is under **Docker** – Mapotofu Mar 23 '21 at 09:21