
I have a Python program that dynamically moves and renames files into a Hadoop cluster. The files usually range from 10 MB (parsed) to 1.5 GB (raw data). The move commands can take a while to finish, and from what I can tell Python races through them so that none of the move commands gets to finish. What is the proper way to make Python wait for the previous command? I store each command in a variable and pass it to os.system. The relevant code is:

os.system(moverawfile)
os.system(renamerawfile)
os.system(moveparsedfile)
os.system(renameparsedfile)

I know the rename commands finish almost instantaneously. Am I not supposed to use os.system? How do I ensure that Python waits for each command to finish before moving on to the next one?

Sachith Muhandiram
Sam
  • What is your exact code? `os.system` does not return until the command it spawns exits. – chepner Mar 02 '16 at 03:41
  • You should be using [`subprocess`](https://docs.python.org/2/library/subprocess.html) anyway. You can have an exception thrown on a command error, for example. – Alyssa Haroldsen Mar 02 '16 at 03:44
  • os.system just calls [`system(3)`](http://linux.die.net/man/3/system), and that waits for the command to complete. – Kevin Mar 02 '16 at 03:48
  • `hadoop fs -put rawjsondata.txt /home/hadoop/project/March/raw/rawjsondata.txt` is the entirety of "moverawfile". – Sam Mar 02 '16 at 03:51

1 Answer

I would suggest using `subprocess.run` (available since Python 3.5), as described in the Python documentation. It waits for the command to complete before returning.
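As a rough sketch of what that could look like, the snippet below runs a list of commands strictly one after another. The helper names `run_in_order` and `hadoop_put_command` are made up for illustration, and the argument layout for `hadoop fs -put` is taken from the command quoted in the comments on the question; adjust both to your setup.

```python
import subprocess
import sys


def hadoop_put_command(local_path, remote_path):
    """Build the argument list for `hadoop fs -put` (hypothetical helper)."""
    return ["hadoop", "fs", "-put", local_path, remote_path]


def run_in_order(commands):
    """Run each command to completion before starting the next one.

    `commands` is a list of argument lists. Returns the exit codes in order.
    """
    return_codes = []
    for argv in commands:
        # subprocess.run blocks until the child process has exited.
        # check=True raises subprocess.CalledProcessError on a non-zero
        # exit status, so a failed step stops the sequence.
        completed = subprocess.run(argv, check=True)
        return_codes.append(completed.returncode)
    return return_codes
```

Calling `run_in_order([hadoop_put_command(local, remote), ...])` would then start the second command only after the first has exited. Passing a list of arguments rather than a single string avoids going through a shell, so file names with spaces need no extra quoting.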

Bob Ezuba
  • Some methods in `subprocess` wait; others don't. – mgilson Mar 02 '16 at 03:46
  • What's the proper syntax for `subprocess`? – Sam Mar 02 '16 at 03:52
  • `os.system` already waits; if it's behaving asynchronously with `os.system`, `subprocess` won't fix that (even though it's a good idea in general). – ShadowRanger Mar 02 '16 at 04:15
  • @Sam: For [your example command](https://stackoverflow.com/questions/35738065/make-python-wait-for-commands-to-end#comment59150293_35738065), you'd do something like `subprocess.check_call(['hadoop', 'fs', '-put', localfilepath, remotefilepath])` (I'm assuming the file paths are in variables somewhere). Use plain [`subprocess.call`](https://docs.python.org/3/library/subprocess.html#subprocess.call) if you don't want it to raise an exception. – ShadowRanger Mar 02 '16 at 04:17