
I would like to decrypt a bunch of very large files and use the decrypted version of each file as input to a Python script which will process its content. So, if I have a file named

 file1.sc.xz.gpg

after running the GnuPG decryption tool the output should be stored in a file named

 file1.sc.xz 

inside the same directory and this file should be the input to the Python script which will process its contents. Ideally I would like to do this inside one single Bash command, but I couldn't find the right way to do it. What I tried is:

 find test/ -type f | parallel 'f="{}"; g="${f%.*}"; gpg "$f" > "$g" | python iterating-over-tokens.py "$g" '

but it is not working. Any other suggestions? Many thanks in advance.

Later edit: if I could send the decrypted file (*.sc.xz) content directly to the Python script as an argument, that would be even better.

Crista23
  • Please elaborate on "it's not working". Is it throwing errors? What is it doing? What is it *not* doing? – Mr. Llama Jun 10 '15 at 20:27
  • The gpg tool is running in parallel, and each time it tells me that a file with the extension *.sc.xz already exists and asks me to confirm manually whether to overwrite it. Since it is running in parallel, I get many messages like this within a very short timeframe, although those files do not appear to be there. I think I am doing something wrong. – Crista23 Jun 10 '15 at 20:33
  • Instead of saving to file, and giving that file to the Python script, you may pipe the decrypted data to stdout, and have Python on the other side of the pipe read stdin. – boardrider Jun 10 '15 at 20:39
  • In fact, something is very weird here, because you are both piping to `python` with `|` *and* redirecting to a file with `>`. – larsks Jun 10 '15 at 20:40
  • Of course you should always use the `--batch` option of `gpg` when you are running inside `parallel`. – 4ae1e1 Jun 10 '15 at 20:46
  • @boardrider yes, this sounds like a very good option, and especially considering the large size of the files, that would be exactly what I need. Could you maybe show me an example on how to do it? – Crista23 Jun 10 '15 at 21:25
  • @larsks If I could pipe directly to python without saving on disk the decrypted file content, that would be even better. – Crista23 Jun 10 '15 at 21:28

1 Answer

Directly piped to Python:

parallel gpg -o - {} '|' python -c "'import sys; print(sys.stdin.read().upper())'" ::: *.gpg
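
For the files in the question, the decrypted stream is still xz-compressed, so the script on the receiving end of the pipe has to decompress what it reads from stdin. The following is only a sketch of how that could look, assuming Python 3, UTF-8 text inside the archive, and whitespace-separated tokens; `iter_tokens` and `main` are hypothetical names, not part of the original script.

```python
import lzma
import sys


def iter_tokens(binary_stream):
    """Yield whitespace-separated tokens from an xz-compressed byte stream."""
    # lzma.open accepts an already-open binary file object and decompresses
    # it incrementally, so a large file is never held in memory as a whole.
    # Assumption: the decompressed content is UTF-8 text.
    with lzma.open(binary_stream, mode="rt", encoding="utf-8") as text:
        for line in text:
            for token in line.split():
                yield token


def main():
    # sys.stdin.buffer is the raw byte stream behind sys.stdin in Python 3.
    count = sum(1 for _ in iter_tokens(sys.stdin.buffer))
    print(count)
```

Saved under a hypothetical name such as `tokens_from_stdin.py`, with a call to `main()` added at the bottom, it could take the place of the `python -c` part of the command above.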

Create decrypted file first:

parallel gpg -o {.} {} ';' python -c "'import sys; print(sys.argv)'" {.} ::: *.gpg

You need to be able to decrypt without entering a pass phrase. If gpg asks for a pass phrase, run gpg-agent first.
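
If the per-file step is easier to drive from Python than from the shell, the "decrypt to a file first" variant could be sketched roughly as below. This is an illustration under assumptions, not a drop-in replacement: it assumes Python 3.5+, and the helper names `decrypted_name`, `gpg_command` and `decrypt` are hypothetical. `--batch` follows the suggestion in the comments on the question, and `--yes` is meant to avoid the interactive overwrite prompt.

```python
import os
import subprocess


def decrypted_name(path):
    """Drop the final .gpg extension, like parallel's {.} replacement."""
    root, ext = os.path.splitext(path)
    if ext != ".gpg":
        raise ValueError("expected a .gpg file: %s" % path)
    return root


def gpg_command(path):
    """Build the gpg argument list for decrypting path next to itself."""
    # --batch disables interactive prompts; --yes answers "yes" to most
    # questions, including the one about overwriting an existing output file.
    return ["gpg", "--batch", "--yes",
            "--output", decrypted_name(path),
            "--decrypt", path]


def decrypt(path):
    """Run gpg and return the name of the decrypted file."""
    # check=True raises CalledProcessError if gpg exits with an error.
    subprocess.run(gpg_command(path), check=True)
    return decrypted_name(path)
```

Calling `decrypt("test/file1.sc.xz.gpg")` would then write `test/file1.sc.xz` in the same directory, provided gpg can decrypt without asking for a pass phrase.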

Ole Tange
  • Thanks for your reply. I have tried piping it directly into Python as the first argument: `find test/ -type f | parallel gpg -o -{} '|' python -c "iterating-over-tokens-example.py" ::`, but it gives `NameError: name 'iterating' is not defined`. Am I doing something wrong? – Crista23 Jun 11 '15 at 06:54
  • If I call it like this: `find test/ -type f | parallel gpg -o -{} '|' python iterating-over-tokens-example.py ::`, my Python script gets called, but it complains when the script tries to use the first argument, `Chunk(path=sys.argv[1])`: `raise exc IOError: :: does not exist but mode='rb'`. So I believe the decrypted file is not being sent as an argument. – Crista23 Jun 11 '15 at 07:03
  • If I try to run the second command, to create the decrypted file first, what I get is another IO error: ['-c', '*'] gpg: can't open `*.gpg' – Crista23 Jun 11 '15 at 07:33
  • It works: `find test/ -type f | parallel gpg -o {.} {} ';' python processFile.py {.} :::` – Crista23 Jun 11 '15 at 09:07