
I am running Tika on my Linux server, and I want to run it from Python (subprocess.Popen).

However, I only have non-root access, so I use a local Java installation, and in every session I have to set JAVA_HOME and PATH myself:

export JAVA_HOME=/usr/java/jdk1.5.0_07

export PATH=$PATH:/usr/java/jdk1.5.0_07/bin

Then I can run Tika from the Java directory and save its output to the file out_txt.txt:

curl www.vit.org/downloads/doc/tariff.doc | java -jar tika-app-1.3.jar --text >out_txt.txt

So I need to know how to use Popen to:

  1. set JAVA_HOME and PATH
  2. write Tika's output to the file out_txt.txt
syb0rg
hmghaly

1 Answer


1) You could:

  • set os.environ['JAVA_HOME'] = '/usr/java/jdk1.5.0_07' before calling Popen. That sets the variable for the current process and therefore for every child process started afterwards. Likewise for PATH:

    os.environ['PATH'] += ":/usr/java/jdk1.5.0_07/bin"
    
  • pass an environment dict to Popen as env:

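    import os
    import subprocess

    environ = os.environ.copy()  # modify a copy; this process keeps its own environment
    environ['JAVA_HOME'] = '/usr/java/jdk1.5.0_07'
    environ['PATH'] += ':/usr/java/jdk1.5.0_07/bin'  # append, don't replace
    subprocess.Popen(args, env=environ)  # args: the command as a list of strings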
    
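Both options above come down to building an environment mapping for the child. As a minimal sketch (Python 3; the helper name `make_java_env` is hypothetical, not part of any library), the copy-and-append approach can be wrapped in a function that never modifies the parent's `os.environ` and appends to `PATH` instead of replacing it, so tools such as curl stay reachable:

```python
import os


def make_java_env(java_home, base_env=None):
    """Return a new environment dict with JAVA_HOME set and the JDK's bin
    directory appended to PATH.

    base_env defaults to the current process environment (os.environ);
    it is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env['JAVA_HOME'] = java_home
    java_bin = os.path.join(java_home, 'bin')
    old_path = env.get('PATH', '')
    # Append instead of replacing, so /usr/bin (curl etc.) stays searchable.
    env['PATH'] = old_path + os.pathsep + java_bin if old_path else java_bin
    return env


# Example with an explicit base environment (POSIX path separator ':').
demo = make_java_env('/usr/java/jdk1.5.0_07', {'PATH': '/usr/bin:/bin'})
print(demo['PATH'])  # /usr/bin:/bin:/usr/java/jdk1.5.0_07/bin
```

Pass the result to the child with `subprocess.Popen(args, env=make_java_env('/usr/java/jdk1.5.0_07'))`. Because the JDK directory is appended, any other `java` found earlier on `PATH` still takes precedence; calling the JDK's `java` by its full path avoids that ambiguity.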

2)

  • open the file for writing and pass it to Popen as stdout (and optionally stderr):

    with open('out_txt.txt', 'wb') as output:
        proc = subprocess.Popen(args, stdout=output, stderr=output)
        proc.wait()  # block until Tika finishes so out_txt.txt is complete
    
  • leave it to the shell by passing shell=True to Popen:

    Popen("curl www.vit.org/downloads/doc/tariff.doc | java -jar tika-app-1.3.jar --text >out_txt.txt", shell=True)
    
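The shell=True version can also be written without a shell by connecting two Popen objects with a pipe, following the "replacing a shell pipeline" pattern from the subprocess documentation. A minimal sketch (Python 3; `run_pipeline` is a hypothetical helper name), whose demo substitutes two Python one-liners for curl and java so it runs anywhere:

```python
import os
import subprocess
import sys
import tempfile


def run_pipeline(producer, consumer, outfile, env=None):
    """Run the equivalent of `producer | consumer > outfile` without a shell.

    producer and consumer are argument lists; env (optional) is passed to
    both processes. Returns the consumer's exit status.
    """
    with open(outfile, 'wb') as out:
        p1 = subprocess.Popen(producer, stdout=subprocess.PIPE, env=env)
        p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=out, env=env)
        # Close our copy of the pipe so the producer gets SIGPIPE if the
        # consumer exits early.
        p1.stdout.close()
        status = p2.wait()
        p1.wait()
    return status


if __name__ == '__main__':
    # Demo: the first one-liner stands in for curl, the second for java.
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, 'out_txt.txt')
        rc = run_pipeline(
            [sys.executable, '-c', 'print("hello tika")'],
            [sys.executable, '-c',
             'import sys; sys.stdout.write(sys.stdin.read().upper())'],
            out_path,
        )
        with open(out_path) as f:
            print(rc, f.read().strip())  # 0 HELLO TIKA
```

For the question's case this would be `run_pipeline(['curl', 'www.vit.org/downloads/doc/tariff.doc'], ['java', '-jar', 'tika-app-1.3.jar', '--text'], 'out_txt.txt', env=environ)`, with `environ` built as in 1). Without a shell, both programs are looked up on the `PATH` of the `env` you pass, so that `PATH` must still contain the directories of both curl and java.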
mata
  • Just great. My only problem is with the command Popen("curl www.vit.org/downloads/doc/tariff.doc | java -jar tika-app-1.3.jar --text >out_txt.txt", shell=True): when I set the path to the Java directory, curl is no longer on it, and I get /bin/sh: curl: command not found. Is there a way to run both Java and curl? – hmghaly Apr 12 '13 at 13:04
  • You shouldn't _replace_ the `PATH`; append the Java bin directory to it instead. I updated my answer to reflect this. Or just use full paths. – mata Apr 12 '13 at 13:30
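Following the suggestion to use full paths: if the command calls java by its absolute path, PATH never needs to change and curl keeps resolving normally. A minimal sketch (Python 3.3+ for `shlex.quote`; `tika_shell_command` is a hypothetical helper) that builds the question's shell=True command string this way:

```python
import os
import shlex


def tika_shell_command(java_home, jar, url, outfile, curl='curl'):
    """Build the question's shell pipeline with an absolute path to java.

    java is referenced as JAVA_HOME/bin/java, so PATH is left untouched and
    curl still resolves through the normal PATH. Each component is quoted
    with shlex.quote in case a path contains spaces.
    """
    java = os.path.join(java_home, 'bin', 'java')
    return '{} {} | {} -jar {} --text > {}'.format(
        shlex.quote(curl), shlex.quote(url),
        shlex.quote(java), shlex.quote(jar), shlex.quote(outfile))


cmd = tika_shell_command('/usr/java/jdk1.5.0_07', 'tika-app-1.3.jar',
                         'www.vit.org/downloads/doc/tariff.doc', 'out_txt.txt')
print(cmd)
# On Linux this prints:
# curl www.vit.org/downloads/doc/tariff.doc | /usr/java/jdk1.5.0_07/bin/java -jar tika-app-1.3.jar --text > out_txt.txt
```

The result can then be run with `subprocess.Popen(cmd, shell=True).wait()`.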