
I am doing a project in Natural Language Processing using NLTK in Python. The block structure of the project is as follows:

  1. Interface (in PHP) ->
  2. [NLP Engine] (in Python) ->
  3. API calls (in PHP) ->
  4. Result (in PHP)

The input is supposed to go via GET method from PHP Interface to the Python Engine.
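On the Python side of such a setup, Apache hands the GET parameters to the CGI script in the `QUERY_STRING` environment variable. Below is a minimal sketch of reading the `query` parameter without `cgi.FieldStorage`. It is written for Python 3, where the `cgi` module was deprecated and then removed in 3.13. The function name `read_query` is hypothetical.

```python
import os
from urllib.parse import parse_qs


def read_query(environ=None, name="query", default=""):
    """Return the first value of a GET parameter from a CGI environment.

    Apache passes everything after the '?' in the URL as QUERY_STRING;
    parse_qs decodes percent-escapes (as UTF-8) and turns '+' into spaces.
    """
    if environ is None:
        environ = os.environ
    params = parse_qs(environ.get("QUERY_STRING", ""))
    values = params.get(name)
    return values[0] if values else default
```

For example, a request to `.../engine.py?query=the+cat+sat` would give `read_query()` the value `"the cat sat"`. A missing parameter falls back to `default` instead of `None`, so later string processing does not fail on a `None` value.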

Background:

I have created a virtual host (URL: /linguistics/) using the Easy-PHP Dev Server (location: D:\Computational_Linguistics), and I have enabled it to execute Python scripts, so that when I open linguistics/Test.py, the script runs.

Issue:

A basic CGI script ran successfully and I could see its output in Chrome. But as soon as I imported another module, it returned this error:

Server error!

The server encountered an internal error and was unable to complete your request.

Error message: End of script output before headers: engine.py

If you think this is a server error, please contact the webmaster.

Error 500

linguistics Apache/2.4.4 (Win32) PHP/5.5.0

When I do NOT import nltk (or any other non-standard package) it works.

I searched the web for a solution and learned that I have to set up some environment variables to make it work, but I cannot figure out how.
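"End of script output before headers" means the script exited (or printed something invalid) before sending a header, so the actual Python traceback never reaches the browser. A common reason an import works in a console but not under CGI is that Apache starts the interpreter with a different environment, for example without the `PYTHONPATH` or `NLTK_DATA` variables set for your user account (`NLTK_DATA` is the variable NLTK consults for its data directory). The following hedged diagnostic sketch sends a plain-text header first and then reports the interpreter, `sys.path`, those two variables, and the outcome of the import. It is written for Python 3, and `diagnostic_report` is a hypothetical name.

```python
import importlib
import os
import sys
import traceback


def diagnostic_report(module_name="nltk"):
    """Describe the interpreter environment and try to import module_name."""
    lines = [
        "Python executable: %s" % sys.executable,
        "PYTHONPATH = %s" % os.environ.get("PYTHONPATH"),
        "NLTK_DATA = %s" % os.environ.get("NLTK_DATA"),
        "sys.path:",
    ]
    lines.extend("    %s" % entry for entry in sys.path)
    try:
        module = importlib.import_module(module_name)
        lines.append("%s imported from %s"
                     % (module_name, getattr(module, "__file__", "(built-in)")))
    except Exception:
        # Record the real traceback instead of letting the script die silently.
        lines.append(traceback.format_exc())
    return lines


if __name__ == "__main__":
    # Header first, so Apache gets valid output even if the import fails.
    print("Content-Type: text/plain")
    print()
    print("\n".join(diagnostic_report()))
```

Comparing this page under Apache with the same script run from a console usually shows which path or variable differs.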

My code:

#!C:/Python27/python.exe
import nltk
from nltk import *
import re
import cgi, cgitb

inpt=cgi.FieldStorage() 
str_in = inpt.getvalue('query')

def is_noun (str):
    tags=nltk.pos_tag(nltk.word_tokenize(str))
    for i in tags:
        if i[1][1]=='N' or i[1][1]=='V':                            #Finding out the Nouns and the Verbs.
            print "<h5>%s is a noun.<h5>" %i[0]

is_noun(str_in)

print "Content-type:text/html\r\n\r\n"
print "<html>"
print "<head>"
print "<title>Hello - Second CGI Program</title>"
print "</head>"
print "<body>"
is_noun(str_in)
print "</body>"
print "</html>"
Michael0x2a
Prashant Sinha
2 Answers


Since I received no answers (not blaming anyone!), I read more documentation. As described in my problem statement above, only the NLP engine is written in Python, and the problem exists only in the CGI environment. Hence my solution:

I modified engine.py to receive its input as command-line arguments and then process it. It writes the processed data (in an exact format) to standard output, and I used PHP's exec() function to run it and collect that output.
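The Python side of this workaround can be sketched as a script that takes the query from `sys.argv` and prints one result per line, which PHP's `exec()` collects into its output array. On the PHP side, the query should be passed through `escapeshellarg()` before it is placed in the command line. This is a minimal sketch, not the project's actual engine.py. It is written for Python 3, all function names are hypothetical, and the tagger is a parameter so that the output logic can be tried without NLTK's models installed.

```python
import sys


def find_nouns(tagged):
    """Return words whose Penn Treebank tag starts with 'NN' (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged if tag.startswith("NN")]


def nltk_tagger(text):
    """Tokenize and tag text with NLTK (needs its tokenizer and tagger data)."""
    import nltk  # imported lazily so the rest works without NLTK installed
    return nltk.pos_tag(nltk.word_tokenize(text))


def engine_output(query, tagger=nltk_tagger):
    """Return the lines the engine prints: one noun per line."""
    if not query.strip():
        return []  # nothing to tag
    return find_nouns(tagger(query))


def main(argv):
    """Treat all command-line arguments as one query and print the nouns."""
    for line in engine_output(" ".join(argv[1:])):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Because the engine only talks through arguments and standard output, it can be tested from a console with `python engine.py "the cat sat"`, independently of Apache's CGI environment.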

The project is on GitHub, so anyone who wants to have a look at it is most welcome!

PS: I still don't know the reason for that error. I am absolutely sure that all environment paths were correct, so I'd call this answer a workaround rather than a solution.

PPS: I am answering my own question so that anybody who has the same problem can consider this workaround.

Prashant Sinha

The problem is that you call is_noun twice, and the first call runs before you have sent any headers. Hence the error.
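In CGI, the script's first output must be the header block, ended by a blank line. Anything printed earlier, such as `<h5>` lines, is not a valid header. The following minimal sketch shows the safe ordering: build the HTML body first, then emit the header and the page together. It is written for Python 3, and `nouns_as_html` and `build_response` are hypothetical helpers. User-supplied words are passed through `html.escape`, which the original code does not do.

```python
from html import escape


def nouns_as_html(words):
    """Render each word as an <h5> line, escaping user-supplied text."""
    return "".join("<h5>%s is a noun.</h5>\n" % escape(w) for w in words)


def build_response(body_html, title="Results"):
    """Return a full CGI response: header block, blank line, then the page.

    Lines end with a bare newline, as print produces; Apache accepts this.
    """
    header = "Content-Type: text/html; charset=utf-8\n\n"
    page = (
        "<html><head><title>%s</title></head>\n"
        "<body>\n%s</body></html>\n" % (escape(title), body_html)
    )
    return header + page
```

The script would then write the result once, e.g. `sys.stdout.write(build_response(nouns_as_html(words)))`, after all processing has finished. If tagging raises before that point, nothing has been written yet, so wrapping the work in `try`/`except` and reporting the error inside the page is a natural next step.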

Another problem is that str_in is a str (a byte string), but I think nltk.pos_tag expects unicode. That is, you need to decode the str_in value. You should do this anyway, but you will only notice the problem if the input contains a character outside plain ASCII:

str_in = unicode(inpt.getfirst('query', ''), 'utf-8')

and then, when you print unicode, you will need to encode it back:

print "<h5>%s is a noun.</h5>" % i[0].encode('utf-8')
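The `unicode()` and `.encode()` calls above are Python 2. On Python 3, where `str` is already Unicode, the equivalent round trip follows the sketch below: decode incoming bytes once, work with text, and encode to UTF-8 only when writing raw bytes. The helper names are hypothetical. `errors="replace"` is a choice made here so that malformed input cannot crash the script, whereas `unicode(..., 'utf-8')` would raise `UnicodeDecodeError` instead.

```python
def decode_input(raw):
    """Turn raw request bytes into text; undecodable bytes become U+FFFD."""
    if isinstance(raw, bytes):
        return raw.decode("utf-8", errors="replace")
    return raw  # already text (e.g. values parsed by urllib.parse)


def encode_output(text):
    """Encode text as UTF-8 bytes for writing to sys.stdout.buffer."""
    return text.encode("utf-8")
```

The encoded bytes would go to `sys.stdout.buffer.write(...)` rather than `print`, so the output does not depend on the console encoding Apache gives the process.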

But in its current form the output might still look garbled in the browser, because you need to tell the browser that the charset is UTF-8. That is, you need to change the Content-Type header:

print "Content-Type: text/html; charset=utf-8"
print

P.S. Hopefully this is all for local use only and not reachable from the internet, because a public version would need to be much more complicated.
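Separately from the header order, the test `i[1][1]=='N' or i[1][1]=='V'` in the question's is_noun looks at the second character of each Penn Treebank tag. That matches the noun tags NN, NNS, NNP and NNPS, but it also matches IN (prepositions). No verb tag (VB, VBD, VBZ and so on) has V in second position. Single-character punctuation tags such as `.` make `i[1][1]` raise `IndexError` and stop the script. The minimal sketch below tests tag prefixes instead, working on the `(word, tag)` pairs that `nltk.pos_tag` returns. `split_nouns_verbs` is a hypothetical name.

```python
def split_nouns_verbs(tagged):
    """Split (word, tag) pairs into nouns (NN*) and verbs (VB*).

    str.startswith is safe for one-character tags like '.' or ',',
    unlike indexing tag[1].
    """
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    verbs = [word for word, tag in tagged if tag.startswith("VB")]
    return nouns, verbs
```

For the tagged sentence "The dog sat on mats." this keeps "dog" and "mats" as nouns and "sat" as a verb, and it skips "on" (IN) and the final period.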

newtover