
Hi :) I can't figure out what the error in this program is. Could you please help me with it? Thank you. :)

The input file contains the following:

3.  भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
57. जैसे आज के झारखंड प्रदेश से, उन दिनों, बहुत से लोग चाय बागानों में मजदूरी करने के उद्देश्य से असम आए।

(These are sample sentences. In the output, each Hindi word should have its position in the sentence appended to it.)

For example, the output for the first sentence would look like this:

3.  भारत(1) का(2) इतिहास(3) काफी(4) समृद्ध(5) एवं(6) विस्तृत(7) है(8) ।(9)

I should get similar output for the following sentence(s).

The code looks like this:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# encoding: utf-8
separators = [u'।', ',', '.']
text = open("hinstest1.txt").read()
#This converts the encoded text to an internal unicode object, where
# all characters are properly recognized as an entity:
text = text.decode("UTF-8")
#this breaks the text on the white spaces, yielding a list of words:
words = text.split()

counter = 1

output = ""
#if the last char is a separator, and is joined to the word:
for word in words:
    if word[-1] in separators and len(word) > 1:
        #word up to the second to last char:
        output += word[:-1] + u'(%d) ' % counter
        counter += 1
        #last char
        output += word[-1] +  u'(%d) ' % counter
    else:
        output += word + u'(%d) ' % counter
        counter += 1

    print output

The error I am getting is:

  File "pyth_hinwp.py", line 22
    output += word[-1] +  u'(%d) ' % counter
                         ^
SyntaxError: invalid syntax

I know this question is similar to one I asked earlier, but I could not get some of the answers given there to run, so I am restating the question at the point where I am currently stuck.

sth
boddhisattva

2 Answers


What is posted here does not have the error. Note that the posted line has TWO space characters between the + and the u in `output += word[-1] +  u'(%d) ' % counter`. What is probably happening is that your file has a whitespace character other than a space in there. One possibility is NBSP (U+00A0), also known as "no-break space". Whatever SO does to format your code is likely to scrub away such things.

Diagnosis: At the Python interactive prompt, type

open("pyth_hinwp.py").readlines()[22-1]

What do you see between the + and the u?
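The same check can be automated. The following is a minimal sketch, written for Python 3 (the script in the question is Python 2); `find_suspect_chars` is a hypothetical helper name, not something from the question. It reports every non-ASCII space or invisible format character along with its line and column. Hits inside string literals may be legitimate (Hindi text can contain joiner characters, for example), so treat the result as a list of places to look at, not a list of errors.

```python
# Python 3 sketch; find_suspect_chars is a hypothetical helper name.
import unicodedata


def find_suspect_chars(lines):
    """Return (line_no, column, codepoint, name) for every non-ASCII
    space separator (category Zs) or format character (category Cf)."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127 and unicodedata.category(ch) in ("Zs", "Cf"):
                hits.append((line_no, col, "U+%04X" % ord(ch),
                             unicodedata.name(ch, "UNKNOWN")))
    return hits


# Two lines modelled on the question's script; the second one hides an
# NBSP between the + and the u.
sample = [
    "output += word[:-1] + u'(%d) ' % counter\n",
    "output += word[-1] +\u00a0u'(%d) ' % counter\n",
]

for hit in find_suspect_chars(sample):
    print(hit)
```

To scan a real file, pass an open file object instead of the sample list, for instance `find_suspect_chars(open("pyth_hinwp.py", encoding="utf-8"))`.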

Fix: in your editor, delete both characters between the + and the u. Insert a single space.

By the way, with a syntax error, the problem is entirely within the named SOURCE file; the code has not been run (because it couldn't be compiled) and so what is in your INPUT file has no bearing on the problem.
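That point can be illustrated with the built-in `compile()`, which parses source text without running it. This is a small Python 3 sketch; the helper name `compiles` and the file name `no_such_input.txt` are made up for the demonstration. Both snippets mention a file that does not exist, yet only the one containing an NBSP is rejected, and it is rejected before any attempt to open the file.

```python
# Python 3 sketch; compiles() and the file name are illustrative only.
def compiles(source):
    """Return True if the source parses; compile() never executes it."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


# Neither snippet is ever run, so the missing input file is irrelevant.
good = 'text = open("no_such_input.txt").read()\ntotal = 1 + 2\n'
bad = 'text = open("no_such_input.txt").read()\ntotal = 1 +\u00a02\n'

print(compiles(good))
print(compiles(bad))
```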

John Machin
  • Thank you for your response :) I tried running what you said at the interactive prompt. This is what I got: "\t\toutput += word[-1] +\xc2\xa0u'(%d) ' % counter\r\n" What do you think I can do to rectify this error? – boddhisattva Feb 20 '10 at 08:25
  • `'\xc2\xa0'` is as I guessed an NBSP (U+00A0) encoded in UTF-8. Fix == rectify. Generalising what I wrote in my answer, use an editor to delete whatever is between the + and the u and then insert a single space. – John Machin Feb 20 '10 at 09:00
  • Also, do not use any "word processing" editor of any kind to produce Python code ever. You must use the barest, simplest text-only editor. Spacing matters, and invisible characters (like a non-breaking space) are impossible to diagnose. Use `idle` or `komodo edit` or `BBEdit` or some programming tool. Do not use a word processor. – S.Lott Feb 20 '10 at 13:03
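
The byte pair quoted in the comments above can be checked directly. The sketch below uses Python 3 syntax (in the asker's Python 2 session the same bytes show up inside an ordinary `str`); it decodes `\xc2\xa0` as UTF-8 and then repairs the reported line by swapping the NBSP for a plain space. The variable names are illustrative only.

```python
# Python 3 sketch: decode the two bytes and repair the reported line.
raw = b"\xc2\xa0"
ch = raw.decode("utf-8")
print(len(ch), "U+%04X" % ord(ch))   # one character, code point U+00A0

# The line as reported at the interactive prompt, as bytes.
line = b"\t\toutput += word[-1] +\xc2\xa0u'(%d) ' % counter\r\n".decode("utf-8")

# Replace the NBSP with an ordinary space.
fixed = line.replace("\u00a0", " ")
print(repr(fixed))
```

A blanket replace like this also touches any NBSP inside string literals, so editing the one offending line by hand, as the answer suggests, is the more cautious fix.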

If you have a syntax error, your editor may show it even before you run the code. In any case, why don't you try deleting the character at the position where the error is indicated? I cannot reproduce the problem after copying your code.

Anurag Uniyal