I'm ashamed to resort to asking for help again, but I'm stuck.

I have a Spanish novel (in plain text), and I have a Python script that's supposed to put translations for difficult words in parentheses, using a custom dictionary in another text file.

After a lot of trial and error, I've managed to have the script run, and write the novel to a new text file as it's supposed to do.

Only problem is, no changes have been made to the text in the novel, that is, the translations haven't been inserted into the text. The dictionary is a plain text file, and it's formatted like this:

[spanish word] [english translation]                                      
[spanish word] [english translation]

and so on. Note that the words aren't really enclosed in brackets. There's a single space between the two words on each line, and there aren't spaces anywhere else in the file.

Here's the offending code:

bookin = (open("novel.txt")).read()
subin = open("dictionary.txt")
for line in subin.readlines():
    ogword, meaning = line.split(" ")
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)
    ogword = ogword.capitalize()
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)
subin.close()
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()

Advice would be greatly appreciated.

Edit: The MemoryError is solved now, there were errors in the dictionary I thought I'd fixed. Thank you so much to those who helped me with this stupid problem!

Yngve
  • Please try and write a title that describes your problem. – agf Apr 17 '12 at 05:18
  • I guess it was a bad title, thanks for the feedback. – Yngve Apr 17 '12 at 05:28
  • Also, consider using `with` blocks when interacting with resources which need to be closed eventually. http://effbot.org/zone/python-with-statement.htm . [This article](http://preshing.com/20110920/the-python-with-statement-by-example) highlights other usages of the `with` statement. – Sanjay T. Sharma Apr 17 '12 at 11:37
  • @Sanjay: Thanks for the tip, I'll read your links. – Yngve Apr 17 '12 at 12:12
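
For what it's worth, here is a minimal sketch of what the `with` suggestion could look like when reading the dictionary file. It assumes Python 3 (for the `encoding` argument to `open`), and `load_glossary` is a made-up helper name, not something from the question. It also skips any line that doesn't split into exactly two words, which is an assumption about how malformed lines should be handled.

```python
import os
import tempfile


def load_glossary(path):
    """Read 'spanish english' pairs into a dict, skipping malformed lines."""
    glossary = {}
    # The with block closes the file even if an exception is raised inside it.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()  # split() also drops the trailing newline
            if len(parts) == 2:
                spanish, english = parts
                glossary[spanish] = english
    return glossary


# Tiny stand-in for dictionary.txt, including one broken line (" cat").
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "dictionary.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("gato cat\n cat\nperro dog\n")

glossary = load_glossary(sample_path)
print(glossary)
```

Because the whole dictionary ends up in an ordinary dict, the same loading step works whether the novel is later processed in one piece or line by line.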

4 Answers

Change:

bookin.replace(ogword, subword)

to

bookin = bookin.replace(ogword, subword)

Explanation: replace does not change the string in place. Strings are immutable, so it returns a new string instead, and you have to assign that result back to bookin to keep it.
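
A small illustration of the difference, using a made-up one-sentence stand-in for the novel (the sample text and the "gato (cat)" substitution are just assumptions for the demo):

```python
bookin = "El gato duerme."

# The return value is thrown away here, so bookin is untouched.
bookin.replace("gato", "gato (cat)")
unchanged = bookin
print(unchanged)

# Assigning the result back is what actually keeps the new string.
bookin = bookin.replace("gato", "gato (cat)")
print(bookin)
```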

David Robinson
  • Your correction seems to have worked, because now I have another problem, namely a MemoryError:D Oh well, back to the drawing board. – Yngve Apr 17 '12 at 06:03
  • If the size of `bookin` is the problem, it may be better to first read the whole dictionary into memory and then process novel.txt line by line, assuming that dictionary.txt is significantly smaller. – Ulf Rompe Apr 17 '12 at 08:05
  • I didn't check back because I didn't expect more replies. The dictionary is much smaller, yes. It's 76 kb, while bookin is 2024 kb. I'll research how to do what you suggested. Thanks a lot. – Yngve Apr 17 '12 at 09:38
  • Oh, and the traceback is: `Traceback (most recent call last): File "C:\Python27\trascri.py", line 9, in bookin = bookin.replace(ogword, subword) MemoryError` – Yngve Apr 17 '12 at 09:39
  • That's odd. Those aren't large files at all (Python can easily handle a 2 MB string, and though `bookin` will grow with your `replace` statements, I wouldn't expect it to grow *that* much). Do you want to post the files so we can test the function, or, if they're private, email them to me? Alternatively, could you try putting the line `print(len(bookin))` before each replacement, and see what the program prints before the MemoryError? – David Robinson Apr 17 '12 at 12:41
  • This is embarrassing - it turns out the problem came down to just four lines in the dictionary consisting of only the English word, so that the line started with a space, then the word. I thought I'd used a foolproof method of weeding out those, not so much. How those four errors made the whole thing run out of memory is beyond me, but now it works fine. Thanks a lot for the help. – Yngve Apr 18 '12 at 08:42
  • I know exactly how it made it run out of memory. Your program was running the code `bookin.replace("", " (English word)")` for each of the four lines that were missing a Spanish word. That puts the English word between *every single letter* of the bookin string, multiplying its length by more than the length of the word (if your English word was hello, the string "My story" becomes "(hello)M(hello)y(hello) (hello)s(hello)..."). Do that four times and you sure will run out of memory! – David Robinson Apr 18 '12 at 17:22
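
The blow-up described in the last comment can be reproduced on a tiny string. This is only a sketch: the 7-character "(hello)" insert and the helper name `grown_length` are assumptions for illustration, and the final figure is computed arithmetically rather than by building the huge string.

```python
# A dictionary line that starts with a space splits into an empty first field.
ogword, meaning = " hello\n".split(" ")
print(repr(ogword))  # the empty string

# Replacing "" inserts the text before every character and once at the end.
text = "My story"
insert = "(hello)"
blown_up = text.replace("", insert)
print(blown_up)


def grown_length(length, insert_length):
    """Length after one replace("", ...) pass over a string of the given length."""
    return length + (length + 1) * insert_length


# Roughly 2024 kb of novel, four bad dictionary lines, 7 characters each time.
size = 2024 * 1024
for _ in range(4):
    size = grown_length(size, len(insert))
print(size)  # character count after four passes, far beyond what fits in memory here
```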

As @David Robinson pointed out, the problem was your use of replace. It should have been

 bookin = bookin.replace(ogword, subword)

I was up last night when you posted your question (and I upvoted both the question and the answer - I didn't get to post in time myself), but the question stuck with me. And even though an answer has been posted and accepted, I wanted to offer the following advice, as I believe that if you can write code like that shown above, it is quite likely that you can ferret out most sources of your problems on your own.

What I would suggest for this sort of problem is to create some small data files, say 10 records/lines each, and use them to trace the data through your program by peppering it with diagnostic print statements. I am showing a version of this below. It's not completely done, but I hope the intention is clear.

The basic idea is to verify that everything you expect to happen is actually happening at each step by looking at the output your "debugging print statements" generate. In this case you would have seen that bookin did not get modified.

bookin = (open("novel.txt")).read()
subin = open("dictionary.txt")

print 'bookin =', bookin  # verify that you read the information 

for line in subin.readlines():
    print 'line = ', line # verify line read

    ogword, meaning = line.split(" ")
    print 'ogword, meaning = ', ogword, meaning # verify ...

    subword = ogword + "(meaning)"
    print 'subword =', subword # verify ...

    bookin.replace(ogword, subword)
    print 'bookin post replace =', bookin # verify ... etc

    ogword = ogword.capitalize()
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)

subin.close() 

print 'final bookin =', bookin # make sure final output is good ...
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()

Finally, one additional plus that Python has over other languages is that you can work with it interactively. What I end up doing frequently is to verify my understanding of functions and behavior in the interpreter (I'm often too lazy to look at the documentation - that's actually not a good thing). So, in your case, since the problem was with replace (my debugging print statements would have shown this to me), I would have tried the following sequence in the interpreter:

 s = 'this is a test'
 print s
 s.replace('this', 'that')
 print s

and would have seen that s didn't change, in which case I'd have looked at the documentation, or simply tried s = s.replace('this', 'that').

I hope this is helpful. This basic debugging technique can often help pinpoint a problem area and is a good first step. Down the line, debuggers etc. are quite useful too.

PS: I'm new to SO, so I hope this sort of additional answer is not frowned upon.

Levon
  • It's not frowned upon by me, at least. Much appreciated. The thing is, I've gotten a lot of help writing this code, I couldn't have written it on my own. So I have a limited understanding of it, and I've decided I have to educate myself a lot more before attempting these things. Thanks a lot for the advice, I'll use this method in the future. – Yngve Apr 17 '12 at 11:35

You can get this information by typing any of these in the interpreter (the last one, string.replace, exists only in Python 2):

>>> help(str.replace)  
>>> help('a'.replace)  
>>> s = 'a'  
>>> help(s.replace)  
>>> import string  
>>> help(string.replace)

Apart from the MemoryError, which is astonishing, given the size of your files, you still have several things that could be improved; see comments below:

bookin = open("novel.txt").read() # don't need extra ()
subin = open("dictionary.txt")
# for line in subin.readlines():
# readlines() reads the whole file, you don't need that
for line in subin:
    # ogword, meaning = line.split(" ")
    # the above will leave a newline on the end of "meaning"
    ogword, meaning = line.split()
    # subword = ogword + "(meaning)"
    # if ogword is "gato" and meaning is "cat",
    # you want "gato (cat)"
    # but you will get "gato(meaning)"
    subword = ogword + " (" + meaning + ")"
    bookin = bookin.replace(ogword, subword)
    ogword = ogword.capitalize()
    subword = ogword + "(meaning)"  # fix this also
    bookin.replace(ogword, subword) # fix this also
    print len(bookin) # help debug your MemoryError
subin.close()
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()

You need to follow the advice of @Levon and try your code on some small test data files so that you can see what is happening.

After using this one-line dictionary:

gato cat

with this one-line novel:

El gato se sirvió un poco de Gatorade para el "alligator".

you may wish to reconsider your high-level strategy.
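
To make the point concrete, here is what the two plain replace passes do to that sentence, followed by one possible alternative: a single regular-expression pass that only matches whole words. This is a sketch under assumptions, not the only way to do it. The `annotate` function is a hypothetical name, it assumes Python 3 (where `\b` is Unicode-aware for str patterns), and it assumes the dictionary keys are lowercase.

```python
import re

novel = 'El gato se sirvió un poco de Gatorade para el "alligator".'

# The lowercase and capitalized replace passes hit substrings too.
naive = novel.replace("gato", "gato (cat)").replace("Gato", "Gato (cat)")
print(naive)


def annotate(text, glossary):
    """Append ' (meaning)' after whole-word, case-insensitive matches only."""
    words = [w for w in glossary if w]  # ignore empty keys from bad lines
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )

    def add_meaning(match):
        word = match.group(0)  # keeps the capitalization found in the text
        return word + " (" + glossary[word.lower()] + ")"

    return pattern.sub(add_meaning, text)


careful = annotate(novel, {"gato": "cat"})
print(careful)
```

Doing everything in one pass also means the inserted English meanings are never scanned again by a later dictionary entry.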

John Machin
  • Thank you for this, it really made things clearer. And I see your point about "gato":D I expected this type of problem, but I thought I'd deal with them as they show up. – Yngve Apr 18 '12 at 08:49