
I've built a Python script that randomly creates sentences using data from the Princeton English WordNet, following diagrams provided in Gödel, Escher, Bach. Calling python GEB.py produces a list of nonsensical sentences in English, such as:

resurgent inaesthetic cost. the bryophytic fingernail. aversive fortieth peach. the asterismal hide. the flour who translate gown which take_a_dare a punch through applewood whom the renewed request enfeoff. an lobeliaceous freighter beside tuna.

It then saves them to gibberish.txt. This script works fine.

Another script (translator.py) takes gibberish.txt and, through the py-googletrans module, tries to translate those random sentences into Portuguese:

from googletrans import Translator
import json

tradutor = Translator()

with open('data.json') as dataFile:
    data = json.load(dataFile)


def buscaLocal(keyword):
    if keyword in data:
        print(keyword + data[keyword])
    else:
        buscaAPI(keyword)


def buscaAPI(keyword):
    result = tradutor.translate(keyword, dest="pt")
    data.update({keyword: result.text})

    with open('data.json', 'w') as fp:
        json.dump(data, fp)

    print(keyword + result.text)


keyword = open('/home/user/gibberish.txt', 'r').readline()
buscaLocal(keyword)

Currently the second script outputs only the translation of the first sentence in gibberish.txt. Something like:

resurgent inaesthetic cost. aumento de custos inestético.

I have tried to use readlines() instead of readline(), but I get the following error:

Traceback (most recent call last):
  File "main.py", line 28, in <module>
    buscaLocal(keyword)
  File "main.py", line 11, in buscaLocal
    if keyword in data:
TypeError: unhashable type: 'list'

I've read similar questions about this error here, but it is not clear to me what I should use in order to read the whole list of sentences contained in gibberish.txt (each new sentence begins on a new line).

How can I read the whole list of sentences contained in gibberish.txt, and how should I adapt the code in translator.py to achieve that? I am sorry if the question is a bit confusing; I can edit it if necessary. I am a Python newbie and I would appreciate it if someone could help me out.

Mad Physicist
reaction hashs
  • Please provide the full error message including the stack trace. I doubt it is `readlines` that is causing that error. Instead, `readlines` returns a *list* of lines, and you are probably trying to put that list in a `dict`, but you cannot, because list objects are not hashable. In general, you need to provide a minimal, complete, and verifiable example. – juanpa.arrivillaga Dec 25 '18 at 04:15
  • In particular, here: `data.update({keyword: result.text})` – juanpa.arrivillaga Dec 25 '18 at 04:17
  • Just edited with full error message, sorry for that. – reaction hashs Dec 25 '18 at 04:35
  • Ok, as I explained, `readlines` returns a `list` object, which is not hashable; thus, it cannot be used as a key in `dict` objects – juanpa.arrivillaga Dec 25 '18 at 04:52
  • I understand, so readlines() wouldn't fit here at all? Thanks for helping out. So, using readline() as provided in the code snippet outputs the translation of the first line in the list of sentences. What should I use instead, in order to output the translation of all the sentences in the .txt? – reaction hashs Dec 25 '18 at 04:56
  • Sure, readlines works, or you can just iterate over the file iterator. See the answer by @arryph – juanpa.arrivillaga Dec 25 '18 at 05:10
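To see the error from the traceback in isolation, here is a small sketch: `readlines()` returns a list, and a list cannot be looked up as a `dict` key, while each individual string line can. The `data` dictionary and `lines` list below are made-up stand-ins for the question's `data.json` cache and the contents of gibberish.txt.

```python
# Made-up stand-ins for the data.json cache and the lines of gibberish.txt.
data = {"resurgent inaesthetic cost.": "aumento de custos inestético."}
lines = ["resurgent inaesthetic cost.\n", "the bryophytic fingernail.\n"]

# Looking up the whole list fails: dict membership needs a hashable key.
error_message = ""
try:
    lines in data
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # the message reports an unhashable 'list'

# Looking up each line (minus its newline) works, because strings are hashable.
found = [line.rstrip("\n") in data for line in lines]
print(found)  # [True, False]
```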

3 Answers


Let's start with what you're doing to the file object. You open a file, get a single line from it, and then don't close it. A better way to do it would be to process the entire file and then close it. This is generally done with a with block, which will close the file even if an error occurs:

with open('gibberish.txt') as f:
    ...  # do stuff to f

Aside from the material benefits, this will make the interface clearer, since f is no longer a throwaway object. You have three easy options for processing the entire file:

  1. Use readline in a loop, since it will only read one line at a time. You will have to strip off the newline characters manually and terminate the loop when readline returns an empty string ('') at the end of the file:

    while True:
        line = f.readline()
        if not line: break
        keyword = line.rstrip()
        buscaLocal(keyword)
    

    This loop can take many forms, one of which is shown here.

  2. Use readlines to read in all the lines in the file at once into a list of strings:

    for line in f.readlines():
        keyword = line.rstrip()
        buscaLocal(keyword)
    

    This is much cleaner than the previous option, since you don't need to check for loop termination manually, but it has the disadvantage of loading the entire file all at once, which the readline loop does not.

    This brings us to the third option.

  3. Python files are iterable objects. You can have the cleanliness of the readlines approach with the memory savings of readline:

    for line in f:
        buscaLocal(line.rstrip())
    

    This approach can be simulated using readline with the more arcane two-argument form of iter, which keeps calling f.readline until it returns the sentinel '':

    for line in iter(f.readline, ''):
        buscaLocal(line.rstrip())
    
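The options above can be checked side by side. This sketch uses `io.StringIO` as a stand-in for the opened gibberish.txt (an assumption, so it runs without a file on disk) and collects the stripped lines each approach produces, including the two-argument `iter(callable, sentinel)` form from option 3.

```python
import io

# Stand-in for the contents of gibberish.txt.
TEXT = "resurgent inaesthetic cost.\nthe bryophytic fingernail.\naversive fortieth peach.\n"

def with_readline_loop(f):
    # Option 1: call readline() until it returns '' at end of file.
    result = []
    while True:
        line = f.readline()
        if not line:
            break
        result.append(line.rstrip())
    return result

def with_readlines(f):
    # Option 2: load every line into a list first, then loop over it.
    return [line.rstrip() for line in f.readlines()]

def with_file_iterator(f):
    # Option 3: iterate over the file object itself, one line at a time.
    return [line.rstrip() for line in f]

def with_iter_sentinel(f):
    # iter(callable, sentinel) keeps calling f.readline() until it returns ''.
    return [line.rstrip() for line in iter(f.readline, "")]

approaches = (with_readline_loop, with_readlines, with_file_iterator, with_iter_sentinel)
results = [approach(io.StringIO(TEXT)) for approach in approaches]
print(results[0])
# ['resurgent inaesthetic cost.', 'the bryophytic fingernail.', 'aversive fortieth peach.']
```

All four produce the same list; they differ only in whether the whole file is loaded into memory at once (option 2) or read line by line.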

As a side point, I would make some modifications to your functions:

def buscaLocal(keyword):
    if keyword not in data:
        buscaAPI(keyword)
    print(keyword + data[keyword])

def buscaAPI(keyword):
    # Make your function do one thing. In this case, do a lookup.
    # Printing is not the task for this function.
    result = tradutor.translate(keyword, dest="pt")
    # No need to do a complicated update with a whole new
    # dict object when you can do a simple assignment.
    data[keyword] = result.text

...

# Avoid rewriting the file every time you get a new word.
# Do it once at the very end.
with open('data.json', 'w') as fp:
    json.dump(data, fp)
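Putting the pieces together, here is a self-contained sketch of the whole translator.py flow: iterate over the file, consult the cache, translate only on a cache miss, and write data.json once at the end. To keep it runnable without network access, `fake_translate` is a hypothetical stand-in for `tradutor.translate(keyword, dest="pt").text`; the file paths are parameters instead of the question's hard-coded paths, and skipping blank lines is an extra assumption.

```python
import json
import os
import tempfile

def fake_translate(text):
    # Hypothetical stand-in for tradutor.translate(text, dest="pt").text (no network call).
    return "[pt] " + text

def translate_file(sentences_path, cache_path, translate=fake_translate):
    # Load the existing cache, or start empty if data.json does not exist yet.
    try:
        with open(cache_path, encoding="utf-8") as fp:
            data = json.load(fp)
    except FileNotFoundError:
        data = {}

    output = []
    with open(sentences_path, encoding="utf-8") as f:
        for line in f:  # read the file lazily, one sentence per line
            keyword = line.rstrip()
            if not keyword:
                continue  # assumption: blank lines are skipped
            if keyword not in data:
                data[keyword] = translate(keyword)  # translate cache misses only
            output.append(keyword + " " + data[keyword])

    # Write the cache once, after every line has been processed.
    with open(cache_path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False)
    return output

# Usage example: temporary files stand in for gibberish.txt and data.json.
with tempfile.TemporaryDirectory() as tmp:
    sentences = os.path.join(tmp, "gibberish.txt")
    cache = os.path.join(tmp, "data.json")
    with open(sentences, "w", encoding="utf-8") as f:
        f.write("resurgent inaesthetic cost.\nthe bryophytic fingernail.\n")
    lines = translate_file(sentences, cache)
    with open(cache, encoding="utf-8") as fp:
        saved = json.load(fp)

for line in lines:
    print(line)
```

Passing the translator in as a parameter is only a convenience for testing; in the real script it would wrap the googletrans call.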
Mad Physicist
  • Wow, this is a great lesson. For a beginner like me, there's a lot to learn here. I will try those out later to make the program more efficient and get back to you. Thanks for taking the time to write such a detailed answer, I really appreciate it. – reaction hashs Dec 25 '18 at 05:50
  • @Mad Physicist, if I had not read your post, I would not have known that I could iterate over lines using the file object itself, whose type (or class) is '_io.TextIOWrapper'! Wonderful! – Sherman Chen May 15 '22 at 11:01
  • @ShermanChen. I'm glad you got something useful out of my post. Just as an FYI, if you open the file in binary mode, the type will be different. – Mad Physicist May 15 '22 at 15:35
  • @Mad Physicist, Oh. I tried to read a binary file, and its type is '_io.BufferedReader'. – Sherman Chen May 17 '22 at 13:09
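To check the types mentioned in these comments, this sketch opens the same temporary file (created only for the demonstration) in text mode and in binary mode, and compares the resulting objects with the classes in the standard `io` module. Iterating over either object yields lines; text mode gives `str`, binary mode gives `bytes`.

```python
import io
import os
import tempfile

# Create a small temporary file; writing bytes avoids platform newline translation.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"line one\nline two\n")

try:
    with open(path, encoding="utf-8") as text_file:  # text mode ('r')
        text_type = type(text_file)
        text_lines = list(text_file)  # iterating yields str lines
    with open(path, "rb") as binary_file:  # binary mode ('rb')
        binary_type = type(binary_file)
        binary_lines = list(binary_file)  # iterating yields bytes lines
finally:
    os.remove(path)

print(text_type.__name__, text_lines)      # TextIOWrapper ['line one\n', 'line two\n']
print(binary_type.__name__, binary_lines)  # BufferedReader [b'line one\n', b'line two\n']
```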

If you are using the readline() function, remember that it only returns one line, so you have to use a loop to go through all of the lines in the text file. readlines(), on the other hand, reads the full file at once but returns the lines in a list. Lists are unhashable and cannot be used as keys in a dict object; that's why the `if keyword in data:` line raises this error, as keyword here is a list of all of the lines. A simple for loop will solve this problem.

with open('/home/user/gibberish.txt', 'r') as f:
    text_lines = f.readlines()
for line in text_lines:
    buscaLocal(line.rstrip('\n'))

This loop iterates through all of the lines in the list, and there will be no error accessing the dict, as each key will be a string.
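One detail worth knowing about readlines(): each string it returns still ends with its newline character, so unless the newline is stripped it becomes part of the dictionary key. This sketch uses `io.StringIO` and a made-up cache dictionary as stand-ins for gibberish.txt and data.json.

```python
import io

# Made-up stand-ins for gibberish.txt and the cached translations in data.json.
f = io.StringIO("resurgent inaesthetic cost.\nthe bryophytic fingernail.\n")
data = {"resurgent inaesthetic cost.": "aumento de custos inestético."}

raw_lines = f.readlines()
print(raw_lines)  # ['resurgent inaesthetic cost.\n', 'the bryophytic fingernail.\n']

# With the newline kept, the cached key is not found.
print(raw_lines[0] in data)  # False

# After stripping the trailing newline, the lookup matches the cached key.
stripped = [line.rstrip("\n") for line in raw_lines]
print(stripped[0] in data)  # True
```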

arryph
  • Thanks @arryph, works like a charm. Sorry for asking such a basic one, sometimes you get stuck on the most primitive problems. Thanks for helping out! – reaction hashs Dec 25 '18 at 05:21

Or simply use the walrus operator (available since Python 3.8):

while line := f.readline():
    # ... some code
    print(line)
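A runnable version of the walrus loop, assuming Python 3.8 or later and using `io.StringIO` as a stand-in for the opened file: the loop stops when readline() returns the empty string at end of file, because '' is falsy.

```python
import io

# Stand-in for open('gibberish.txt').
f = io.StringIO("resurgent inaesthetic cost.\nthe bryophytic fingernail.\n")

sentences = []
# Assign and test in one expression; '' (end of file) ends the loop.
while line := f.readline():
    sentences.append(line.rstrip("\n"))

print(sentences)  # ['resurgent inaesthetic cost.', 'the bryophytic fingernail.']
```

A blank line in the middle of the file comes back as '\n', which is truthy, so it does not end the loop early.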
Jurakin