I would like to extract only nouns or noun groups from a huge text file. The Python code below works, but it extracts the nouns for only the last line. I am fairly sure the code needs an 'append' somewhere, but I don't know how to add it (I am a Python beginner).

import nltk

f = open(r'infile.txt', encoding="utf8")
data = f.readlines()

tagged_list = []

for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    nouns = [word for word,pos in tagged \
            if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS')]
    downcased = [x.lower() for x in nouns]
    joined = " ".join(downcased).encode('utf-8')
    into_string = str(nouns)

output = open(r"outfile.csv", "wb")
output.write(joined)
output.close()

The result looks like this: apartment transport downtown, which are the nouns from the last line of the file only. I'd like the nouns from each line of the input file to be saved on their own line in the output file. For example, the input file and the corresponding output should look like this:

Input file:
I like the milk.
I like the milk and bread.
I like the milk, bread, and butter.

Output file:
milk
milk bread
milk bread butter

I hope somebody can help me fix the code above.

Emily

1 Answer

Add a line at the end of the for loop body that accumulates each line's result, then write the accumulated result to the file after the loop.

...
result = ""
for line in data:
    ...
    result += joined

output = open(r"outfile.csv", "w")
output.write(str(result))
output.close()

If you want to use append:

...
result_list = []
for line in data:
    ...
    result_list.append(joined)

output = open(r"outfile.csv", "w")
output.write(str(result_list))
output.close()

Alternatively, if you use the result list, you can write each item on its own line:

...
output = open(r"outfile.csv", "w")
for item in result_list:
    output.write(str(item) + "\n")
output.close()
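
Putting the pieces together, a minimal sketch might look like the following. It assumes the `.encode('utf-8')` call from the question is dropped so that each joined value stays a `str`, and that the file is opened in text mode (`"w"`) with an explicit encoding. The helper names `nltk_tag_line`, `nouns_per_line` and `write_lines` are made up for illustration; the NLTK calls are the same `word_tokenize` and `pos_tag` used in the question.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nltk_tag_line(line):
    """Tokenize and POS-tag one line with NLTK (hypothetical helper)."""
    # Imported lazily; assumes NLTK and its tokenizer/tagger data are installed.
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(line))


def nouns_per_line(lines, tag_line=nltk_tag_line):
    """Return one lowercased, space-joined noun string per input line."""
    results = []
    for line in lines:
        nouns = [word.lower() for word, pos in tag_line(line) if pos in NOUN_TAGS]
        # Keep the value as str (no .encode()) so it can be written in text mode.
        results.append(" ".join(nouns))
    return results


def write_lines(path, items):
    """Write each item on its own line in text mode, encoded as UTF-8."""
    with open(path, "w", encoding="utf8") as output:
        for item in items:
            output.write(item + "\n")


# Example usage with the file names from the question:
#
#     with open("infile.txt", encoding="utf8") as f:
#         write_lines("outfile.csv", nouns_per_line(f))
```

Appending inside the loop is what keeps the nouns from every line; in the question's code, `joined` is overwritten on each iteration, so only the last line's value is still there when the file is written.
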
Alperen
  • Probably better off using a list approach. Otherwise everything will be written on one line without spaces :P – Mangohero1 Sep 22 '17 at 14:55
  • @Alperen, thanks for the comments. Can you please read the next post by me? I think you have answers. – Emily Sep 22 '17 at 15:45
  • @mangoHero1, thanks for the comments, Can you please read the next post by me? You may have the answer. – Emily Sep 22 '17 at 15:46
  • @Alperen, I tried the second approach with append (using exactly the same code) but got this error --> "Traceback (most recent call last): File "extract_nouns2.py", line 22, in output.write(result_list) TypeError: a bytes-like object is required, not 'list'" Can you see what is wrong with this? – Emily Sep 22 '17 at 17:03
  • @Emily I guess, you need to use `output = open(r"outfile.csv", "w")` . Details are [here](https://stackoverflow.com/a/34283957/6900838). I can't find your next post. You can put a link here. – Alperen Sep 22 '17 at 17:47
  • @Emily, substitute the last three lines with the code at the bottom of his solution. You'll want to write each `item` in the result list, not `result_list` itself – Mangohero1 Sep 22 '17 at 17:52
  • @Alperen, I still get the following error after changing the output mode as you suggested (from wb to w). --> "Traceback (most recent call last): File "extract_nouns2.py", line 22, in output.write(result_list) TypeError: write() argument must be str, not list" It looks like the error happens in the output.write line. Do you have any other ideas? – Emily Sep 22 '17 at 18:25
  • @mangoHero1, I tried what you suggested (replacing the last three lines) and got the following error --> "Traceback (most recent call last): File "extract_nouns3.py", line 23, in output.write(item + "\n") TypeError: can't concat str to bytes" It really looks like the output.write line causes the error. Do you have any other thoughts? – Emily Sep 22 '17 at 18:30
  • @Emily, A little difficult to synthesize our thoughts over the internet. :-) Use the bottom code *and* change `"wb"` to `"w"` – Mangohero1 Sep 22 '17 at 18:31
  • @mangoHero1, I actually used "w" instead of "wb" and got the error above. – Emily Sep 22 '17 at 18:34
  • I can't replicate that error. Maybe check if `item` is a string type. – Mangohero1 Sep 22 '17 at 18:46
  • @Emily I can't install to my computer because of my Python version, I guess. So, I can't try the code. If you have an error about str, you can use the str() function. Here is an example: `output.write(str(result_list))`. I'll edit my post. Please try and let me know if it works or not. – Alperen Sep 22 '17 at 19:50
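
The errors quoted in the comments above are consistent with `joined` still being a `bytes` object, since the question's code calls `.encode('utf-8')` on it, while a file opened with `"w"` expects `str`. The small sketch below illustrates that mismatch and one possible way around it. The helper `write_item` is hypothetical, and an `io.StringIO` buffer stands in for the text-mode output file.

```python
import io


def write_item(stream, item):
    """Write one item plus a newline, decoding bytes to str first if needed."""
    if isinstance(item, bytes):
        item = item.decode("utf-8")
    stream.write(item + "\n")


joined = " ".join(["milk", "bread"])  # a str
encoded = joined.encode("utf-8")      # bytes, as produced in the question's code

# Concatenating bytes with a str raises TypeError, as in the traceback above.
try:
    encoded + "\n"
    concat_failed = False
except TypeError:
    concat_failed = True

buffer = io.StringIO()       # stands in for open("outfile.csv", "w")
write_item(buffer, joined)   # str is written as-is
write_item(buffer, encoded)  # bytes are decoded before writing
```

Wrapping the value in `str()` avoids the exception but writes the `b'...'` representation into the file, so dropping the `.encode()` call (or decoding, as here) is probably the cleaner fix.
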