I don't know any Python, but I have a file, substance.txt, which is a list of about 4k substances. I also have a log file, log.txt, which contains updates to these substances; at the moment I apply them to substance.txt by hand. Each log line starts with '+' and a tab if it is a new concept, or '-' and a tab if it is a concept which should be removed from substance.txt.

Using Python, I am trying to first copy everything in substance.txt which is not in the log to a new file. Then I am trying to go through the log file and append anything which starts with '+ tab' to the bottom of the new file. That should give me all the existing substance.txt content which is not affected, plus any new terms from log.txt, with any concepts flagged for removal in log.txt taken out.

This is my code:

import re
import fileinput

#  write concepts which are not not in log

with open("log.txt", 'r') as f,  open("substance.txt", "r") as oldfile,      
open('new_substance.txt', 'w') as newfile:

withconceptsremoved = [x for x in oldfile if x not in f]
newfile.write(withconceptsremoved)

#  so the new file only has comments which are neither positive or negative in log.  If we now copy positive ones, we've removed the negatives

#  write new additions to bottom of new file 
for line in f:
    if '+\t' in line:
        addedconcept = line.replace('+\t','1\t')
        newfile.write(addedconcept)  

This is my error:

line 8, in newfile.write(withconceptsremoved) TypeError: expected a character buffer object

If I remove the

withconceptsremoved = [x for x in oldfile if x not in    
newfile.write(withconceptsremoved)

it works. I looked at the question "TypeError: expected a character buffer object - while trying to save integer to textfile" but didn't understand it.

lobe
  • `[x for x in oldfile if x not in f]` will make a list which you can't write to a file, use `"\r\n".join(x for x in oldfile if x not in f)` to turn it into a string with each value on a new line – Peter Jun 02 '15 at 21:44
  • @Peter just `'\n'.join(...)`. Python handles universal newlines *universally* well. – Adam Smith Jun 02 '15 at 21:46
  • Dunno, I found the first time writing files that my notepad wouldn't recognise just `\n`, so ever since then I've been using `\r\n`, since the extra \r doesn't break anything anyway :) – Peter Jun 02 '15 at 21:50
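
To make the point in these comments concrete, here is a minimal sketch of why `write` rejects a list and two ways around it. It assumes Python 3; the sample lines are made up, and `io.StringIO` stands in for a file opened with `open(..., "w")` so the snippet runs without touching disk.

```python
import io

# Hypothetical sample data: lines as they come out of a text file, newline included.
lines = ["aspirin\n", "caffeine\n", "ethanol\n"]

out = io.StringIO()  # stands in for a file opened for writing

try:
    out.write(lines)  # write() wants one string, so a list raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Option 1: join the lines into one string. They already end in "\n",
# so the separator is the empty string.
out.write("".join(lines))

# Option 2: writelines() accepts any iterable of strings and adds no separators.
out2 = io.StringIO()
out2.writelines(lines)
```

Because lines read from a file keep their trailing newline, joining them with `"\n"` or `"\r\n"` would put an extra blank line between entries. A separator is only needed if the newlines were stripped first.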

1 Answer

I tried your code and found several issues that prevented it from doing what you wanted.

  • The list "withconceptsremoved" needs to be converted to a string before writing, as stated in the question comments
  • You are reading from "f" twice, so you would need to seek the file back to the start each time (or read it only once)
  • The "if x not in f" test does not work because it iterates over the file object itself and uses it up; you need to read "f" into a list first
  • You are not taking the "-\t" and "+\t" prefixes into account when doing "if x not in f", so a log line never equals a line from substance.txt
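
The second and third points come down to the same thing: a file object is a one-shot iterator. Here is a small sketch of that behaviour, assuming Python 3 and using `io.StringIO` with made-up log contents in place of the real log.txt:

```python
import io

# Hypothetical log contents; StringIO iterates line by line like a real text file.
log = io.StringIO("-\taspirin\n+\tmorphine\n")

first_pass = [line for line in log]
second_pass = [line for line in log]  # the iterator is already at the end

# Either rewind before reading again...
log.seek(0)
after_seek = list(log)

# ...or keep the lines in a list, which can be searched as often as needed.
log_lines = after_seek
found_twice = ("-\taspirin\n" in log_lines, "-\taspirin\n" in log_lines)
```

Here `second_pass` comes out empty, while the list supports repeated `in` checks. That is why the fix below reads the log once into a list.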

I fixed the issues and now it seems to be working fine for me.

Here's the updated code I came up with:

#  write concepts which are not in the log

with open("log.txt", 'r') as f, open("substance.txt", "r") as oldfile, open('new_substance.txt', 'w') as newfile:
    # read f only once and convert it to a list of lines
    logList = list(f)
    # drop every substance that appears in the log with either a '+\t' or a '-\t' prefix
    withconceptsremoved = [x for x in oldfile if ('-\t'+x not in logList and '+\t'+x not in logList)]
    # join the list into one string (each line already ends with a newline) and write it
    withconceptsremoved = "".join(withconceptsremoved)
    newfile.write(withconceptsremoved)
    # the new file now holds only concepts that the log neither adds nor removes,
    # so appending the added ones completes the update
    # write new additions to the bottom of the new file
    for line in logList:
        if '+\t' in line:
            addedconcept = line.replace('+\t','1\t')
            newfile.write(addedconcept)
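
If the files grow, the same idea can be sketched with a set, which avoids scanning the whole log list for every substance. This is only one possible variant and it makes some assumptions: `apply_log` and its `added_prefix` parameter are hypothetical names, every log line is taken to be a marker, a tab, then the name, and names are compared with line endings stripped so that a last line without a newline still matches. Unlike the code above, it only treats '+\t' as a marker at the start of a line.

```python
def apply_log(substance_lines, log_lines, added_prefix="1\t"):
    """Return the updated substance lines (hypothetical helper for illustration)."""
    additions = []     # names flagged with '+\t', in log order
    mentioned = set()  # every name the log adds or removes
    for line in log_lines:
        entry = line.rstrip("\r\n")
        if entry.startswith(("+\t", "-\t")):
            name = entry[2:]
            mentioned.add(name)
            if entry.startswith("+\t"):
                additions.append(name)

    # Keep existing substances that the log does not mention at all.
    kept = [s.rstrip("\r\n") for s in substance_lines
            if s.rstrip("\r\n") not in mentioned]
    # Append additions with the same '+\t' -> '1\t' rewrite used above.
    return [k + "\n" for k in kept] + [added_prefix + a + "\n" for a in additions]


# Made-up sample data; the last substance line deliberately lacks a newline.
substances = ["aspirin\n", "caffeine\n", "ethanol"]
log = ["-\taspirin\n", "+\tmorphine\n", "-\tethanol\n"]
result = apply_log(substances, log)
```

With real files, something like `newfile.writelines(apply_log(oldfile, f))` inside the same `with` block should do the equivalent job.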
eugenioy