Hello, I am running into a small issue: I am reading content from one file, extracting some columns, and writing them to another file. Since the write() method does not add a newline character after each line of text, I added one myself with the code below. The problem is that this also leaves an extra blank line at the end of the file, which is not intended.

fh.write(string+'\n')

I would like to know how to solve this. Below is my code:

import re

with open("C:\\test.txt") as fh, open("C:\\newtest","w") as f:
    for line in fh:
        if not re.search("^$",line):
            f.write(line.split()[-1].split(",")[0]+'\n')

Any suggestions?

AChampion
Rohit
  • Keep track of the current line number and the total number of lines, then compare them. If the current line == total lines, don't add the \n t – kyle Nov 09 '16 at 18:48
  • @kyle But that's more code again, another check to be made in the code, I guess – Rohit Nov 09 '16 at 18:53
  • If the dataset is small enough then you can `f.write('\n'.join(line.split()[-1].split(",")[0] for line in fh if not re.search("^$",line)))` instead of the for loop. – AChampion Nov 09 '16 at 18:54
  • @AChampion This does not seem to work, as it joins every string with a newline, so the output looks like this: `d n = " u i d = e e c d l b a c c pd n = " u i d = e e c d l b a c c pd` – Rohit Nov 09 '16 at 19:03
  • Write the first line as normal, then subsequent lines with a newline at the beginning – Aaron Nov 09 '16 at 20:16
  • @Rohit Adding additional code isn't inherently bad... especially when some of the other suggestions say to search the dataset, which would drastically increase the big O – kyle Nov 09 '16 at 22:51
  • Lots of ways to do this (as shown below), but did you consider just trimming the last newline off when you're done processing all the text? – snowballhg Nov 10 '16 at 06:01
  • @snowballhg Yeah, I did think of that but could not figure out how to do it. Can you please guide me? – Rohit Nov 10 '16 at 06:12
  • http://stackoverflow.com/a/18857381/106468 – snowballhg Nov 10 '16 at 06:18
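
The "trim the last newline when you're done" idea from the comments above could look roughly like the sketch below. It is not taken from the linked answer. The helper name `strip_final_newline` is made up for illustration, and it assumes the output file has already been written and closed. The file is opened in binary mode so that both `\n` and `\r\n` endings can be detected and cut off by byte count.

```python
import os

def strip_final_newline(path):
    """Remove one trailing line ending (\\n or \\r\\n) from the file, if present.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return  # empty file, nothing to trim
        # Look at the last one or two bytes to tell "\n" from "\r\n".
        tail_len = min(2, size)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
        if tail.endswith(b"\r\n"):
            f.truncate(size - 2)
        elif tail.endswith(b"\n"):
            f.truncate(size - 1)
```

It would be called once after the `with` block that writes the file, for example `strip_final_newline("C:\\newtest")`. Only one trailing line ending is removed, so a file that ends without a newline is left unchanged.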

3 Answers

1

If you know your file will always have at least one line, you could simply write the first line with no changes, then write all subsequent lines with \n prepended to the string:

import re

with open("C:\\test.txt") as fh, open("C:\\newtest","w") as f:
    for line in fh:
        if not re.search("^$",line):
            f.write(line.split()[-1].split(",")[0]) #first non-blank line, with no newline
            break #stop after the first occurrence
    for line in fh: #continues from where the first loop stopped
        if not re.search("^$",line):
            f.write('\n'+line.split()[-1].split(",")[0]) #remaining lines, each with a prepended newline

Edit:

Why doesn't the second loop start back on the first line of the file?

Behind the scenes, the object fh keeps some internal state that acts as a "cursor" within the file, and it has a special method, fh.next() in Python 2 (fh.__next__() in Python 3, normally called as next(fh)), that yields the next value, in this case each line as separated by '\n'. When the end of the file is reached, a special exception called StopIteration is raised; the for loop recognizes this exception as the signal to exit the loop. If the loop is exited beforehand using break, the internal cursor in the file stays in place, and further iteration picks up where you left off.
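
A quick way to see this "cursor" behaviour without touching a real file is the small sketch below. `io.StringIO` stands in for the open file here (that substitution is an assumption made only to keep the example self-contained). Like a file object, it is its own iterator, so a second loop resumes where the first one stopped.

```python
import io

# io.StringIO stands in for an open text file, so no file on disk is needed.
fh = io.StringIO("first\nsecond\nthird\n")

for line in fh:
    first = line.rstrip("\n")
    break  # leave the loop after one line; fh keeps its position

# A second pass over the same object resumes at "second", not at "first".
rest = [line.rstrip("\n") for line in fh]

print(first)  # first
print(rest)   # ['second', 'third']
```

A list behaves differently: every for loop over a list asks for a fresh iterator, so looping twice over a list starts from the first element both times.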

You can learn how iteration works behind the scenes by creating your own custom generator and stepping through it by hand, the same way a for loop does:

def generator_constructor():
    x = 10
    while x > 0:
        yield x
        x = x - 1 #decrement x

generator = generator_constructor()

print(next(generator)) #prints 10
print(next(generator)) #prints 9

print("\nlooping\n") #indicate where we enter the loop

while True: #infinite loop we will need to break out of somehow
    try:
        print(next(generator)) #print next value (8 down to 1)
    except StopIteration: #raised when the generator's while loop has finished
        break #then break out of the loop

Try taking this code and making it do something more interesting so you can understand what's going on behind the scenes.

Aaron
  • A very small doubt, since I am a beginner in Python: won't the 2nd for loop start again from the 1st line? Why isn't the 1st line written twice? I am a little confused. – Rohit Nov 10 '16 at 08:04
  • Thanks much. I have one more puzzle, and I tried a lot of things but couldn't get it to work. I have a file `test` with some data; from this file I extract certain values and write them to another file `newtest`. This `newtest` file does not have any blank lines but it contains duplicates, and I want only unique lines. So I do it like below, but it leaves a trailing `\n`, and since it's a set, it doesn't work the way you suggested. `with open("C:\\newtest","r+") as f: lines=[line.rstrip() for line in f] lines=set(lines) f.seek(0) f.truncate() for line in lines: f.write(line+'\n')` Any help? – Rohit Nov 11 '16 at 06:40
  • I solved it by changing `f.write(line+'\n')` to `f.write('\n'.join(lines).lstrip())` – Rohit Nov 11 '16 at 10:27
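
For the de-duplication puzzle in the comments above, one possible variant is sketched below. It differs from the snippet in the comment in two ways that are assumptions on my part: it keeps the lines in first-seen order (a plain set does not guarantee any order), and it drops empty lines. The function names `unique_lines` and `rewrite_unique` are made up for illustration.

```python
def unique_lines(lines):
    """Return the non-empty lines without duplicates, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        text = line.rstrip("\r\n")  # drop the line ending only
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result

def rewrite_unique(path):
    """Rewrite the file in place with unique lines and no trailing newline."""
    with open(path, "r+") as f:
        lines = unique_lines(f)
        f.seek(0)
        f.truncate()
        # join() puts "\n" only between lines, never after the last one.
        f.write("\n".join(lines))
```

Because `'\n'.join(...)` only places separators between items, no trailing newline is written and nothing needs to be stripped afterwards.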
0
import re

with open("C:\\test.txt") as fh, open("C:\\newtest","w") as f:
    output_lines = []
    for line in fh:
        if not re.search("^$",line):
            output_lines.append(line.split()[-1].split(",")[0])
    output = '\n'.join(output_lines) #newline between lines only, none at the end
    f.write(output)

or even

with open("C:\\test.txt") as fh, open("C:\\newtest","w") as f:
    output_lines = [
        line.split()[-1].split(",")[0]
        for line in fh
        if not re.search("^$",line)
    ]
    output = '\n'.join(output_lines)
    f.write(output)
Yevhen Kuzmovych
  • You might have lines for which `line.split()[-1].split(",")[0]` evaluates to some whitespace characters. `join` will not insert a newline at the end. – Yevhen Kuzmovych Nov 09 '16 at 19:20
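
A related point about whitespace: the `^$` test only filters lines that are completely empty. For a line containing only spaces or tabs, `line.split()` returns an empty list and `[-1]` raises an IndexError. A minimal sketch that skips those lines as well is shown below. The generator name `extract_fields` and the sample data are invented for illustration and are not part of the answer above.

```python
def extract_fields(lines):
    """Yield the text before the first comma of the last
    whitespace-separated token of each non-blank line."""
    for line in lines:
        tokens = line.split()
        if not tokens:  # empty or whitespace-only line
            continue
        yield tokens[-1].split(",")[0]

# Invented sample input; a real run would pass the open file object instead.
sample = ["a b x,y\n", "\n", "   \n", "c d z,w\n"]
output = "\n".join(extract_fields(sample))
print(repr(output))  # 'x\nz'
```

Since the generator accepts any iterable of lines, `'\n'.join(extract_fields(fh))` could replace the list comprehension without needing `re` at all.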
0

I just kind of solved it. I guess it was a misinterpretation of what the editor displayed, because I checked for blank lines and didn't find any.

import re

with open("C:\\test.txt") as fh, open("C:\\newtest","w") as f:
    for line in fh:
        if not re.search("^$",line):
            f.write(line.split()[-1].split(",")[0]+'\n')

# Re-open the output file and count its blank lines
with open("C:\\ECD Utilization Script - Copy\\newtest","r") as fh:
    n=0
    for line in fh:
        if re.search("^$",line):
            n=n+1
print(n,"Blank lines")
Rohit