I have a .txt file like this:

2019-03-29 12:03:07 line1 
                    line2
                    line3
                    ....
2019-03-30 07:05:09 line1
                    line2
                    ....
2019-03-31 10:03:20 line1
                    line2
                    ....

I split the file into several files, like this:

import math
import os

inputData = 'dirname\..'
numThrd  = 3
def chunkFiles():
    nline = sum(1 for line in open(inputData,'r', encoding='utf-8', errors='ignore'))
    chunk_size = math.floor(nline/int(numThrd))
    n_thread = int(numThrd)
    j = 0
    with open(inputData,'r', encoding='utf-8', errors='ignore') as fileout:
        for i, line in enumerate(fileout):
            if (i + 1 == j * chunk_size and j != n_thread) or i == nline:
                out.close()
            if i + 1 == 1 or (j != n_thread and i + 1 == j * chunk_size):
                chunkFile = 'rawData' + str(j+1) + '.txt'
                if os.path.isfile(chunkFile ):
                    break
                out = open(chunkFile , 'w+', encoding='utf-8', errors='ignore')
                j = j + 1
                fLine = line[:-1]
                if not matchLine:
            if out.closed != True:
                out.write(line)
            if i % 1000 == 0 and i != 0:
                print ('Processing line %i ...' % (i))

However, I want every chunk file to end on the line immediately before a line that starts with a date, so that each chunk begins with a dated line.

Current output:

rawData1.txt
2019-03-29 12:03:07 line1
                    line2
                    ....
-------------------------
rawData2.txt
                    line50
                    line51
2019-03-30 07:05:09 line1
                    line2
                    .....

Desired output:

rawData1.txt
2019-03-29 12:03:07 line1 
                    line2
                    line3
                    ....
-------------------------
rawData2.txt
2019-03-30 07:05:09 line1
                    line2
                    ....

What should I add to the script above to meet that condition?

Thank you very much

petezurich
elisa
  • the code seems to be truncated (stops right after an `if` with no content) – Adam.Er8 Jun 23 '19 at 13:14
  • @Adam.Er8 you're right. I've updated – elisa Jun 23 '19 at 13:18
  • Make a list to hold lines. While iterating, check whether each line starts with a space. If it does not, write all previously collected lines to a file and start a new list with the current line; if it does start with a space, add it to the list. – wwii Jun 23 '19 at 13:24

1 Answer

You can produce the desired output by using a list to hold the lines you want to write (see below).

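def write_chunk(filename, chunk):
    # write the collected lines of one record to its own file
    with open(filename, "w") as out:
        out.writelines(chunk)

chunk = []    # lines of the record currently being collected
n_chunk = 1   # number used in the next output file name

with open("data.txt") as f:
    for line in f:
        # a line that does not start with whitespace begins a new record
        if not line[0].isspace() and chunk:
            write_chunk("{}.txt".format(n_chunk), chunk)
            chunk = []
            n_chunk += 1
        chunk.append(line)
# write final chunk (skipped if the input file was empty)
if chunk:
    write_chunk("{}.txt".format(n_chunk), chunk)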
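The code above writes one file per dated record. If a fixed number of chunk files is still wanted, as with numThrd in the question, a possible variation is to group the lines into records first and then hand out whole records to the chunks. The following is only a sketch under assumptions: the names DATE_RE, group_records and split_into_chunks are hypothetical, a record is assumed to start on a line beginning with "YYYY-MM-DD HH:MM:SS", n_chunks is assumed to be at least 1, and all lines are held in memory.

```python
import re

# Assumption: a record starts on a line beginning with "YYYY-MM-DD HH:MM:SS".
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def group_records(lines):
    """Group lines into records; each record starts at a dated line."""
    records = []
    for line in lines:
        # open a new record at a dated line (or at the very first line)
        if DATE_RE.match(line) or not records:
            records.append([])
        records[-1].append(line)
    return records


def split_into_chunks(lines, n_chunks):
    """Distribute whole records over at most n_chunks lists of lines."""
    records = group_records(lines)
    total = sum(len(record) for record in records)
    target = total / n_chunks  # ideal number of lines per chunk
    chunks = [[]]
    written = 0
    for record in records:
        # start a new chunk once the lines handed out so far cover the
        # share of the chunks opened so far; never split inside a record
        if chunks[-1] and len(chunks) < n_chunks and written >= target * len(chunks):
            chunks.append([])
        chunks[-1].extend(record)
        written += len(record)
    return chunks
```

Each returned list can then be passed to write_chunk with a file name such as 'rawData' + str(k + 1) + '.txt'. Chunk sizes are only approximately equal, because a record is never cut in two, and fewer than n_chunks lists come back when the file has fewer records than that.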
Wytamma Wirth
  • Why did I get one file per line? – elisa Jun 23 '19 at 13:55
  • The code will only write a new file if the line doesn't start with a space. Do the lines in the files start with spaces? – Wytamma Wirth Jun 23 '19 at 14:01
  • All of the lines do, except the lines that start with a datetime – elisa Jun 23 '19 at 14:08
  • Did you try the code above? It works for me. The first file it produces (1.txt) cannot be a single line, so I'm not sure how that's happening for you. Maybe check that the whitespace in your txt file is actually a space and not a tab? I changed the code to check for any whitespace at the start of the line. – Wytamma Wirth Jun 23 '19 at 14:13
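
One way to follow up on the whitespace question in the comments is to look at the raw leading characters of the first few lines, and to detect the start of a record by its timestamp instead of by the absence of whitespace. This is a minimal sketch, not a confirmed diagnosis: DATE_RE, starts_record and describe_line_starts are hypothetical names, and the timestamp format "YYYY-MM-DD HH:MM:SS" is assumed from the sample data in the question.

```python
import re

# Assumption: the timestamp format matches the sample in the question.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def starts_record(line):
    """Return True if the line begins with a timestamp."""
    return DATE_RE.match(line) is not None


def describe_line_starts(lines, limit=5):
    """Return (repr of the first 4 characters, starts_record) per line.

    repr() makes tabs and other invisible characters visible.
    """
    return [(repr(line[:4]), starts_record(line)) for line in lines[:limit]]
```

Calling describe_line_starts(f.readlines()) on the opened input file shows whether continuation lines really begin with spaces or tabs. If they do not, replacing the `not line[0].isspace()` test with `starts_record(line)` should still split only at dated lines.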