My existing, working code is:
#!/usr/bin/env python3
import glob
import xml.etree.ElementTree as ET

filenames = glob.glob("C:\\Users\\####\\Desktop\\BNC\\[000-ZZZ]*.xml")

out_lines = []
for filename in filenames:
    with open(filename, 'r', encoding="utf-8") as content:
        tree = ET.parse(content)
        root = tree.getroot()
        # one output line per <w> token: text, lemma, POS, C5 tag
        for w in root.iter('w'):
            if w.text is None:
                w.text = "####"
            lemma = w.get('hw')
            if lemma is None:
                lemma = "####"
            pos = w.get('pos')
            if pos is None:
                pos = "####"
            tag = w.get('c5')
            if tag is None:
                tag = "####"
            out_lines.append(w.text + "\t" + lemma + "\t" + pos + "\t" + tag)

with open("C:\\Users\\####\\Desktop\\bnc.txt", "w", encoding="utf-8") as out_file:
    for line in out_lines:
        out_file.write("{}\n".format(line))
There are 4,049 XML source files, and this yields an output file of over 2 GB, with far more lines than I can easily import into other packages for manipulation.
I processed them manually in batches of 100 files, but some of the resulting output files still had more than 1,048,576 lines.
I would like the write loop to roll over to a new output file, based on a set base filename, after every 1,048,576 lines (or fewer; being able to specify the limit would be ideal), e.g.:
- bnc-001.txt 1,048,576 lines
- bnc-002.txt 1,048,576 lines
- ...
- bnc-050.txt 56,789 lines (number plucked from the air)
I have no idea whether this is the right way to start; the closest I can get to a sketch of what I mean is below.
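Something along these lines, folded into my existing loop? The MAX_LINES constant and the bnc-{:03d}.txt naming scheme are my own invention for illustration, not anything dictated by the BNC data:

#!/usr/bin/env python3
import glob
import xml.etree.ElementTree as ET

MAX_LINES = 1048576  # assumed per-file limit; change to taste

filenames = glob.glob("C:\\Users\\####\\Desktop\\BNC\\[000-ZZZ]*.xml")

file_index = 1   # suffix for bnc-001.txt, bnc-002.txt, ...
line_count = 0   # lines written to the current output file so far
out_file = open("C:\\Users\\####\\Desktop\\bnc-{:03d}.txt".format(file_index),
                "w", encoding="utf-8")

for filename in filenames:
    tree = ET.parse(filename)
    for w in tree.getroot().iter('w'):
        text = w.text if w.text is not None else "####"
        lemma = w.get('hw', "####")   # Element.get accepts a default
        pos = w.get('pos', "####")
        tag = w.get('c5', "####")
        if line_count >= MAX_LINES:
            # current file is full: close it and open the next numbered one
            out_file.close()
            file_index += 1
            line_count = 0
            out_file = open("C:\\Users\\####\\Desktop\\bnc-{:03d}.txt".format(file_index),
                            "w", encoding="utf-8")
        out_file.write("{}\t{}\t{}\t{}\n".format(text, lemma, pos, tag))
        line_count += 1

out_file.close()

If something like this works, writing each line as it is produced, rather than collecting everything in out_lines first, would presumably also avoid holding the whole 2 GB in memory.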