trying to find a way of making this process work pythonically or at all. Basically, I have a really long text file that is split into lines. Every x number of lines there is one that is mainly uppercase, which should roughly be the title of that particular section. Ideally, I'd want the title and everything after to go into a text file using the title as the name for the file. This would have to happen 3039 in this case as that is as many titles will be there. My process so far is this: I created a variable that reads through a text file tells me if it's mostly uppercase.
def mostly_uppercase(text):
threshold = 0.7
isupper_bools = [character.isupper() for character in text]
isupper_ints = [int(val) for val in isupper_bools]
try:
upper_percentage = np.mean(isupper_ints)
except:
return False
if upper_percentage >= threshold:
return True
else:
return False
Afterwards, I made a counter so that I could create an index and then I combined it:
counter = 0
headline_indices = []
for line in page_text:
if mostly_uppercase(line):
print(line)
headline_indices.append(counter)
counter+=1
headlines_with_articles = []
headline_indices_expanded = [0] + headline_indices + [len(page_text)-1]
for first, second in list(zip(headline_indices_expanded, headline_indices_expanded[1:])):
article_text = (page_text[first:second])
headlines_with_articles.append(article_text)
All of that seems to be working fine as far as I can tell. But when I try to print the pieces that I want to files, all I manage to do is print the entire text into all of the txt files.
for i in range(100):
out_pathname = '/sharedfolder/temp_directory/' + 'new_file_' + str(i) + '.txt'
with open(out_pathname, 'w') as fo:
fo.write(articles_filtered[2])
Edit: This got me halfway there. Now, I just need a way of naming each file with the first line.
for i,text in enumerate(articles_filtered):
open('/sharedfolder/temp_directory' + str(i + 1) + '.txt', 'w').write(str(text))