
I currently have a version of the following script that uses two simple readline() snippets to read a single-line .txt file from each of two different folders. It runs under Ubuntu 18.04 and Python 3.6.7 and does not use glob.

I am now encountering a NameError when trying to read multiple text files from the same folders using sorted(glob.glob(...)).

readlines() causes an error because the input from the .txt files must be strings, not lists: readline() returns a single line as a string, while readlines() returns a list of lines.
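For illustration, a minimal sketch of the difference, using a hypothetical single-line file:

with open("P1_files/example.txt") as f:    # hypothetical file name
    P1 = f.readline()       # one line as a str, e.g. "some text\n"

with open("P1_files/example.txt") as f:
    lines = f.readlines()   # a list of str, e.g. ["some text\n"]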

I am new to Python. I have tried online Python formatters, reindent.py, etc., but with no success.

I am hoping it's a simple indentation issue so it won't be a problem in future scripts.

Current error from code below:

Traceback (most recent call last):
  File "v1-ReadFiles.py", line 21, in <module>
    context_input = GenerationInput(P1=P1, P3=P3,
NameError: name 'P1' is not defined

Current modified script:

import glob
import os

from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, LARGE
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput

from transformers import GPT2LMHeadModel, GPT2Tokenizer

for name in sorted(glob.glob('P1_files/*.txt')):
    with open(name) as f:
        P1 = f.readline()

for name in sorted(glob.glob('P3_files/*.txt')):
    with open(name) as f:
        P3 = f.readline()

if __name__ == "__main__":

    context_input = GenerationInput(P1=P1, P3=P3,
                                    genre=["mystery"],
                                    persons=["Steve"],
                                    size=LARGE,
                                    summary="detective")

    print("PREDICTION WITH CONTEXT WITH SPECIAL TOKENS")
    model = GPT2LMHeadModel.from_pretrained('models/custom')
    tokenizer = GPT2Tokenizer.from_pretrained('models/custom')
    tokenizer.add_special_tokens(
        {'eos_token': '[EOS]',
         'pad_token': '[PAD]',
         'additional_special_tokens': ['[P1]', '[P2]', '[P3]', '[S]', '[M]', '[L]', '[T]', '[Sum]', '[Ent]']}
    )
    model.resize_token_embeddings(len(tokenizer))
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)

    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

    predictions = text_generator_with_context(context_input, nb_samples=1)
    for i, prediction in enumerate(predictions):
        print('prediction n°', i, ': ', prediction)
1 Answer


Thanks to afghanimah here:

Problem with range() function when used with readline() or counter - reads and processes only last line in files

I dropped glob. I also moved all the model loading calls (model = ..., tokenizer = ..., etc.) above the with open ... block; a fuller sketch follows the snippet below.

with open("data/test-P1-Multi.txt", "r") as f1, open("data/test-P3-Multi.txt", "r") as f3:
    for i in range(5):
        P1 = f1.readline()
        P3 = f3.readline()

        context_input = GenerationInput(P1=P1, P3=P3, size=LARGE)
        etc.
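For reference, here is a minimal sketch of how the pieces might fit together once the model/tokenizer loading is moved above the with open block. It only reuses the imports, paths, special tokens, and GenerationInput arguments already shown in the question; the range(5) line count and the data/test-*-Multi.txt paths are assumptions carried over from the snippet above and may need adjusting.

from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, LARGE
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput

from transformers import GPT2LMHeadModel, GPT2Tokenizer

if __name__ == "__main__":
    # Load the model and tokenizer once, before reading any input files.
    model = GPT2LMHeadModel.from_pretrained('models/custom')
    tokenizer = GPT2Tokenizer.from_pretrained('models/custom')
    tokenizer.add_special_tokens(
        {'eos_token': '[EOS]',
         'pad_token': '[PAD]',
         'additional_special_tokens': ['[P1]', '[P2]', '[P3]', '[S]', '[M]', '[L]', '[T]', '[Sum]', '[Ent]']}
    )
    model.resize_token_embeddings(len(tokenizer))
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)
    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

    # Read the two multi-line files in parallel, one P1/P3 line pair per iteration.
    with open("data/test-P1-Multi.txt", "r") as f1, open("data/test-P3-Multi.txt", "r") as f3:
        for i in range(5):
            P1 = f1.readline()
            P3 = f3.readline()

            context_input = GenerationInput(P1=P1, P3=P3,
                                            genre=["mystery"],
                                            persons=["Steve"],
                                            size=LARGE,
                                            summary="detective")

            predictions = text_generator_with_context(context_input, nb_samples=1)
            for j, prediction in enumerate(predictions):
                print('prediction n°', j, ': ', prediction)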