I have two simple 20-line text files. The current script below only reads line 20 in both, runs the main 'context_input' process without errors, and then exits. I need to apply the same process to all lines 1-20.

I get the same result using a counter with import sys. The requirement is to read strings, not create a list, and readlines() causes errors. Any code snippets for setting up a proper loop to accomplish this would be appreciated.

# coding=utf-8

from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, MEDIUM
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput

from transformers import GPT2LMHeadModel, GPT2Tokenizer

def main():

    with open("data/test-P1-Multi.txt", "r") as f:
        for i in range(20):
            P1 = f.readline()

    with open("data/test-P3-Multi.txt", "r") as f:
        for i in range(20):
            P3 = f.readline()

    context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

    print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
    model = GPT2LMHeadModel.from_pretrained('models/774M')
    tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)

    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

    predictions = text_generator_with_context(context_input, nb_samples=1)
    for i, prediction in enumerate(predictions):
        print('prediction n°', i, ': ', prediction)

    del model, tokenizer, GPT2_model

if __name__ == "__main__":
    main()
Ron
  • It's only applying to the last line because even though your for loop is reading every line, they are replacing each other, so you're left with the last one. – afghanimah Apr 29 '20 at 16:39
  • Could you provide an example of how this can be fixed? New to python. Thanks – Ron Apr 29 '20 at 19:54
  • Sure; just to clarify: you want `context_input = ...` through `del model, tokenizer, GPT2_model` to be repeated for every line in the files? – afghanimah Apr 29 '20 at 19:58
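
For context on the first comment, the overwriting can be seen in a tiny self-contained sketch (the list here is stand-in data, not the question's files):

last = None
for line in ["line 1\n", "line 2\n", "line 3\n"]:
    last = line      # each pass rebinds the same name
print(last)          # prints "line 3" -- only the final value is left after the loop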

1 Answer


So, your fix is going to be in main, by reorganizing the with statements and the for loop:

def main():
  with open("data/test-P1-Multi.txt","r") as f1, open("data/test-P3-Multi.txt","r") as f3:
    for i in range(20):
      # read the matching line from each file on every pass
      P1 = f1.readline()
      P3 = f3.readline()

      context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

      print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
      model = GPT2LMHeadModel.from_pretrained('models/774M')
      tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
      GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)

      text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

      predictions = text_generator_with_context(context_input, nb_samples=1)
      for j, prediction in enumerate(predictions):  # j avoids shadowing the outer loop's i
          print('prediction n°', j, ': ', prediction)

      del model, tokenizer, GPT2_model

Note: you might be able to pull some of this code out of the loop if it doesn't change between lines, so you don't have to re-initialize it over and over again, but I'm not familiar with what you imported.
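
For example, a sketch of what that hoisting could look like, assuming the model, tokenizer, and generator don't depend on the line currently being processed (same imports and file names as the question):

def main():
  # load the model, tokenizer, and generator once, before the loop
  model = GPT2LMHeadModel.from_pretrained('models/774M')
  tokenizer = GPT2Tokenizer.from_pretrained('models/774M')
  GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)
  text_generator_with_context = TextGeneration(GPT2_model, use_context=True)

  with open("data/test-P1-Multi.txt","r") as f1, open("data/test-P3-Multi.txt","r") as f3:
    for i in range(20):
      # read one line from each file and process the pair
      P1 = f1.readline()
      P3 = f3.readline()

      context_input = GenerationInput(P1=P1, P3=P3, size=MEDIUM)

      print("\n", "-"*100, "\n", "PREDICTION WITH CONTEXT WITHOUT SPECIAL TOKENS")
      predictions = text_generator_with_context(context_input, nb_samples=1)
      for j, prediction in enumerate(predictions):
          print('prediction n°', j, ': ', prediction)

An equivalent way to pair the lines, still reading one string at a time without readlines(), would be `for P1, P3 in zip(f1, f3):` in place of the range(20) loop.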

afghanimah
  • Thanks! Moved the 'model =', 'tokenizer =', 'GPT2_model =', and 'text_generator_with_context =' lines before 'with open ...' and now it loads the 774M model once and processes all 20 lines perfectly. – Ron Apr 30 '20 at 00:46
  • Nice, that was my thought too – afghanimah Apr 30 '20 at 00:53