In my situation, I have a main processing Python script that creates a class (FileIterator) which will iterate through a large data file line by line.

class FileIterator:

    def read_data(self, input_data):
        # 'fp' is used instead of 'input' to avoid shadowing the built-in input()
        with open(input_data, 'r') as fp:
            for line in fp:
                pass  # <perform operation>

What I am trying to do is replace "perform operation" with a return statement (or a substitute) that hands the line back to the main script, so that I can operate on the line outside of FileIterator.

main_process.py

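# assumes FileIterator.py and Operations.py each define a class of the same name
from FileIterator import FileIterator
from Operations import Operations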

def perform_operations():
    iterator = FileIterator()
    operator = Operations()
    line = iterator.read_data('largedata.txt')
    operator.do_something(line)

Is there a suitable replacement for read_data() that still lets me read line by line without loading the entire file into memory AND lets me either save the line in the object attribute self.line or return it to the calling script?

Please let me know if more details about the design are necessary to reach a solution.

EDIT: What I'm looking for is to limit FileIterator's responsibility to reading large files. The script that manages FileIterator should be responsible for taking each line and feeding it to the class Operations (a simplification, since I will have multiple classes that need to act on each line).

Think of this design as an assembly line where FileIterator's job is to chop up the file. Other workers take the results from FileIterator and perform other tasks on them.

EDIT 2: Changing title because I feel it was misleading and people are upvoting the answer that was basically just a copy paste of my question.

Kevin Y
  • possible duplicate of [Read large text files in Python, line by line without loading it in to memory](http://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory) – f.rodrigues Dec 09 '14 at 19:33
  • I'm confused. Do you want `read_data` to process the line and return it? If you only want `read_data` to return the line, the open file object is an iterator already and you can use it directly. – tdelaney Dec 09 '14 at 19:55
  • I don't want the iterator to do any operations. I want to be able to return the line or save the line as an attribute of the class so that the calling script "main_process.py" can use it. The file iterator should still be able to keep the position of the file during this process. – Kevin Y Dec 09 '14 at 20:44
  • @f.rodrigues Definitely NOT a duplicate. The answer to that thread is what I already have. I want to do something completely different. – Kevin Y Dec 09 '14 at 20:51
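
The requirement described in the comments above (store the current line as an attribute while the reader keeps its position in the file) could be sketched roughly as follows. This is only an illustration under assumptions: the constructor argument, the `next_line()` and `close()` methods, and the temporary demo file are hypothetical and not part of the original code.

```python
import os
import tempfile


class FileIterator:
    """Hypothetical variant: keeps the file open and exposes the current line."""

    def __init__(self, input_data):
        # The open file object remembers its own position between reads.
        self._fp = open(input_data, 'r')
        self.line = None

    def next_line(self):
        # readline() returns '' only at end of file; a blank line is '\n'.
        line = self._fp.readline()
        self.line = line if line else None
        return self.line

    def close(self):
        self._fp.close()


# Small temporary file standing in for 'largedata.txt'.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('alpha\n\ngamma\n')
    demo_path = tmp.name

iterator = FileIterator(demo_path)
collected = []
while iterator.next_line() is not None:
    # The calling script reads the attribute; FileIterator does no processing.
    collected.append(iterator.line.rstrip('\n'))
iterator.close()
os.remove(demo_path)
```

Only one line is held in memory at a time, and the calling script decides what to do with `iterator.line`.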

1 Answer

A file object already supports line-wise iteration.

with open('largedata.txt', 'r') as fp:
    for line in fp:
        operator.do_something(line)
Ignacio Vazquez-Abrams
  • Not what I'm looking for. The FileIterator does not have access to operator. Only the calling script does. I don't want to couple these classes. – Kevin Y Dec 09 '14 at 20:40
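
One way to keep the two classes decoupled, offered as a minimal sketch rather than a definitive answer, is to turn `read_data` into a generator with `yield`. FileIterator then only produces lines and the calling script passes each one to Operations. The body of `Operations`, the `path` parameter of `perform_operations`, and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile


class FileIterator:
    """Only responsible for reading a file one line at a time."""

    def read_data(self, input_data):
        # 'yield' makes read_data a generator: each line is handed back to
        # the caller, and the file position is kept between iterations.
        with open(input_data, 'r') as fp:
            for line in fp:
                yield line


class Operations:
    """Hypothetical stand-in for the asker's Operations class."""

    def __init__(self):
        self.seen = []

    def do_something(self, line):
        self.seen.append(line.rstrip('\n'))


def perform_operations(path):
    iterator = FileIterator()
    operator = Operations()
    # Only the calling script knows about both objects.
    for line in iterator.read_data(path):
        operator.do_something(line)
    return operator


# Small temporary file standing in for 'largedata.txt'.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    demo_path = tmp.name

result = perform_operations(demo_path)
os.remove(demo_path)
```

Because the generator is lazy, the file is still read one line at a time, and several worker classes could consume the same loop without FileIterator knowing about any of them.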