Often one needs to process a sequence of "chunks", which are read from a stream of "atoms", where each chunk consisting of a variable number of atoms, and there is no way for the program to know that it has received a complete chunk until it reads the first atom of the next chunk (or the stream of atoms becomes exhausted).
A straightforward algorithm for doing this task would look like this:
LOOP FOREVER:
SET x TO NEXT_ATOM
IF DONE(x) OR START_OF_CHUNK(x):
IF NOT EMPTY(accum):
PROCESS(accum)
END
if DONE(x):
BREAK
END
RESET(accum)
END
ADD x TO accum
END
So, my question is this:
Is there a name for this general class of problems and/or for the programming pattern shown above?
The remainder of this post are just a couple of (reasonably realistic) examples of what's described abstractly above. (The examples are in Python, although they could be translated easily to any imperative language.)
The first one is a function to produce a run-length encoding of an input string. In this case, the "atoms" are individual characters, and the "chunks" are maximal runs of the same character. Therefore, the program does not know that it has reached the end of a run until it reads the first character in the following run.
def rle(s):
'''Compute the run-length encoding of s.'''
n = len(s)
ret = []
accum = 0
v = object() # unique sentinel; ensures first test against x succeeds
i = 0
while True:
x = s[i] if i < n else None
i += 1
if x is None or x != v:
if accum > 0:
ret.append((accum, v))
if x is None:
break
accum = 0
v = x
accum += 1
return ret
The second example is a function that takes as argument a read handle to a FASTA-formatted file, and parses its contents. In this case, the atoms are lines of text. Each chunk consists of a specially-marked first line, called the "defline" (and distinguished by a '>' as its first character), followed by a variable number of lines containing stretches of nucleotide or protein sequence. Again, the code can detect the end of a chunk unambiguously only after reading the first atom (i.e. the defline) of the next chunk.
def read_fasta(fh):
'''Read the contents of a FASTA-formatted file.'''
ret = []
accum = []
while True:
x = fh.readline()
if x == '' or x.startswith('>'):
if accum:
ret.append((accum[0], ''.join(accum[1:])))
if x == '':
break
accum = []
accum.append(x.strip())
return ret