For example I would like to do some NLP text treatment : extract some keywords, and find correlation between them (with previous lemma-POS segmentation). The Pipeline would be :
count all (lemmatised) words,
make a stopwords list,
use a RAKE-like algorithm to extract keyword list,
make some frequency-correlation matrix the kw list content and/or the POS/lemma words...
For example in pseudo-python :
def count_words(infile,open_and_read) :
dic = {}
f = open_and_read(infile)
for word in f:
if word not in dic:
dic[word] = 1
else dic[word] +=1
return dic
etc etc
How do you transform this kind of pipeline in continuous programming ?