Transforming function for reading txt files into one string to document logics

Question

I have a bunch of .txt files in a folder. Here are two functions I use to read these files and save their contents into a variable as one string:

import glob

s = glob.glob("/Users/user/documents/folder/*.txt")

def read_files(files):
    for filename in files:
        with open(filename, 'r', encoding='latin-1') as file:
            yield file.read()

def read_files_as_string(files, separator='\n'):
    files_content = list(read_files(files=files))
    return separator.join(files_content)

results = read_files_as_string(s)

Now I want to use sklearn's CountVectorizer() to get n-grams from the text. But CountVectorizer() does not accept a single string as input. So my question is: how can I change the reading function so that, instead of joining the files into one string, it stores them in a list like this: ['text1.txt', 'text2.txt', ..., 'textn.txt']

Thanks in advance!

Have I understood correctly that you want the result to be like `["contents of text1.txt", "contents of text2.txt", …]`, not the filenames as your question shows? — Aankhen, Jul 09 '18 at 11:13
Fully correct. Not the names but the contents, like you mentioned: ["contents of text1.txt", "contents of text2.txt", …] — Keithx, Jul 09 '18 at 13:51

score 1 · Accepted Answer · answered Jul 09 '18 at 13:54

read_files already does almost all of what you want. You can call it directly and use list to convert it from a generator into a regular list:

results = list(read_files(s))
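
To see what that produces, here is a minimal, self-contained sketch. It reuses read_files from the question unchanged; the temporary folder, the two sample file names, and their contents are made up for illustration, and sorted() is added only so the order is predictable.

```python
import glob
import os
import tempfile


def read_files(files):
    """Yield the full text of each file, one string per file."""
    for filename in files:
        with open(filename, 'r', encoding='latin-1') as file:
            yield file.read()


# Hypothetical sample data: two small files in a throwaway folder.
samples = [('text1.txt', 'first document'), ('text2.txt', 'second document')]

with tempfile.TemporaryDirectory() as folder:
    for name, text in samples:
        with open(os.path.join(folder, name), 'w', encoding='latin-1') as f:
            f.write(text)

    # glob does not guarantee an order, so sort the paths for a stable result.
    paths = sorted(glob.glob(os.path.join(folder, '*.txt')))

    # list() consumes the generator: one list element per file.
    results = list(read_files(paths))

print(results)  # ['first document', 'second document']
```

Each element of results is the contents of one file, which is the shape CountVectorizer's fit_transform expects: an iterable of strings, one per document.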

Aankhen

