
I'm writing a Python script that searches a selected folder of .txt files for a search term (a word, a few words, or a sentence) and prints the names of the files that contain it. It currently works fine using the os module:

import os

dirname = '/Users/User/Documents/test/reports'

search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]

for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        # Lowercase the text too, so it can match the lowercased search terms
        text = infile.read().lower()

    if all(term in text for term in search_terms):
        print(f)
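The search loop above can also be wrapped in a function that returns the matches instead of printing them. This is a minimal sketch: the name find_matching_files, the .txt filter and the sorted order are illustrative additions that are not in the original script, and matching is assumed to be case-insensitive.

```python
import os


def find_matching_files(dirname, search_terms, encoding="latin-1"):
    """Return names of .txt files in dirname that contain every search term.

    Hypothetical helper: matching is case-insensitive because both the
    terms and the file text are lowercased before comparison.
    """
    terms = [t.lower() for t in search_terms]
    matches = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        # Skip subfolders and anything that is not a .txt file
        if not name.endswith(".txt") or not os.path.isfile(path):
            continue
        with open(path, "r", encoding=encoding) as infile:
            text = infile.read().lower()
        # A file matches only if every term occurs somewhere in its text
        if all(term in text for term in terms):
            matches.append(name)
    return matches
```

For example, find_matching_files('/Users/User/Documents/test/reports', ['Pressure']) returns the matching names as a list, which can be printed or passed straight to pd.DataFrame.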

The output will be something like this:

3003.txt
3002.txt
3006.txt
3008.txt

I would like to store these results as a string column in a pandas DataFrame, but my attempt raises an error:

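lst = []

for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read().lower()

    if all(term in text for term in search_terms):
        lst.append(f)
        df = pd.DataFrame(lst)
        print(f)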

How can this be done?

Keithx

1 Answer


In the code below, new lines are marked with '# new line * * *'.

Code from the question

import os
import pandas as pd  # new line * * *

dirname = '/Users/User/Documents/test/reports'

search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]

# Create empty dataframe to store file names  # new line * * *
df = pd.DataFrame(columns=['file_names'])  # new line * * *

for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read().lower()  # lowercase to match the lowercased terms

    if all(term in text for term in search_terms):
        print(f)
        # Store value 'f' inside a dataframe column; wrap it in a list so  # new line * * *
        # pandas builds a one-row frame (DataFrame.append was removed in pandas 2.0)
        df = pd.concat([df, pd.DataFrame({'file_names': [f]})], ignore_index=True)  # new line * * *
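Building the DataFrame one row at a time creates a new frame on every match. A common alternative, sketched here as the hypothetical helper search_to_dataframe (it is not part of the answer), is to collect the names in a plain list and construct the DataFrame once after the loop, which is the same idea as the lst attempt in the question. The .txt filter and sorted order are assumptions added for the sketch.

```python
import os

import pandas as pd


def search_to_dataframe(dirname, search_terms, encoding="latin-1"):
    """Collect names of .txt files containing every term into a DataFrame.

    Hypothetical helper: names are gathered in a list first and the
    DataFrame is built once, after the loop.
    """
    terms = [t.lower() for t in search_terms]
    found = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        if not name.endswith(".txt") or not os.path.isfile(path):
            continue
        with open(path, "r", encoding=encoding) as infile:
            text = infile.read().lower()
        if all(term in text for term in terms):
            found.append(name)
    # One DataFrame construction instead of one concat per match;
    # the column exists even when nothing matched
    return pd.DataFrame({"file_names": found})
```

Because the list is only turned into a DataFrame at the end, the cost of the pandas step no longer grows with every match.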

Sample code

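import pandas as pd

f = ['3003.txt', '3002.txt', '3006.txt', '3008.txt']
df = pd.DataFrame({'file_names': f})
# Add one more file name as a new row (pd.concat replaces the removed DataFrame.append)
df = pd.concat([df, pd.DataFrame({'file_names': ['new_file.txt']})], ignore_index=True)
print(df)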


Nilesh Ingle
  • Thanks, but what helped was wrapping f in a list to avoid the scalar-values error: df = df.append(pd.DataFrame({'files':[f]}), ignore_index=True) – Keithx Jul 16 '18 at 00:10
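The fix in the comment works because pandas treats a bare string as a scalar: a dict of scalars gives pd.DataFrame no row count to infer, so it raises ValueError unless an index is passed, while a one-element list yields a one-row frame. A small check of that behaviour, using pd.concat in place of DataFrame.append (deprecated in pandas 1.4 and removed in 2.0); the column name files follows the comment.

```python
import pandas as pd

f = "3003.txt"

# A dict of scalars has no row count, so pandas refuses to build a frame
try:
    pd.DataFrame({"files": f})
except ValueError as exc:
    print("scalar value:", exc)

# Wrapping the scalar in a list gives a one-row DataFrame
row = pd.DataFrame({"files": [f]})

# Appending that row to an existing frame with pd.concat
df = pd.DataFrame({"files": ["3002.txt"]})
df = pd.concat([df, row], ignore_index=True)
print(df)
```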