-2

I am fairly new to the Python environment and gradually working my way forward.

We have about 10,000 files in a folder. They contain similar information, with one major difference: some files contain the string 'string1' and the rest contain 'string2'. To clarify, the string is not in the filename but in the file content itself. The file content is character-delimited.

I tried to create two separate lists, one for string1 and one for string2, and wrote various lines of code, but I am getting nowhere. Both lists should contain only the filenames.

Yannis P.
Max
  • To be precise, the output shall be two lists. One with the filenames of the files containing string1 and another list with the filenames containing string2. Sorry for being cryptic – Max Apr 27 '20 at 19:22
  • Please tell us which file extensions you are searching in. My answer only looks at .txt files, but it is rather straightforward to adapt to other extensions – Yannis P. Apr 27 '20 at 19:35
  • The files don't have an extension. They come in the EDIFACT standard. – Max Apr 27 '20 at 20:43
  • Then perhaps try something like '*' instead of '*.txt' in my answer – Yannis P. Apr 27 '20 at 21:31

2 Answers

2

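I often use grep for this kind of thing. In this case I would use the following.

Edited to add file extensions:

grep -l string1 *.txt > string1_files.txt; grep -l string2 *.txt > string2_files.txt

This one-liner searches for string1 in the .txt files in the current directory and writes the names of the matching files to string1_files.txt, then does the same for string2. The two commands are joined with ; rather than && because grep exits with a non-zero status when nothing matches, and && would then skip the second search.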

Copying from man grep:

 -l, --files-with-matches
         Only the names of files containing selected lines are written to
         standard output.  grep will only search a file until a match has
         been found, making searches potentially less expensive.  Path-
         names are listed once per file searched.  If the standard input
         is searched, the string ``(standard input)'' is written.

Hope this helps a bit, but you might want to grep only certain file extensions.

Edit for files without extensions (as mentioned in the question comments):

grep -l string1 * > string1_files.txt; grep -l string2 * > string2_files.txt
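
Since the question asks for Python lists, the two output files can be read back into lists afterwards. The following is only a minimal sketch: the helper name read_name_list is made up for illustration, and it assumes grep wrote one file name per line, which is what -l does.

```python
from pathlib import Path


def read_name_list(list_file):
    """Return the file names stored one per line in a `grep -l` output file.

    read_name_list is a hypothetical helper name, not part of any library.
    """
    lines = Path(list_file).read_text().splitlines()
    # Drop blank lines so an empty result file yields an empty list.
    return [name for name in lines if name.strip()]


# Example use, assuming the grep commands above have been run:
# list1 = read_name_list("string1_files.txt")
# list2 = read_name_list("string2_files.txt")
```

If no file matched, grep leaves the output file empty and the helper returns an empty list.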
Yannis P.
-1

Assuming each file contains only the string you want to compare, you just need to do:

import glob
import os

folder = 'foo'
files = glob.glob(os.path.join(folder, "*"))

list1 = []
list2 = []
for file in files:
  with open(file, 'r') as f:
    # read() returns the whole file as one string; strip() drops surrounding whitespace
    if f.read().strip() == 'string1':
      list1.append(file)
    else:
      list2.append(file)

If your files have more data, you need to process the lines returned by f.readlines() and compare them accordingly.
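
For files that hold more than the bare string, one possible approach is a substring test on the file content instead of an equality test. This is a rough sketch, not a definitive implementation: the function name split_by_marker and the head_chars parameter are invented for illustration, and it assumes the files can be read as text (undecodable bytes are replaced rather than raising an error).

```python
import os


def split_by_marker(folder, marker1="string1", marker2="string2", head_chars=-1):
    """Return two lists of file names: files containing marker1, files containing marker2.

    split_by_marker and head_chars are hypothetical names used for this sketch.
    head_chars limits the search to the first N characters; -1 reads the whole file.
    """
    with_marker1, with_marker2 = [], []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        # errors="replace" avoids a UnicodeDecodeError on unexpected bytes
        with open(path, "r", errors="replace") as f:
            text = f.read(head_chars)
        if marker1 in text:
            with_marker1.append(entry)
        if marker2 in text:
            with_marker2.append(entry)
    return with_marker1, with_marker2


# Example use, assuming a folder named 'foo' exists:
# list1, list2 = split_by_marker("foo")
```

A file that contains neither string ends up in neither list, and a file that contains both ends up in both. If the string is known to sit near the start of each file, passing a small head_chars value avoids reading thousands of lines per file.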

Augusto Maillo
  • You're right. But it's hard to say what is in his files. It's a generic solution – Augusto Maillo Apr 27 '20 at 19:21
  • 1
    Apologies, I might be cryptic. The files contain thousands of lines of production order details. The string I am looking for is in the header of the file. The header is not explicitly mentioned though. – Max Apr 27 '20 at 19:23
  • So just compare only the first line, i.e. use f.readlines()[0].strip() for the comparison – Augusto Maillo Apr 28 '20 at 14:41