0

I am trying to append jpg file paths to a pre-existing text list if the path is not already present. I am also trying to treat the list as bytes-like to be a little more efficient.

I was able to use the following bit of code to create and append a list based on search results:

#----> Variables

jpg_index = "/path/to/list_of_jpgs"

search_locations = ["path/to/folder/a", "path/to/folder/b"]

#---> Code 

#does my path exist
if os.path.exists(jpg_index):
    #for-loop to check all given search locations
    for search_location in search_locations:
        #find paths ending with jpg
        for path in Path(search_location).rglob('*.jpg'):
            #Open the index file / jpg list and append paths
            with open(jpg_index, 'a') as filehandle:
                filehandle.writelines('%s\n' % path)

However, when I try to check a pre-existing text to see whether new paths should be added, nothing seems to happen. I am trying the variations of the following:

#----> Variables

jpg_index = "/path/to/list_of_jpgs"

search_locations = ["path/to/folder/a", "path/to/folder/b"]

#---> Code 

#does my path exist
if os.path.exists(jpg_index):
    #for-loop to check all given search locations
    for search_location in search_locations:
        #find paths ending with jpg
        for path in Path(search_location).rglob('*.jpg'):
            #Open the jpg_index so it can be scanned for matches
            with open(jpg_index, 'rb', 0) as file, \
                mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
                    #proceed if no matches are found
                    if s.find(b'path') != -1:
                    #Open the index file / jpg list and append paths
                        with open(jpg_index, 'a') as filehandle:
                            filehandle.writelines('%s\n' % path)

I have tried various other solutions, such as shuffling where I start the check using mmap, trying to avoid using with open twice and also I am certain it is checking for the string 'path' not actually looking at my paths.

Frequently however I get no error messages so it's hard to move forward. Evidently it's working, but not as I want it to.

#---> Modules (c&p from top of code)

import os
import pathlib
import glob, os
from pathlib import Path   
import os.path
from os import path
import mmap

EDIT:

I have also tried to implement the answer given by @skywallkee and changed the code to:

if os.path.exists(jpg_index):
    for search_location in search_locations:
        for path in Path(search_location).rglob('*.jpg'):
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
                if s.find(str.encode(path.__str__())) < 0:
                    with open(jpg_index, 'a') as filehandle:
                        filehandle.writelines('%s\n' % path)

This, however, gives the error

    mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
                                                         ^
SyntaxError: invalid syntax

Where have I gone wrong here? (I am on Mac, using Atom with the Hydrogen module.)

Edit 2:

I deleted too much, skywallkee's explanation was great, see his Pastebin for his solution without my mistakes.

Please note that using Atom, this seems to fail as the list does not expand if it is open in a tab. The text list must be closed and reopened to observe progress.

Solebay Sharp
  • 519
  • 7
  • 24

1 Answers1

1

That's because your code will always check if "path" represented as bytes is in the file, which might not be at all. If you want to check if the content of the variable path is in file, then you can do it like this:

if s.find(str.encode(path.__str__())) < 0:
#Open the index file / jpg list and append paths
      with open(jpg_index, 'a') as filehandle:
            filehandle.writelines('%s\n' % path)

The s.find will give a positive value if the file has been found, therefore you want to check if it is smaller than 0 so you can write to file. If you're checking if it is != -1 then you'll most likely never write to file as the s.find will return -1 for not finding the file and you're looking if s.find != -1, which is not different, so you'll never go into that if. You'll only go into it if you already have some paths in the file and the paths are already there, reason why you'd actually write the paths twice and never write paths that aren't already in the file.

By using str.encode("string") you're converting the string into bytes, so you can actually convert the path into bytes by doing str.encode(path.__str__()). You also need to call path.str() as path is a WindowsPath (if you're running on Windows) so you want the path as a string.

skywallkee
  • 41
  • 4
  • I have tried to implement your suggested change, and have run into a slightly different error. Would you be able to check my edited question and see where I am going wrong? I appreciate your assistance. – Solebay Sharp Aug 13 '20 at 12:43
  • 1
    You deleted too many lines forgetting an ```with``` at the beginning of the line. Also, fixing that might also give you an error with "file used before declaration" as you also deleted the part with ```with open(jpg_index, 'rb', 0) as file, \```. Check out line 4 of the new code with line 16 of the first code. – skywallkee Aug 13 '20 at 12:49
  • I see where I have deleted the line. Currently it's not working but I will troubleshoot for a bit so you don't have to correct similar basic errors. – Solebay Sharp Aug 13 '20 at 13:00
  • 1
    In case you can't figure it out, I've written and checked it here: https://pastebin.com/NQBCfzAk it's working as intended (haven't bothered removing the imports that might not be needed, I'll leave that to you). Also, an important note is that the fiel you're writing to **must** not be empty, meaning that you'd have to put a trailing space or something inside so that it will not be empty at first launch otherwise mmap might cry because of it. After running it once and having some strings inside, you can simply remove that beginning space, it's only needed in the first run. – skywallkee Aug 13 '20 at 13:06
  • Thank you, this wasn't working in Atom but I just ran it using IDLE and it worked perfectly! – Solebay Sharp Aug 13 '20 at 13:21