I'm trying to use sets and the intersection method to find which elements in a unicode list of file paths contain specific characters. The goal is to replace these characters with other characters, so I've made a dictionary in which each key is a character to be replaced and each value is its replacement. When I try to generate an intersection set of the paths with the characters to be replaced, however, the result is an empty set. What am I doing wrong? I have this working with for loops, but I'd like to make this as efficient as possible. Feedback is appreciated!
Code:
# -*- coding: utf-8 -*-
import os
def GetFilepaths(directory):
    """
    This function will generate all file names in a directory tree using os.walk.
    It returns a list of file paths.
    """
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths
# dictionary of umlauts (key) and their replacements (value)
umlautDictionary = {u'Ä': 'Ae',
                    u'Ö': 'Oe',
                    u'Ü': 'Ue',
                    u'ä': 'ae',
                    u'ö': 'oe',
                    u'ü': 'ue'
                    }
# get file paths in root directory and subfolders
filePathsList = GetFilepaths(u'C:\\Scripts\\Replace Characters\\Umlauts')
print set(filePathsList).intersection(umlautDictionary)
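For what it's worth, one likely cause is that the set built from filePathsList contains whole path strings while the dictionary keys are single characters, so no element of the set is ever equal to a key and the intersection comes out empty. The sketch below illustrates this and one possible alternative: intersecting the characters of each path with the keys. The names sample_paths, umlaut_map, and replace_umlauts are hypothetical and introduced only for illustration, and the sample paths are made up to stand in for whatever GetFilepaths returns.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample paths standing in for the output of GetFilepaths().
sample_paths = [u'C:\\Umlauts\\Käse.txt', u'C:\\Umlauts\\plain.txt']

# Same mapping as in the question, with unicode values (an assumption).
umlaut_map = {u'Ä': u'Ae', u'Ö': u'Oe', u'Ü': u'Ue',
              u'ä': u'ae', u'ö': u'oe', u'ü': u'ue'}

# Whole paths are compared with single-character keys, so nothing is equal.
whole_path_matches = set(sample_paths).intersection(umlaut_map)


def replace_umlauts(path):
    """Return path with every key of umlaut_map replaced by its value."""
    for umlaut, replacement in umlaut_map.items():
        path = path.replace(umlaut, replacement)
    return path


# Intersect the characters of each path with the keys instead.
paths_with_umlauts = [p for p in sample_paths
                      if set(p).intersection(umlaut_map)]
renamed = [replace_umlauts(p) for p in paths_with_umlauts]

print(whole_path_matches)
print(renamed)
```

Whether this is faster than the plain for loops would have to be measured. Building a set per path is only a cheap pre-filter, and the replacement itself still walks the dictionary once per matching path.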