I have several txt files that I have successfully converted to csv files. I now want to clean them all in the same way, but my script fails when it tries to read the files by name.
First I converted all txt files in my folder of interest into csv files:
import fnmatch
import os

import pandas as pd

files_dir = r'/Desktop/raw_data'
files = os.listdir(files_dir)
for file in files:
    # Only convert the deseq2 output files that are still .txt
    if fnmatch.fnmatch(file, 'deseq2*'):
        extension = os.path.splitext(file)[1]
        if extension == '.txt':
            filename = os.path.join(files_dir, file)
            df = pd.read_csv(filename, sep='|')
            new_filename = os.path.splitext(filename)[0] + '.csv'
            df.to_csv(new_filename, index=False)
I want to apply the following clean-up to every csv file that was created and then save the result. It takes a list of strings (genes) and keeps only the rows whose gene_name value appears in that list.
cleaned = df[df['gene_name'].isin(genes)]
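To show what this filter does on its own, here is a minimal sketch with made-up data. The gene names, the log2FoldChange column, and the contents of genes are placeholders for illustration, not my real data.

```python
import pandas as pd

# Hypothetical stand-in for one converted file; the real files have other columns.
df = pd.DataFrame({
    "gene_name": ["BRCA1", "TP53", "EGFR", "MYC"],
    "log2FoldChange": [1.2, -0.8, 2.5, 0.1],
})

# Hypothetical list of genes of interest.
genes = ["TP53", "MYC"]

# isin() builds a boolean mask; indexing with it keeps only the matching rows.
cleaned = df[df["gene_name"].isin(genes)]
print(cleaned)
```

Only the TP53 and MYC rows remain in cleaned, and df itself is left unchanged.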
This is what I have attempted in order to do this to all of the files in my folder:
path = r'/Desktop/raw_data'
all_files = glob.glob(os.path.join(path, "*.csv"))  # make list of paths
for file in all_files:
    # Getting the file name without extension
    file_name = os.path.splitext(os.path.basename(file))[0]
    df = pd.read_csv(file_name)
    cleaned = df[df['gene_name'].isin(genes)]
    df.to_csv(file_name)
I think I have narrowed the issue down to the following line of code:
df = pd.read_csv(file_name)
I get the following error: [Errno 2] No such file or directory: 'example_file'
I thought that maybe I needed to include .csv in the file name, so I tried the following, but I got a similar error.
df = pd.read_csv(file_name +'.csv')
[Errno 2] No such file or directory: 'example_file.csv'
I am confused about what is going on, because the files definitely exist in the folder that I am referencing. Any help is appreciated.
Code for applying data cleaning to all csv files taken from here.
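One possible cause, sketched under assumptions: os.path.basename drops the directory part of the path, so pd.read_csv(file_name) looks for the file relative to the current working directory, not in the data folder. The sketch below uses a temporary directory and a made-up example_file.csv instead of the real folder, passes the full path returned by glob to read_csv, and writes cleaned (not df) back out. Whether overwriting the original csv is the desired behavior is an assumption.

```python
import glob
import os
import tempfile

import pandas as pd

genes = ["TP53", "MYC"]  # hypothetical gene list

with tempfile.TemporaryDirectory() as path:
    # Create a small stand-in CSV so the sketch is self-contained.
    pd.DataFrame(
        {"gene_name": ["BRCA1", "TP53", "MYC"], "padj": [0.01, 0.2, 0.03]}
    ).to_csv(os.path.join(path, "example_file.csv"), index=False)

    for file in glob.glob(os.path.join(path, "*.csv")):
        # basename + splitext leaves just 'example_file': no folder, no extension.
        file_name = os.path.splitext(os.path.basename(file))[0]
        print("bare name:", file_name)
        print("full path:", file, "exists:", os.path.exists(file))

        # Read and write using the full path that glob returned.
        df = pd.read_csv(file)
        cleaned = df[df["gene_name"].isin(genes)]
        cleaned.to_csv(file, index=False)

    # Re-read the file to confirm that only the selected genes were saved.
    result = pd.read_csv(os.path.join(path, "example_file.csv"))

print(result)
```

If the bare name is still wanted, for example to build a new output name, it can be joined back onto the folder with os.path.join(path, file_name + ".csv") before being used.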